Use this skill whenever the user wants to create, read, edit, or manipulate Word documents (.docx files). Triggers include: any mention of "Word doc", "word document", ".docx", or requests to produce professional documents with formatting like tables of contents, headings, page numbers, or letterheads. Also use when extracting or reorganizing content from .docx files, inserting or replacing images in documents, performing find-and-replace in Word files, working with tracked changes or comments, or converting content into a polished Word document. If the user asks for a "report", "memo", "letter", "template", or similar deliverable as a Word or .docx file, use this skill. Do NOT use for PDFs, spreadsheets, Google Docs, or general coding tasks unrelated to document generation.
---
name: docx
description: "Use this skill whenever the user wants to create, read, edit, or manipulate Word documents (.docx files). Triggers include: any mention of \"Word doc\", \"word document\", \".docx\", or requests to produce professional documents with formatting like tables of contents, headings, page numbers, or letterheads. Also use when extracting or reorganizing content from .docx files, inserting or replacing images in documents, performing find-and-replace in Word files, working with tracked changes or comments, or converting content into a polished Word document. If the user asks for a \"report\", \"memo\", \"letter\", \"template\", or similar deliverable as a Word or .docx file, use this skill. Do NOT use for PDFs, spreadsheets, Google Docs, or general coding tasks unrelated to document generation."
license: Proprietary. LICENSE.txt has complete terms
---

# DOCX creation, editing, and analysis

## Overview

A .docx file is a ZIP archive containing XML files.

## Quick Reference

| Task | Approach |
|------|----------|
| Read/analyze content | `pandoc` or unpack for raw XML |
| Create new document | Use `docx-js` - see Creating New Documents below |
| Edit existing document | Unpack → edit XML → repack - see Editing Existing Documents below |

### Converting .doc to .docx

Legacy `.doc` files must be converted before editing:

```bash
python scripts/office/soffice.py --headless --convert-to docx document.doc
```

### Reading Content

```bash
# Text extraction with tracked changes
pandoc --track-changes=all document.docx -o output.md

# Raw XML access
python scripts/office/unpack.py document.docx unpacked/
```

### Converting to Images

```bash
python scripts/office/soffice.py --headless --convert-to pdf document.docx
pdftoppm -jpeg -r 150 document.pdf page
```

### Accepting Tracked Changes

To produce a clean document with all tracked changes accepted (requires LibreOffice):

```bash
python scripts/accept_changes.py input.docx output.docx
```

---

## Creating New Documents

Generate .docx files with JavaScript, then validate. Install: `npm install -g docx`

### Setup
```javascript
const fs = require('fs'); // needed for writeFileSync / readFileSync below
const { Document, Packer, Paragraph, TextRun, Table, TableRow, TableCell, ImageRun,
        Header, Footer, AlignmentType, PageOrientation, LevelFormat, ExternalHyperlink,
        TableOfContents, HeadingLevel, BorderStyle, WidthType, ShadingType,
        VerticalAlign, PageNumber, PageBreak } = require('docx');

const doc = new Document({ sections: [{ children: [/* content */] }] });
Packer.toBuffer(doc).then(buffer => fs.writeFileSync("doc.docx", buffer));
```

### Validation
After creating the file, validate it. If validation fails, unpack, fix the XML, and repack.
```bash
python scripts/office/validate.py doc.docx
```

### Page Size

```javascript
// CRITICAL: docx-js defaults to A4, not US Letter
// Always set page size explicitly for consistent results
sections: [{
  properties: {
    page: {
      size: {
        width: 12240,   // 8.5 inches in DXA
        height: 15840   // 11 inches in DXA
      },
      margin: { top: 1440, right: 1440, bottom: 1440, left: 1440 } // 1 inch margins
    }
  },
  children: [/* content */]
}]
```

**Common page sizes (DXA units, 1440 DXA = 1 inch):**

| Paper | Width | Height | Content Width (1" margins) |
|-------|-------|--------|---------------------------|
| US Letter | 12,240 | 15,840 | 9,360 |
| A4 (default) | 11,906 | 16,838 | 9,026 |

**Landscape orientation:** docx-js swaps width/height internally, so pass portrait dimensions and let it handle the swap:
```javascript
size: {
  width: 12240,   // Pass SHORT edge as width
  height: 15840,  // Pass LONG edge as height
  orientation: PageOrientation.LANDSCAPE  // docx-js swaps them in the XML
},
// Content width = 15840 - left margin - right margin (uses the long edge)
```

### Styles (Override Built-in Headings)

Use Arial as the default font (universally supported). Keep titles black for readability.

```javascript
const doc = new Document({
  styles: {
    default: { document: { run: { font: "Arial", size: 24 } } }, // 12pt default
    paragraphStyles: [
      // IMPORTANT: Use exact IDs to override built-in styles
      { id: "Heading1", name: "Heading 1", basedOn: "Normal", next: "Normal", quickFormat: true,
        run: { size: 32, bold: true, font: "Arial" },
        paragraph: { spacing: { before: 240, after: 240 }, outlineLevel: 0 } }, // outlineLevel required for TOC
      { id: "Heading2", name: "Heading 2", basedOn: "Normal", next: "Normal", quickFormat: true,
        run: { size: 28, bold: true, font: "Arial" },
        paragraph: { spacing: { before: 180, after: 180 }, outlineLevel: 1 } },
    ]
  },
  sections: [{
    children: [
      new Paragraph({ heading: HeadingLevel.HEADING_1, children: [new TextRun("Title")] }),
    ]
  }]
});
```

### Lists (NEVER use unicode bullets)

```javascript
// ❌ WRONG - never manually insert bullet characters
new Paragraph({ children: [new TextRun("• Item")] })  // BAD
new Paragraph({ children: [new TextRun("\u2022 Item")] })  // BAD

// ✅ CORRECT - use numbering config with LevelFormat.BULLET
const doc = new Document({
  numbering: {
    config: [
      { reference: "bullets",
        levels: [{ level: 0, format: LevelFormat.BULLET, text: "•", alignment: AlignmentType.LEFT,
          style: { paragraph: { indent: { left: 720, hanging: 360 } } } }] },
      { reference: "numbers",
        levels: [{ level: 0, format: LevelFormat.DECIMAL, text: "%1.", alignment: AlignmentType.LEFT,
          style: { paragraph: { indent: { left: 720, hanging: 360 } } } }] },
    ]
  },
  sections: [{
    children: [
      new Paragraph({ numbering: { reference: "bullets", level: 0 },
        children: [new TextRun("Bullet item")] }),
      new Paragraph({ numbering: { reference: "numbers", level: 0 },
        children: [new TextRun("Numbered item")] }),
    ]
  }]
});

// ⚠️ Each reference creates INDEPENDENT numbering
// Same reference = continues (1,2,3 then 4,5,6)
// Different reference = restarts (1,2,3 then 1,2,3)
```

### Tables

**CRITICAL: Tables need dual widths** - set both `columnWidths` on the table AND `width` on each cell. Without both, tables render incorrectly on some platforms.

```javascript
// CRITICAL: Always set table width for consistent rendering
// CRITICAL: Use ShadingType.CLEAR (not SOLID) to prevent black backgrounds
const border = { style: BorderStyle.SINGLE, size: 1, color: "CCCCCC" };
const borders = { top: border, bottom: border, left: border, right: border };

new Table({
  width: { size: 9360, type: WidthType.DXA }, // Always use DXA (percentages break in Google Docs)
  columnWidths: [4680, 4680], // Must sum to table width (DXA: 1440 = 1 inch)
  rows: [
    new TableRow({
      children: [
        new TableCell({
          borders,
          width: { size: 4680, type: WidthType.DXA }, // Also set on each cell
          shading: { fill: "D5E8F0", type: ShadingType.CLEAR }, // CLEAR not SOLID
          margins: { top: 80, bottom: 80, left: 120, right: 120 }, // Cell padding (internal, not added to width)
          children: [new Paragraph({ children: [new TextRun("Cell")] })]
        })
      ]
    })
  ]
})
```

**Table width calculation:**

Always use `WidthType.DXA`; `WidthType.PERCENTAGE` breaks in Google Docs.

```javascript
// Table width = sum of columnWidths = content width
// US Letter with 1" margins: 12240 - 2880 = 9360 DXA
width: { size: 9360, type: WidthType.DXA },
columnWidths: [7000, 2360]  // Must sum to table width
```

**Width rules:**
- **Always use `WidthType.DXA`** - never `WidthType.PERCENTAGE` (incompatible with Google Docs)
- Table width must equal the sum of `columnWidths`
- Cell `width` must match corresponding `columnWidth`
- Cell `margins` are internal padding - they reduce content area, not add to cell width
- For full-width tables: use content width (page width minus left and right margins)

### Images

```javascript
// CRITICAL: type parameter is REQUIRED
new Paragraph({
  children: [new ImageRun({
    type: "png", // Required: png, jpg, jpeg, gif, bmp, svg
    data: fs.readFileSync("image.png"),
    transformation: { width: 200, height: 150 },
    altText: { title: "Title", description: "Desc", name: "Name" } // All three required
  })]
})
```

### Page Breaks

```javascript
// CRITICAL: PageBreak must be inside a Paragraph
new Paragraph({ children: [new PageBreak()] })

// Or use pageBreakBefore
new Paragraph({ pageBreakBefore: true, children: [new TextRun("New page")] })
```

### Table of Contents

```javascript
// CRITICAL: Headings must use HeadingLevel ONLY - no custom styles
new TableOfContents("Table of Contents", { hyperlink: true, headingStyleRange: "1-3" })
```

### Headers/Footers

```javascript
sections: [{
  properties: {
    page: { margin: { top: 1440, right: 1440, bottom: 1440, left: 1440 } } // 1440 = 1 inch
  },
  headers: {
    default: new Header({ children: [new Paragraph({ children: [new TextRun("Header")] })] })
  },
  footers: {
    default: new Footer({ children: [new Paragraph({
      children: [new TextRun("Page "), new TextRun({ children: [PageNumber.CURRENT] })]
    })] })
  },
  children: [/* content */]
}]
```

### Critical Rules for docx-js

- **Set page size explicitly** - docx-js defaults to A4; use US Letter (12240 x 15840 DXA) for US documents
- **Landscape: pass portrait dimensions** - docx-js swaps width/height internally; pass short edge as `width`, long edge as `height`, and set `orientation: PageOrientation.LANDSCAPE`
- **Never use `\n`** - use separate Paragraph elements
- **Never use unicode bullets** - use `LevelFormat.BULLET` with numbering config
- **PageBreak must be in Paragraph** - standalone creates invalid XML
- **ImageRun requires `type`** - always specify png/jpg/etc
- **Always set table `width` with DXA** - never use `WidthType.PERCENTAGE` (breaks in Google Docs)
- **Tables need dual widths** - `columnWidths` array AND cell `width`, both must match
- **Table width = sum of columnWidths** - for DXA, ensure they add up exactly
- **Always add cell margins** - use `margins: { top: 80, bottom: 80, left: 120, right: 120 }` for readable padding
- **Use `ShadingType.CLEAR`** - never SOLID for table shading
- **TOC requires HeadingLevel only** - no custom styles on heading paragraphs
- **Override built-in styles** - use exact IDs: "Heading1", "Heading2", etc.
- **Include `outlineLevel`** - required for TOC (0 for H1, 1 for H2, etc.)

---

## Editing Existing Documents

**Follow all 3 steps in order.**

### Step 1: Unpack
```bash
python scripts/office/unpack.py document.docx unpacked/
```
Extracts XML, pretty-prints, merges adjacent runs, and converts smart quotes to XML entities (`&#x201C;` etc.) so they survive editing. Use `--merge-runs false` to skip run merging.

### Step 2: Edit XML

Edit files in `unpacked/word/`. See XML Reference below for patterns.

**Use "Claude" as the author** for tracked changes and comments, unless the user explicitly requests use of a different name.

**Use the Edit tool directly for string replacement. Do not write Python scripts.** Scripts introduce unnecessary complexity. The Edit tool shows exactly what is being replaced.

**CRITICAL: Use smart quotes for new content.** When adding text with apostrophes or quotes, use XML entities to produce smart quotes:
```xml
<!-- Use these entities for professional typography -->
<w:t>Here&#x2019;s a quote: &#x201C;Hello&#x201D;</w:t>
```
| Entity | Character |
|--------|-----------|
| `&#x2018;` | ‘ (left single) |
| `&#x2019;` | ’ (right single / apostrophe) |
| `&#x201C;` | “ (left double) |
| `&#x201D;` | ” (right double) |

**Adding comments:** Use `comment.py` to handle boilerplate across multiple XML files (text must be pre-escaped XML):
```bash
python scripts/comment.py unpacked/ 0 "Comment text with &amp; and &#x2019;"
python scripts/comment.py unpacked/ 1 "Reply text" --parent 0  # reply to comment 0
python scripts/comment.py unpacked/ 0 "Text" --author "Custom Author"  # custom author name
```
Then add markers to document.xml (see Comments in XML Reference).

### Step 3: Pack
```bash
python scripts/office/pack.py unpacked/ output.docx --original document.docx
```
Validates with auto-repair, condenses XML, and creates DOCX. Use `--validate false` to skip.

**Auto-repair will fix:**
- `durableId` >= 0x7FFFFFFF (regenerates valid ID)
- Missing `xml:space="preserve"` on `<w:t>` with whitespace

**Auto-repair won't fix:**
- Malformed XML, invalid element nesting, missing relationships, schema violations

### Common Pitfalls

- **Replace entire `<w:r>` elements**: When adding tracked changes, replace the whole `<w:r>...</w:r>` block with `<w:del>...<w:ins>...` as siblings. Don't inject tracked change tags inside a run.
- **Preserve `<w:rPr>` formatting**: Copy the original run's `<w:rPr>` block into your tracked change runs to maintain bold, font size, etc.

---

## XML Reference

### Schema Compliance

- **Element order in `<w:pPr>`**: `<w:pStyle>`, `<w:numPr>`, `<w:spacing>`, `<w:ind>`, `<w:jc>`, `<w:rPr>` last
- **Whitespace**: Add `xml:space="preserve"` to `<w:t>` with leading/trailing spaces
- **RSIDs**: Must be 8-digit hex (e.g., `00AB1234`)

### Tracked Changes

**Insertion:**
```xml
<w:ins w:id="1" w:author="Claude" w:date="2025-01-01T00:00:00Z">
  <w:r><w:t>inserted text</w:t></w:r>
</w:ins>
```

**Deletion:**
```xml
<w:del w:id="2" w:author="Claude" w:date="2025-01-01T00:00:00Z">
  <w:r><w:delText>deleted text</w:delText></w:r>
</w:del>
```

**Inside `<w:del>`**: Use `<w:delText>` instead of `<w:t>`, and `<w:delInstrText>` instead of `<w:instrText>`.

**Minimal edits** - only mark what changes:
```xml
<!-- Change "30 days" to "60 days" -->
<w:r><w:t>The term is </w:t></w:r>
<w:del w:id="1" w:author="Claude" w:date="...">
  <w:r><w:delText>30</w:delText></w:r>
</w:del>
<w:ins w:id="2" w:author="Claude" w:date="...">
  <w:r><w:t>60</w:t></w:r>
</w:ins>
<w:r><w:t> days.</w:t></w:r>
```

**Deleting entire paragraphs/list items** - when removing ALL content from a paragraph, also mark the paragraph mark as deleted so it merges with the next paragraph. Add `<w:del/>` inside `<w:pPr><w:rPr>`:
```xml
<w:p>
  <w:pPr>
    <w:numPr>...</w:numPr>  <!-- list numbering if present -->
    <w:rPr>
      <w:del w:id="1" w:author="Claude" w:date="2025-01-01T00:00:00Z"/>
    </w:rPr>
  </w:pPr>
  <w:del w:id="2" w:author="Claude" w:date="2025-01-01T00:00:00Z">
    <w:r><w:delText>Entire paragraph content being deleted...</w:delText></w:r>
  </w:del>
</w:p>
```
Without the `<w:del/>` in `<w:pPr><w:rPr>`, accepting changes leaves an empty paragraph/list item.

**Rejecting another author's insertion** - nest deletion inside their insertion:
```xml
<w:ins w:author="Jane" w:id="5">
  <w:del w:author="Claude" w:id="10">
    <w:r><w:delText>their inserted text</w:delText></w:r>
  </w:del>
</w:ins>
```

**Restoring another author's deletion** - add insertion after (don't modify their deletion):
```xml
<w:del w:author="Jane" w:id="5">
  <w:r><w:delText>deleted text</w:delText></w:r>
</w:del>
<w:ins w:author="Claude" w:id="10">
  <w:r><w:t>deleted text</w:t></w:r>
</w:ins>
```

### Comments

After running `comment.py` (see Step 2), add markers to document.xml. For replies, use `--parent` flag and nest markers inside the parent's.

**CRITICAL: `<w:commentRangeStart>` and `<w:commentRangeEnd>` are siblings of `<w:r>`, never inside `<w:r>`.**

```xml
<!-- Comment markers are direct children of w:p, never inside w:r -->
<w:commentRangeStart w:id="0"/>
<w:del w:id="1" w:author="Claude" w:date="2025-01-01T00:00:00Z">
  <w:r><w:delText>deleted</w:delText></w:r>
</w:del>
<w:r><w:t> more text</w:t></w:r>
<w:commentRangeEnd w:id="0"/>
<w:r><w:rPr><w:rStyle w:val="CommentReference"/></w:rPr><w:commentReference w:id="0"/></w:r>

<!-- Comment 0 with reply 1 nested inside -->
<w:commentRangeStart w:id="0"/>
  <w:commentRangeStart w:id="1"/>
  <w:r><w:t>text</w:t></w:r>
  <w:commentRangeEnd w:id="1"/>
<w:commentRangeEnd w:id="0"/>
<w:r><w:rPr><w:rStyle w:val="CommentReference"/></w:rPr><w:commentReference w:id="0"/></w:r>
<w:r><w:rPr><w:rStyle w:val="CommentReference"/></w:rPr><w:commentReference w:id="1"/></w:r>
```

### Images

1. Add image file to `word/media/`
2. Add relationship to `word/_rels/document.xml.rels`:
```xml
<Relationship Id="rId5" Type=".../image" Target="media/image1.png"/>
```
3. Add content type to `[Content_Types].xml`:
```xml
<Default Extension="png" ContentType="image/png"/>
```
4. Reference in document.xml:
```xml
<w:drawing>
  <wp:inline>
    <wp:extent cx="914400" cy="914400"/>  <!-- EMUs: 914400 = 1 inch -->
    <a:graphic>
      <a:graphicData uri=".../picture">
        <pic:pic>
          <pic:blipFill><a:blip r:embed="rId5"/></pic:blipFill>
        </pic:pic>
      </a:graphicData>
    </a:graphic>
  </wp:inline>
</w:drawing>
```

---

## Dependencies

- **pandoc**: Text extraction
- **docx**: `npm install -g docx` (new documents)
- **LibreOffice**: PDF conversion (auto-configured for sandboxed environments via `scripts/office/soffice.py`)
- **Poppler**: `pdftoppm` for images
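
The DXA width rules under Creating New Documents (content width = page width minus left and right margins; `columnWidths` must sum exactly to the table width) come down to integer arithmetic. A minimal sketch of that arithmetic follows. The helper names `contentWidthDxa` and `splitColumns` are hypothetical and are not part of docx-js; they only compute numbers that could then be passed to `width` and `columnWidths`.

```javascript
// Hypothetical helpers (NOT part of docx-js): plain DXA arithmetic only.
const DXA_PER_INCH = 1440;

// Usable width between the left and right margins, in DXA.
function contentWidthDxa(pageWidth, marginLeft, marginRight) {
  return pageWidth - marginLeft - marginRight;
}

// Split a table width into integer column widths by relative weight.
// Any rounding remainder is added to the last column so the widths
// always sum exactly to tableWidth.
function splitColumns(tableWidth, weights) {
  const total = weights.reduce((sum, w) => sum + w, 0);
  const widths = weights.map(w => Math.floor((tableWidth * w) / total));
  const used = widths.reduce((sum, w) => sum + w, 0);
  widths[widths.length - 1] += tableWidth - used;
  return widths;
}

const margin = 1 * DXA_PER_INCH;                               // 1 inch margins
const letterContent = contentWidthDxa(12240, margin, margin);  // 9360
const columns = splitColumns(letterContent, [1, 1, 1]);        // [3120, 3120, 3120]
console.log(letterContent, columns);
```

Under these assumptions, the same array would serve as the table's `columnWidths` and supply each cell's `width` value, which keeps the two in agreement.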