How do I install DOCX creation, editing, and analysis?

Install DOCX creation, editing, and analysis with a single command: npx mdskills install sickn33/docx-official. This downloads the skill files into your project and your AI agent picks them up automatically.

What platforms support DOCX creation, editing, and analysis?

DOCX creation, editing, and analysis works with Claude Code, Claude Desktop, Cursor, VS Code Copilot, Windsurf, Continue Dev, Codex, Gemini CLI, Amp, Roo Code, Goose, OpenCode, Trae, Qodo, and Command Code. Skills use the open SKILL.md format, which is compatible with any AI coding agent that reads markdown instructions.


DOCX creation, editing, and analysis

Name: DOCX creation, editing, and analysis: AI Agent Skill
Rating: 9 (1 review)
Author: sickn33

Verified

Productivity & Tasks · Intermediate

Comprehensive document creation, editing, and analysis with support for tracked changes, comments, formatting preservation, and text extraction. When Claude needs to work with professional documents (.docx files) for: (1) Creating new documents, (2) Modifying or editing content, (3) Working with tracked changes, (4) Adding comments, or any other document tasks

by @sickn33 · 9 downloads · 0 · Updated 2/20/2026

Add this skill

npx mdskills install sickn33/docx-official


Skill Advisor: 9.0

Comprehensive DOCX workflows with excellent redlining guidance and minimal-edit principles for professional document review

+ Provides a clear decision tree for choosing the appropriate workflow based on document context
+ Demonstrates sophisticated understanding of tracked changes with the minimal-edit principle and batching strategy
+ Includes thorough verification steps and batch organization guidance for complex edits
- References external files (ooxml.md, docx-js.md) that agents must read but that aren't included in the skill

SKILL.md


---
name: docx
description: "Comprehensive document creation, editing, and analysis with support for tracked changes, comments, formatting preservation, and text extraction. When Claude needs to work with professional documents (.docx files) for: (1) Creating new documents, (2) Modifying or editing content, (3) Working with tracked changes, (4) Adding comments, or any other document tasks"
license: Proprietary. LICENSE.txt has complete terms
---

# DOCX creation, editing, and analysis

## Overview

A user may ask you to create, edit, or analyze the contents of a .docx file. A .docx file is essentially a ZIP archive containing XML files and other resources that you can read or edit. You have different tools and workflows available for different tasks.
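Because a .docx file is a ZIP archive, Python's standard `zipfile` module can show what is inside one. The following is a minimal sketch for illustration only; the helper name `list_docx_parts` is hypothetical and not part of this skill.

```python
import zipfile


def list_docx_parts(path_or_file):
    """Return the sorted names of the parts stored inside a .docx (ZIP) file."""
    with zipfile.ZipFile(path_or_file) as archive:
        return sorted(archive.namelist())
```

For a typical Word file, the returned list would be expected to include entries such as `word/document.xml`.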

## Workflow Decision Tree

### Reading/Analyzing Content
Use the "Text extraction" or "Raw XML access" sections below.

### Creating New Document
Use the "Creating a new Word document" workflow.

### Editing Existing Document
- **Your own document + simple changes**
  Use the "Editing an existing Word document" workflow (basic OOXML editing).

- **Someone else's document**
  Use the **"Redlining workflow"** (recommended default).

- **Legal, academic, business, or government docs**
  Use the **"Redlining workflow"** (required).

## Reading and analyzing content

### Text extraction
If you only need to read a document's text, convert it to markdown with pandoc. Pandoc preserves document structure well and can show tracked changes:

```bash
# Convert document to markdown with tracked changes
pandoc --track-changes=all path-to-file.docx -o output.md
# Options: --track-changes=accept/reject/all
```
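If the conversion needs to run from a Python script, the same command can be assembled as an argument list. This is a sketch only: `pandoc_command` and `docx_to_markdown` are hypothetical helper names, and running the second one assumes pandoc is installed and on the PATH.

```python
import subprocess

TRACK_MODES = {"accept", "reject", "all"}


def pandoc_command(docx_path, md_path, track_changes="all"):
    """Build the pandoc argument list for a DOCX-to-markdown conversion."""
    if track_changes not in TRACK_MODES:
        raise ValueError(f"track_changes must be one of {sorted(TRACK_MODES)}")
    return ["pandoc", f"--track-changes={track_changes}", docx_path, "-o", md_path]


def docx_to_markdown(docx_path, md_path, track_changes="all"):
    """Run pandoc; raises CalledProcessError if the conversion fails."""
    subprocess.run(pandoc_command(docx_path, md_path, track_changes), check=True)
```

Passing a list rather than a shell string avoids quoting problems with file names that contain spaces.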

### Raw XML access
You need raw XML access for comments, complex formatting, document structure, embedded media, and metadata. For any of these, unpack the document and read its raw XML contents.

#### Unpacking a file
`python ooxml/scripts/unpack.py <office_file> <output_directory>`

#### Key file structures
* `word/document.xml` - Main document contents
* `word/comments.xml` - Comments referenced in document.xml
* `word/media/` - Embedded images and media files
* Tracked changes use `<w:ins>` (insertions) and `<w:del>` (deletions) tags
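To make the `<w:ins>` and `<w:del>` structure concrete, here is a small sketch that lists the tracked changes found in the contents of `word/document.xml`. The function name `tracked_changes` is hypothetical. It uses the standard library parser to stay self-contained; for untrusted files, the `defusedxml` package listed under Dependencies would be the safer choice.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def tracked_changes(document_xml):
    """Return (kind, author, text) tuples for each <w:ins> and <w:del>, in document order."""
    root = ET.fromstring(document_xml)
    changes = []
    for node in root.iter():
        if node.tag == f"{{{W_NS}}}ins":
            # Inserted text lives in ordinary <w:t> elements.
            text = "".join(t.text or "" for t in node.iterfind(".//w:t", NS))
            changes.append(("ins", node.get(f"{{{W_NS}}}author"), text))
        elif node.tag == f"{{{W_NS}}}del":
            # Deleted text lives in <w:delText> elements.
            text = "".join(t.text or "" for t in node.iterfind(".//w:delText", NS))
            changes.append(("del", node.get(f"{{{W_NS}}}author"), text))
    return changes
```

The input can be the bytes read from an unpacked `word/document.xml`.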

## Creating a new Word document

When creating a new Word document from scratch, use **docx-js**, which allows you to create Word documents using JavaScript/TypeScript.

### Workflow
1. **MANDATORY - READ ENTIRE FILE**: Read [`docx-js.md`](docx-js.md) (~500 lines) completely from start to finish. **NEVER set any range limits when reading this file.** Read the full file content for detailed syntax, critical formatting rules, and best practices before proceeding with document creation.
2. Create a JavaScript/TypeScript file using Document, Paragraph, TextRun components (You can assume all dependencies are installed, but if not, refer to the dependencies section below)
3. Export as .docx using Packer.toBuffer()

## Editing an existing Word document

When editing an existing Word document, use the **Document library** (a Python library for OOXML manipulation). The library automatically handles infrastructure setup and provides methods for document manipulation. For complex scenarios, you can access the underlying DOM directly through the library.

### Workflow
1. **MANDATORY - READ ENTIRE FILE**: Read [`ooxml.md`](ooxml.md) (~600 lines) completely from start to finish. **NEVER set any range limits when reading this file.** Read the full file content for the Document library API and XML patterns for directly editing document files.
2. Unpack the document: `python ooxml/scripts/unpack.py <office_file> <output_directory>`
3. Create and run a Python script using the Document library (see "Document Library" section in ooxml.md)
4. Pack the final document: `python ooxml/scripts/pack.py <input_directory> <office_file>`

The Document library provides both high-level methods for common operations and direct DOM access for complex scenarios.
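The contents of `unpack.py` and `pack.py` are not shown here, and they may do more than plain archiving (the redlining workflow notes that the unpack script suggests an RSID, for example). As a rough mental model only, the unpack and pack round trip can be sketched with the standard library; `unpack` and `pack` below are hypothetical stand-ins, not the skill's scripts.

```python
import zipfile
from pathlib import Path


def unpack(docx_path, out_dir):
    """Extract every part of the archive into out_dir."""
    with zipfile.ZipFile(docx_path) as archive:
        archive.extractall(out_dir)


def pack(in_dir, docx_path):
    """Zip the directory tree back into a .docx, storing paths relative to in_dir."""
    root = Path(in_dir)
    with zipfile.ZipFile(docx_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                archive.write(path, path.relative_to(root).as_posix())
```

Edits to the XML happen between the two calls, on the files inside the output directory.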

## Redlining workflow for document review

This workflow allows you to plan comprehensive tracked changes using markdown before implementing them in OOXML. **CRITICAL**: For complete tracked changes, you must implement ALL changes systematically.

**Batching Strategy**: Group related changes into batches of 3-10 changes. This makes debugging manageable while maintaining efficiency. Test each batch before moving to the next.
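One way to plan batches programmatically is to group the planned changes by a key such as section and cap each batch at ten items. This is an illustrative sketch: `make_batches` and the dictionary layout of a change are assumptions, not part of the skill.

```python
from itertools import groupby


def make_batches(changes, key, max_size=10):
    """Group changes by key(change), then split each group into batches of at most max_size."""
    batches = []
    # sorted() is stable, so changes keep their original order within each group.
    for _, group in groupby(sorted(changes, key=key), key=key):
        items = list(group)
        for start in range(0, len(items), max_size):
            batches.append(items[start:start + max_size])
    return batches
```

Groups smaller than three changes come out as their own batch here; merging them with a neighbouring batch is left as a manual decision.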

**Principle: Minimal, Precise Edits**
When implementing tracked changes, only mark text that actually changes. Repeating unchanged text makes edits harder to review and appears unprofessional. Break replacements into: [unchanged text] + [deletion] + [insertion] + [unchanged text]. Preserve the original run's RSID for unchanged text by extracting the `<w:r>` element from the original and reusing it.

Example - Changing "30 days" to "60 days" in a sentence:
```python
# BAD - Replaces entire sentence
'<w:del><w:r><w:delText>The term is 30 days.</w:delText></w:r></w:del><w:ins><w:r><w:t>The term is 60 days.</w:t></w:r></w:ins>'

# GOOD - Only marks what changed, preserves original <w:r> for unchanged text
# xml:space="preserve" keeps the leading/trailing spaces of the split runs
# (w:id, w:author, and w:date attributes on <w:ins>/<w:del> omitted for brevity)
'<w:r w:rsidR="00AB12CD"><w:t xml:space="preserve">The term is </w:t></w:r><w:del><w:r><w:delText>30</w:delText></w:r></w:del><w:ins><w:r><w:t>60</w:t></w:r></w:ins><w:r w:rsidR="00AB12CD"><w:t xml:space="preserve"> days.</w:t></w:r>'
```
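The split into unchanged text, deletion, insertion, and unchanged text can be computed by trimming the common leading and trailing words. The sketch below is a simplified illustration under stated assumptions: `minimal_redline` is a hypothetical helper, it handles a single contiguous change within one run, and it rebuilds runs without copying any `<w:rPr>` formatting (in practice the original `<w:r>` should be reused). The caller is assumed to supply `w:id` values not used elsewhere in the document.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def _run(text, tag="w:t", rsid=None):
    """Wrap text in a run; xml:space keeps leading/trailing spaces intact."""
    attr = f" w:rsidR={quoteattr(rsid)}" if rsid else ""
    return f'<w:r{attr}><{tag} xml:space="preserve">{escape(text)}</{tag}></w:r>'


def minimal_redline(old, new, author, date, first_id=1, rsid=None):
    """Return run-level XML marking only the words that differ between old and new."""
    a, b = re.findall(r"\s+|\S+", old), re.findall(r"\s+|\S+", new)
    # Longest common prefix, then longest common suffix of what remains.
    p = 0
    while p < min(len(a), len(b)) and a[p] == b[p]:
        p += 1
    s = 0
    while s < min(len(a), len(b)) - p and a[-1 - s] == b[-1 - s]:
        s += 1
    prefix, suffix = "".join(a[:p]), "".join(a[len(a) - s:])
    deleted, inserted = "".join(a[p:len(a) - s]), "".join(b[p:len(b) - s])
    meta = f"w:author={quoteattr(author)} w:date={quoteattr(date)}"
    parts = []
    if prefix:
        parts.append(_run(prefix, rsid=rsid))
    if deleted:
        parts.append(f'<w:del w:id="{first_id}" {meta}>{_run(deleted, "w:delText")}</w:del>')
    if inserted:
        parts.append(f'<w:ins w:id="{first_id + 1}" {meta}>{_run(inserted)}</w:ins>')
    if suffix:
        parts.append(_run(suffix, rsid=rsid))
    return "".join(parts)
```

Called with `"The term is 30 days."` and `"The term is 60 days."`, it marks only `30` as deleted and `60` as inserted. Working on whole words rather than characters avoids splitting `30` and `60` at the shared `0`.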

### Tracked changes workflow

1. **Get markdown representation**: Convert document to markdown with tracked changes preserved:
   ```bash
   pandoc --track-changes=all path-to-file.docx -o current.md
   ```

2. **Identify and group changes**: Review the document and identify ALL changes needed, organizing them into logical batches:

   **Location methods** (for finding changes in XML):
   - Section/heading numbers (e.g., "Section 3.2", "Article IV")
   - Paragraph identifiers if numbered
   - Grep patterns with unique surrounding text
   - Document structure (e.g., "first paragraph", "signature block")
   - **DO NOT use markdown line numbers** - they don't map to XML structure

   **Batch organization** (group 3-10 related changes per batch):
   - By section: "Batch 1: Section 2 amendments", "Batch 2: Section 5 updates"
   - By type: "Batch 1: Date corrections", "Batch 2: Party name changes"
   - By complexity: Start with simple text replacements, then tackle complex structural changes
   - Sequential: "Batch 1: Pages 1-3", "Batch 2: Pages 4-6"

3. **Read documentation and unpack**:
   - **MANDATORY - READ ENTIRE FILE**: Read [`ooxml.md`](ooxml.md) (~600 lines) completely from start to finish. **NEVER set any range limits when reading this file.** Pay special attention to the "Document Library" and "Tracked Change Patterns" sections.
   - **Unpack the document**: `python ooxml/scripts/unpack.py <file.docx> <dir>`
   - **Note the suggested RSID**: The unpack script will suggest an RSID to use for your tracked changes. Copy this RSID for use in step 4b.

4. **Implement changes in batches**: Group changes logically (by section, by type, or by proximity) and implement them together in a single script. This approach:
   - Makes debugging easier (smaller batch = easier to isolate errors)
   - Allows incremental progress
   - Maintains efficiency (batch size of 3-10 changes works well)

   **Suggested batch groupings:**
   - By document section (e.g., "Section 3 changes", "Definitions", "Termination clause")
   - By change type (e.g., "Date changes", "Party name updates", "Legal term replacements")
   - By proximity (e.g., "Changes on pages 1-3", "Changes in first half of document")

   For each batch of related changes:

   **a. Map text to XML**: Grep for text in `word/document.xml` to verify how text is split across `<w:r>` elements.

   **b. Create and run script**: Use `get_node` to find nodes, implement changes, then `doc.save()`. See **"Document Library"** section in ooxml.md for patterns.

   **Note**: Always grep `word/document.xml` immediately before writing a script to get current line numbers and verify text content. Line numbers change after each script run.

5. **Pack the document**: After all batches are complete, convert the unpacked directory back to .docx:
   ```bash
   python ooxml/scripts/pack.py unpacked reviewed-document.docx
   ```

6. **Final verification**: Do a comprehensive check of the complete document:
   - Convert final document to markdown:
     ```bash
     pandoc --track-changes=all reviewed-document.docx -o verification.md
     ```
   - Verify ALL changes were applied correctly:
     ```bash
     grep "original phrase" verification.md  # Should NOT find it
     grep "replacement phrase" verification.md  # Should find it
     ```
   - Check that no unintended changes were introduced
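When there are many changes, the grep checks in the final verification step can be run as a loop over all planned replacements. The sketch below mirrors the two grep commands as plain substring tests; `verify_changes` is a hypothetical helper, and the markdown text is assumed to have been read from `verification.md`.

```python
def verify_changes(markdown, changes):
    """Return a list of problems; an empty list means every check passed.

    changes is an iterable of (original_phrase, replacement_phrase) pairs.
    """
    problems = []
    for original, replacement in changes:
        if original in markdown:
            problems.append(f"still present: {original!r}")
        if replacement not in markdown:
            problems.append(f"missing: {replacement!r}")
    return problems
```

Like grep, this matches literal text only. A phrase that spans tracked-change markup in the markdown may not match, in which case checking a second conversion made with `--track-changes=accept` may be easier.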

## Converting Documents to Images

To visually analyze Word documents, convert them to images using a two-step process:

1. **Convert DOCX to PDF**:
   ```bash
   soffice --headless --convert-to pdf document.docx
   ```

2. **Convert PDF pages to JPEG images**:
   ```bash
   pdftoppm -jpeg -r 150 document.pdf page
   ```
   This creates files like `page-1.jpg`, `page-2.jpg`, etc.

Options:
- `-r 150`: Sets resolution to 150 DPI (adjust for quality/size balance)
- `-jpeg`: Output JPEG format (use `-png` for PNG if preferred)
- `-f N`: First page to convert (e.g., `-f 2` starts from page 2)
- `-l N`: Last page to convert (e.g., `-l 5` stops at page 5)
- `page`: Prefix for output files

Example for specific range:
```bash
pdftoppm -jpeg -r 150 -f 2 -l 5 document.pdf page  # Converts only pages 2-5
```
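The two steps and their options can also be assembled in Python when the conversion is part of a larger script. This is a sketch under assumptions: `image_commands` and `docx_to_images` are hypothetical names, LibreOffice and Poppler must be installed for the second function to run, and the PDF is assumed to be written to the current working directory, as in the commands above.

```python
import subprocess
from pathlib import Path


def image_commands(docx_path, prefix="page", dpi=150, first=None, last=None):
    """Return the soffice and pdftoppm argument lists for a DOCX-to-JPEG conversion."""
    # Assumes soffice writes <name>.pdf into the current working directory.
    pdf_path = Path(docx_path).with_suffix(".pdf").name
    to_pdf = ["soffice", "--headless", "--convert-to", "pdf", str(docx_path)]
    to_jpeg = ["pdftoppm", "-jpeg", "-r", str(dpi)]
    if first is not None:
        to_jpeg += ["-f", str(first)]
    if last is not None:
        to_jpeg += ["-l", str(last)]
    to_jpeg += [pdf_path, prefix]
    return to_pdf, to_jpeg


def docx_to_images(docx_path, **options):
    """Run both steps in order, stopping if either command fails."""
    for command in image_commands(docx_path, **options):
        subprocess.run(command, check=True)
```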

## Code Style Guidelines
**IMPORTANT**: When generating code for DOCX operations:
- Write concise code
- Avoid verbose variable names and redundant operations
- Avoid unnecessary print statements

## Dependencies

Required dependencies (install if not available):

- **pandoc**: `sudo apt-get install pandoc` (for text extraction)
- **docx**: `npm install -g docx` (for creating new documents)
- **LibreOffice**: `sudo apt-get install libreoffice` (for PDF conversion)
- **Poppler**: `sudo apt-get install poppler-utils` (for pdftoppm to convert PDF to images)
- **defusedxml**: `pip install defusedxml` (for secure XML parsing)
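Before starting a workflow, it can help to check which of these dependencies are actually available. The sketch below covers the command-line tools and the Python module; `missing_dependencies` is a hypothetical helper, and it does not check the npm `docx` package, which would need a Node.js-side check.

```python
import importlib.util
import shutil

CLI_TOOLS = ("pandoc", "soffice", "pdftoppm")


def missing_dependencies(which=shutil.which, find_spec=importlib.util.find_spec):
    """Return the names of required tools and modules that cannot be found."""
    missing = [tool for tool in CLI_TOOLS if which(tool) is None]
    if find_spec("defusedxml") is None:
        missing.append("defusedxml")
    return missing
```

The `which` and `find_spec` parameters exist only so the lookup functions can be swapped out in tests.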
