How do I install PDF Processing Guide?

Install PDF Processing Guide with a single command: npx mdskills install sickn33/pdf-official. This downloads the skill files into your project and your AI agent picks them up automatically.

What platforms support PDF Processing Guide?

PDF Processing Guide works with Claude Code, Claude Desktop, Cursor, VS Code Copilot, Windsurf, Continue Dev, Codex, Gemini CLI, Amp, Roo Code, Goose, OpenCode, Trae, Qodo, and Command Code. Skills use the open SKILL.md format, which is compatible with any AI coding agent that reads markdown instructions.


PDF Processing Guide

Name: PDF Processing Guide: AI Agent Skill
Rating: 6 (1 review)
Author: sickn33

Documents · Intermediate

Comprehensive PDF manipulation toolkit for extracting text and tables, creating new PDFs, merging/splitting documents, and handling forms. Use it when Claude needs to fill in a PDF form or programmatically process, generate, or analyze PDF documents at scale.

by @sickn33 · 6 downloads · 13,166 · Updated 2/20/2026

Add this skill

npx mdskills install sickn33/pdf-official


Skill Advisor: 6.0

Comprehensive reference guide with solid code examples for common PDF operations

+ Provides clear, working code examples across multiple libraries
+ Covers broad range of PDF operations from basic to advanced
+ Includes helpful Quick Reference table for tool selection
- Lacks trigger conditions or step-by-step agent workflow instructions
- Missing error handling patterns and validation guidance

SKILL.md


---
name: pdf
description: Comprehensive PDF manipulation toolkit for extracting text and tables, creating new PDFs, merging/splitting documents, and handling forms. When Claude needs to fill in a PDF form or programmatically process, generate, or analyze PDF documents at scale.
license: Proprietary. LICENSE.txt has complete terms
---

# PDF Processing Guide

## Overview

This guide covers essential PDF processing operations using Python libraries and command-line tools. For advanced features, JavaScript libraries, and detailed examples, see reference.md. If you need to fill out a PDF form, read forms.md and follow its instructions.

## Quick Start

```python
from pypdf import PdfReader, PdfWriter

# Read a PDF
reader = PdfReader("document.pdf")
print(f"Pages: {len(reader.pages)}")

# Extract text
text = ""
for page in reader.pages:
    text += page.extract_text()
```

## Python Libraries

### pypdf - Basic Operations

#### Merge PDFs
```python
from pypdf import PdfWriter, PdfReader

writer = PdfWriter()
for pdf_file in ["doc1.pdf", "doc2.pdf", "doc3.pdf"]:
    reader = PdfReader(pdf_file)
    for page in reader.pages:
        writer.add_page(page)

with open("merged.pdf", "wb") as output:
    writer.write(output)
```

#### Split PDF
```python
from pypdf import PdfReader, PdfWriter

# Write each page of the input to its own single-page PDF
reader = PdfReader("input.pdf")
for i, page in enumerate(reader.pages):
    writer = PdfWriter()
    writer.add_page(page)
    with open(f"page_{i+1}.pdf", "wb") as output:
        writer.write(output)
```
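
Splitting by page range, rather than one page per file, needs a mapping from a human-readable range such as `1-3,5` to zero-based page indices. The following is a minimal sketch and not part of the original skill: `parse_page_ranges` is a hypothetical helper name, and the range syntax it accepts is an assumption chosen for illustration. It uses only the standard library.

```python
def parse_page_ranges(spec: str, total_pages: int) -> list[int]:
    """Convert a 1-based range string like "1-3,5" into zero-based page indices.

    Hypothetical helper for illustration; the accepted syntax is an assumption.
    """
    indices: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # Tolerate stray commas such as "1-3,"
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        # Reject ranges that fall outside the document or run backwards
        if start < 1 or end > total_pages or start > end:
            raise ValueError(f"Invalid page range {part!r} for a {total_pages}-page PDF")
        indices.extend(range(start - 1, end))
    return indices
```

The returned indices could then drive the same pypdf calls used above, for example `writer.add_page(reader.pages[i])` for each `i`, with `total_pages` taken from `len(reader.pages)`.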

#### Extract Metadata
```python
from pypdf import PdfReader

reader = PdfReader("document.pdf")
meta = reader.metadata  # May be None if the PDF has no document information
if meta is not None:
    print(f"Title: {meta.title}")
    print(f"Author: {meta.author}")
    print(f"Subject: {meta.subject}")
    print(f"Creator: {meta.creator}")
```

#### Rotate Pages
```python
from pypdf import PdfReader, PdfWriter

reader = PdfReader("input.pdf")
writer = PdfWriter()

page = reader.pages[0]
page.rotate(90)  # Rotate 90 degrees clockwise
writer.add_page(page)

with open("rotated.pdf", "wb") as output:
    writer.write(output)
```

### pdfplumber - Text and Table Extraction

#### Extract Text with Layout
```python
import pdfplumber

with pdfplumber.open("document.pdf") as pdf:
    for page in pdf.pages:
        text = page.extract_text()
        print(text)
```

#### Extract Tables
```python
import pdfplumber

with pdfplumber.open("document.pdf") as pdf:
    for i, page in enumerate(pdf.pages):
        tables = page.extract_tables()
        for j, table in enumerate(tables):
            print(f"Table {j+1} on page {i+1}:")
            for row in table:
                print(row)
```

#### Advanced Table Extraction
```python
import pandas as pd
import pdfplumber

with pdfplumber.open("document.pdf") as pdf:
    all_tables = []
    for page in pdf.pages:
        tables = page.extract_tables()
        for table in tables:
            if table:  # Check if table is not empty
                # Treat the first row as the header row
                df = pd.DataFrame(table[1:], columns=table[0])
                all_tables.append(df)

# Combine all tables
if all_tables:
    combined_df = pd.concat(all_tables, ignore_index=True)
    # Writing .xlsx requires an Excel engine such as openpyxl
    combined_df.to_excel("extracted_tables.xlsx", index=False)
```
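
Tables extracted this way are lists of rows whose cells can come back as `None` or carry stray whitespace, which tends to surface as missing values or ragged text once the rows reach a DataFrame. The following is a minimal sketch of a cleanup pass that could run before `pd.DataFrame(...)`; `clean_table` is a hypothetical helper name and not part of pdfplumber or the original skill.

```python
from typing import Optional


def clean_table(rows: list[list[Optional[str]]]) -> list[list[str]]:
    """Normalize a table given as a list of rows (hypothetical helper).

    Replaces None cells with empty strings, trims whitespace,
    and drops rows in which every cell is empty.
    """
    cleaned: list[list[str]] = []
    for row in rows:
        cells = [(cell or "").strip() for cell in row]
        if any(cells):  # Keep the row only if at least one cell has text
            cleaned.append(cells)
    return cleaned
```

Under these assumptions, each `table` in the loop above would be passed through `clean_table(table)` before the header row is split off.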

### reportlab - Create PDFs

#### Basic PDF Creation
```python
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas

c = canvas.Canvas("hello.pdf", pagesize=letter)
width, height = letter

# Add text
c.drawString(100, height - 100, "Hello World!")
c.drawString(100, height - 120, "This is a PDF created with reportlab")

# Add a line
c.line(100, height - 140, 400, height - 140)

# Save
c.save()
```

#### Create PDF with Multiple Pages
```python
from reportlab.lib.pagesizes import letter
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, PageBreak
from reportlab.lib.styles import getSampleStyleSheet

doc = SimpleDocTemplate("report.pdf", pagesize=letter)
styles = getSampleStyleSheet()
story = []

# Add content
title = Paragraph("Report Title", styles['Title'])
story.append(title)
story.append(Spacer(1, 12))

body = Paragraph("This is the body of the report. " * 20, styles['Normal'])
story.append(body)
story.append(PageBreak())

# Page 2
story.append(Paragraph("Page 2", styles['Heading1']))
story.append(Paragraph("Content for page 2", styles['Normal']))

# Build PDF
doc.build(story)
```

## Command-Line Tools

### pdftotext (poppler-utils)
```bash
# Extract text
pdftotext input.pdf output.txt

# Extract text preserving layout
pdftotext -layout input.pdf output.txt

# Extract specific pages
pdftotext -f 1 -l 5 input.pdf output.txt  # Pages 1-5
```

### qpdf
```bash
# Merge PDFs
qpdf --empty --pages file1.pdf file2.pdf -- merged.pdf

# Split pages
qpdf input.pdf --pages . 1-5 -- pages1-5.pdf
qpdf input.pdf --pages . 6-10 -- pages6-10.pdf

# Rotate pages
qpdf input.pdf output.pdf --rotate=+90:1  # Rotate page 1 by 90 degrees

# Remove password
qpdf --password=mypassword --decrypt encrypted.pdf decrypted.pdf
```
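
When the merge command above is driven from Python, passing the arguments as a list avoids shell quoting problems with file names that contain spaces. The following is a minimal sketch using only the standard library; `build_qpdf_merge_command` and `merge_with_qpdf` are hypothetical helper names, and the only qpdf flags used are the ones shown in the merge example above.

```python
import shutil
import subprocess


def build_qpdf_merge_command(inputs: list[str], output: str) -> list[str]:
    """Build the argument list for `qpdf --empty --pages ... -- output`."""
    if not inputs:
        raise ValueError("At least one input PDF is required")
    return ["qpdf", "--empty", "--pages", *inputs, "--", output]


def merge_with_qpdf(inputs: list[str], output: str) -> None:
    """Run the merge, failing early if qpdf is not installed (hypothetical helper)."""
    if shutil.which("qpdf") is None:
        raise RuntimeError("qpdf was not found on PATH")
    # check=True raises CalledProcessError if qpdf exits with a non-zero status
    subprocess.run(build_qpdf_merge_command(inputs, output), check=True)
```

Keeping command construction separate from execution is a design choice made here so the argument list can be inspected or logged without running qpdf.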

### pdftk (if available)
```bash
# Merge
pdftk file1.pdf file2.pdf cat output merged.pdf

# Split
pdftk input.pdf burst

# Rotate
pdftk input.pdf rotate 1east output rotated.pdf
```

## Common Tasks

### Extract Text from Scanned PDFs
```python
# Requires: pip install pytesseract pdf2image
# Also requires the Tesseract OCR engine and Poppler to be installed on the system
import pytesseract
from pdf2image import convert_from_path

# Convert PDF to images
images = convert_from_path('scanned.pdf')

# OCR each page
text = ""
for i, image in enumerate(images):
    text += f"Page {i+1}:\n"
    text += pytesseract.image_to_string(image)
    text += "\n\n"

print(text)
```
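
OCR is considerably slower than reading an embedded text layer, so it can help to check whether ordinary extraction already returned usable text before converting pages to images. The following is a minimal sketch of such a check; `needs_ocr` is a hypothetical helper name, and the 20-character threshold is an arbitrary assumption that would need tuning for real documents.

```python
from typing import Optional


def needs_ocr(extracted_text: Optional[str], min_chars: int = 20) -> bool:
    """Guess whether a page lacks a usable text layer (hypothetical heuristic).

    Returns True when the extracted text is missing or contains fewer than
    `min_chars` non-whitespace characters.
    """
    if extracted_text is None:
        return True
    # Count only visible characters so pages of blank lines do not pass
    visible = "".join(extracted_text.split())
    return len(visible) < min_chars
```

Under these assumptions, the result of `page.extract_text()` from pypdf or pdfplumber would be passed to `needs_ocr`, and only pages that return `True` would go through the OCR path above.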

### Add Watermark
```python
from pypdf import PdfReader, PdfWriter

# Create watermark (or load existing)
watermark = PdfReader("watermark.pdf").pages[0]

# Apply to all pages
reader = PdfReader("document.pdf")
writer = PdfWriter()

for page in reader.pages:
    page.merge_page(watermark)
    writer.add_page(page)

with open("watermarked.pdf", "wb") as output:
    writer.write(output)
```

### Extract Images
```bash
# Using pdfimages (poppler-utils)
pdfimages -j input.pdf output_prefix

# With -j, JPEG-encoded images are saved as output_prefix-000.jpg, output_prefix-001.jpg, etc.
# Images in other encodings are written as .ppm or .pbm files.
```

### Password Protection
```python
from pypdf import PdfReader, PdfWriter

reader = PdfReader("input.pdf")
writer = PdfWriter()

for page in reader.pages:
    writer.add_page(page)

# Add password
writer.encrypt("userpassword", "ownerpassword")

with open("encrypted.pdf", "wb") as output:
    writer.write(output)
```

## Quick Reference

| Task | Best Tool | Command/Code |
|------|-----------|--------------|
| Merge PDFs | pypdf | `writer.add_page(page)` |
| Split PDFs | pypdf | One page per file |
| Extract text | pdfplumber | `page.extract_text()` |
| Extract tables | pdfplumber | `page.extract_tables()` |
| Create PDFs | reportlab | Canvas or Platypus |
| Command line merge | qpdf | `qpdf --empty --pages ...` |
| OCR scanned PDFs | pytesseract | Convert to image first |
| Fill PDF forms | pdf-lib or pypdf (see forms.md) | See forms.md |

## Next Steps

- For advanced pypdfium2 usage, see reference.md
- For JavaScript libraries (pdf-lib), see reference.md
- If you need to fill out a PDF form, follow the instructions in forms.md
- For troubleshooting guides, see reference.md
