# PDF SPEC MCP Server

[CI](https://github.com/shuji-bonji/pdf-spec-mcp/actions/workflows/ci.yml)
[npm](https://www.npmjs.com/package/@shuji-bonji/pdf-spec-mcp)

[Japanese version of this README](README.ja.md)

An MCP (Model Context Protocol) server that provides structured access to ISO 32000 (PDF) specification documents. Enables LLMs to navigate, search, and analyze PDF specifications through well-defined tools.

> [!IMPORTANT]
> **PDF specification files are NOT included in this package.**
> You must obtain the PDF specification documents separately and place them in a local directory.
>
> **Download from:** [PDF Association — Sponsored Standards](https://pdfa.org/sponsored-standards/)
>
> See "[Setup](#setup)" for details.

## Features

- **Multi-spec support** — Auto-discovers and manages up to 17 PDF-related documents (ISO 32000-2, PDF/UA, Tagged PDF guides, etc.)
- **Structured content extraction** — Headings, paragraphs, lists, tables, and notes from any section
- **Full-text search** — Keyword search with section-aware context snippets
- **Requirements extraction** — Extracts normative language (shall / must / may) per ISO conventions
- **Definitions lookup** — Term definitions from Section 3 (Definitions)
- **Table extraction** — Multi-page table detection with header merging
- **Version comparison** — Diff PDF 1.7 vs PDF 2.0 section structures
- **Bounded-concurrency processing** — Parallel page processing for large documents

## Architecture

```mermaid
graph LR
    subgraph Client["MCP Client"]
        LLM["LLM<br/>(Claude, etc.)"]
    end

    subgraph Server["PDF Spec MCP Server"]
        direction TB
        MCP["MCP Server<br/>index.ts"]

        subgraph Tools["Tools Layer"]
            direction LR
            T1["list_specs"]
            T2["get_structure"]
            T3["get_section"]
            T4["search_spec"]
            T5["get_requirements"]
            T6["get_definitions"]
            T7["get_tables"]
            T8["compare_versions"]
        end

        subgraph Services["Services Layer"]
            direction LR
            REG["Registry<br/>Auto-discovery"]
            LOADER["Loader<br/>LRU Cache"]
            SVC["PDFService<br/>Orchestration"]
            CMP["CompareService<br/>Version Diff"]
        end

        subgraph Extractors["Extractors"]
            direction LR
            OUTLINE["OutlineResolver<br/>TOC & Section Index"]
            CONTENT["ContentExtractor<br/>Structured Extraction"]
            SEARCH["SearchIndex<br/>Full-text Search"]
            REQ["RequirementExtractor"]
            DEF["DefinitionExtractor"]
        end

        subgraph Utils["Utils"]
            direction LR
            CACHE["LRU Cache"]
            CONC["Concurrency"]
            VALID["Validation"]
        end
    end

    subgraph PDFs["PDF Spec Files (obtained separately)"]
        direction LR
        PDF1["ISO 32000-2<br/>(PDF 2.0)"]
        PDF2["ISO 32000-1<br/>(PDF 1.7)"]
        PDF3["TS 32001–32005<br/>PDF/UA, etc."]
    end

    LLM <-->|"stdio / JSON-RPC"| MCP
    MCP --> Tools
    Tools --> Services
    Services --> Extractors
    Services --> Utils
    LOADER --> PDFs
    REG -->|"Filename pattern<br/>auto-discovery"| PDFs

    style Client fill:#e8f4f8,stroke:#2196F3
    style PDFs fill:#fff3e0,stroke:#FF9800
    style Tools fill:#e8f5e9,stroke:#4CAF50
    style Services fill:#f3e5f5,stroke:#9C27B0
    style Extractors fill:#fce4ec,stroke:#E91E63
    style Utils fill:#f5f5f5,stroke:#9E9E9E
```

### Layer Overview

| Layer          | Responsibility                                                                     |
| -------------- | ---------------------------------------------------------------------------------- |
| **Tools**      | MCP tool schema definitions & handlers (input validation)                          |
| **Services**   | Business logic (PDF registry, loader, orchestration)                               |
| **Extractors** | Information extraction from PDFs (TOC, content, search, requirements, definitions) |
| **Utils**      | Shared utilities (cache, concurrency, validation)                                  |

## Setup

### 1. Obtain PDF Specification Files

> [!WARNING]
> PDF specifications are **copyrighted documents** and are not included in this package.
> Download them from the sources below and place them in a local directory.

| Document                     | Source                                                                                          |
| ---------------------------- | ----------------------------------------------------------------------------------------------- |
| ISO 32000-2 (PDF 2.0)        | [PDF Association](https://pdfa.org/resource/iso-32000-pdf/)                                     |
| ISO 32000-1 (PDF 1.7)        | [Adobe (free)](https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf) |
| TS 32001–32005, PDF/UA, etc. | [PDF Association — Sponsored Standards](https://pdfa.org/sponsored-standards/)                  |

All 17 files below are supported. You do not need all of them; place only the specs you need (at minimum, ISO 32000-2 is recommended).

```
pdf-specs/
│
│  ── Standards ─────────────────────────────
├── ISO_32000-2_sponsored-ec2.pdf            # iso32000-2      : PDF 2.0 EC2 (recommended)
├── ISO_32000-2-2020_sponsored.pdf           # iso32000-2-2020 : PDF 2.0 original
├── PDF32000_2008.pdf                        # pdf17           : PDF 1.7 (for version comparison)
├── pdfreference1.7old.pdf                   # pdf17old        : Adobe PDF Reference 1.7
│
│  ── Technical Specifications (TS) ─────────
├── ISO_TS_32001-2022_sponsored.pdf          # ts32001         : Hash extensions (SHA-3)
├── ISO_TS_32002-2022_sponsored.pdf          # ts32002         : Digital signature extensions (ECC/PAdES)
├── ISO_TS_32003-2023_sponsored.pdf          # ts32003         : AES-GCM encryption
├── ISO-TS-32004-2024_sponsored.pdf          # ts32004         : Integrity protection
├── ISO-TS-32005-2023-sponsored.pdf          # ts32005         : Namespace mapping
│
│  ── PDF/UA (Accessibility) ────────────────
├── ISO-14289-1-2014-sponsored.pdf           # pdfua1          : PDF/UA-1
├── ISO-14289-2-2024-sponsored.pdf           # pdfua2          : PDF/UA-2
│
│  ── Guides ────────────────────────────────
├── Tagged-PDF-Best-Practice-Guide.pdf       # tagged-bpg      : Tagged PDF Best Practice
├── Well-Tagged-PDF-WTPDF-1.0.pdf            # wtpdf           : Well-Tagged PDF
├── PDF-Declarations.pdf                     # declarations    : PDF Declarations
│
│  ── Application Notes ─────────────────────
├── PDF20_AN001-BPC.pdf                      # an001           : Black Point Compensation
├── PDF20_AN002-AF.pdf                       # an002           : Associated Files
└── PDF20_AN003-ObjectMetadataLocations.pdf  # an003           : Object Metadata
```

### 2. Install

```bash
npm install @shuji-bonji/pdf-spec-mcp
```

Or run directly with npx:

```bash
PDF_SPEC_DIR=/path/to/pdf-specs npx @shuji-bonji/pdf-spec-mcp
```

### 3. Configure MCP Client

#### Environment Variable

| Variable       | Description                                  | Default    |
| -------------- | -------------------------------------------- | ---------- |
| `PDF_SPEC_DIR` | Directory containing PDF specification files | (required) |

#### Claude Desktop

Add to `claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "pdf-spec": {
      "command": "npx",
      "args": ["-y", "@shuji-bonji/pdf-spec-mcp"],
      "env": {
        "PDF_SPEC_DIR": "/path/to/pdf-specs"
      }
    }
  }
}
```

#### Cursor / VS Code

Add to `.cursor/mcp.json` or VS Code MCP settings:

```json
{
  "mcpServers": {
    "pdf-spec": {
      "command": "npx",
      "args": ["-y", "@shuji-bonji/pdf-spec-mcp"],
      "env": {
        "PDF_SPEC_DIR": "/path/to/pdf-specs"
      }
    }
  }
}
```

## Available Tools

All tools accept an optional `spec` parameter to target a specific specification (default: `iso32000-2`).

| Tool               | Description                                                       |
| ------------------ | ----------------------------------------------------------------- |
| `list_specs`       | List all discovered PDF specifications with metadata              |
| `get_structure`    | Get section hierarchy (table of contents) with configurable depth |
| `get_section`      | Get structured content of a specific section                      |
| `search_spec`      | Full-text keyword search across a specification                   |
| `get_requirements` | Extract normative requirements (shall/must/may)                   |
| `get_definitions`  | Look up term definitions                                          |
| `get_tables`       | Extract table structures from a section                           |
| `compare_versions` | Compare PDF 1.7 and PDF 2.0 section structures                    |

### `list_specs` — Discover Specifications

List all available specification documents. Use the returned IDs as the `spec` parameter in other tools.

```jsonc
// List all specs
{ }

// Filter by category
{ "category": "ts" }      // Technical specs only
{ "category": "pdfua" }   // PDF/UA only
{ "category": "guide" }   // Guide documents only
```

### `get_structure` — Table of Contents

Get the section hierarchy (TOC tree) of a specification.

```jsonc
// PDF 2.0 top-level sections only
{ "max_depth": 1 }

// Expand to 2 levels
{ "max_depth": 2 }

// TS 32002 (Digital Signatures) full structure
{ "spec": "ts32002" }

// PDF/UA-2 structure
{ "spec": "pdfua2", "max_depth": 2 }
```

### `get_section` — Section Content

Get structured content (headings, paragraphs, lists, tables, notes) of a specific section.

```jsonc
// PDF 2.0 Section 7.3.4 (String Objects)
{ "section": "7.3.4" }

// PDF 2.0 Annex A
{ "section": "Annex A" }

// TS 32002 Section 5
{ "spec": "ts32002", "section": "5" }

// PDF/UA-2 Section 8 (Tagged PDF)
{ "spec": "pdfua2", "section": "8" }
```

### `search_spec` — Full-text Search

Search across a specification with section-aware context snippets.

```jsonc
// Search PDF 2.0 for "digital signature"
{ "query": "digital signature" }

// Limit results
{ "query": "font", "max_results": 5 }

// Search within TS 32002
{ "spec": "ts32002", "query": "CMS" }
```

### `get_requirements` — Normative Requirements

Extract normative requirements (shall / must / may) per ISO conventions.

```jsonc
// All requirements in section 12.8
{ "section": "12.8" }

// Only "shall" requirements
{ "section": "12.8", "level": "shall" }

// Only "shall not" requirements
{ "section": "7.3", "level": "shall not" }

// PDF/UA-2 requirements
{ "spec": "pdfua2", "section": "8", "level": "shall" }
```

### `get_definitions` — Term Definitions

Look up term definitions from Section 3 (Definitions).

```jsonc
// Search for "font" definitions
{ "term": "font" }

// List all definitions
{ }

// PDF/UA definitions
{ "spec": "pdfua2", "term": "artifact" }
```

### `get_tables` — Table Extraction

Extract table structures (headers, rows, captions) from a section. Multi-page tables are automatically merged.

```jsonc
// All tables in section 7.3.4
{ "section": "7.3.4" }

// Specific table only (0-based index)
{ "section": "7.3.4", "table_index": 0 }

// TS spec tables
{ "spec": "ts32002", "section": "5" }
```

### `compare_versions` — Version Comparison

Compare section structures between PDF 1.7 (ISO 32000-1) and PDF 2.0 (ISO 32000-2). Uses title-based automatic matching to detect matched, added, and removed sections.

> [!NOTE]
> This tool requires both PDF 1.7 (`PDF32000_2008.pdf`) and PDF 2.0 files in `PDF_SPEC_DIR`.

```jsonc
// Diff section 12.8 (Digital Signatures)
{ "section": "12.8" }

// Compare all top-level sections
{ }
```

## Supported Specifications

The server auto-discovers PDF files in `PDF_SPEC_DIR` by filename pattern matching:

| Category           | Spec IDs                                             | Documents                                               |
| ------------------ | ---------------------------------------------------- | ------------------------------------------------------- |
| **Standard**       | `iso32000-2`, `iso32000-2-2020`, `pdf17`, `pdf17old` | ISO 32000-2 (PDF 2.0), ISO 32000-1 (PDF 1.7)            |
| **Technical Spec** | `ts32001` – `ts32005`                                | Hash, Digital Signatures, AES-GCM, Integrity, Namespace |
| **PDF/UA**         | `pdfua1`, `pdfua2`                                   | Accessibility (ISO 14289-1, 14289-2)                    |
| **Guide**          | `tagged-bpg`, `wtpdf`, `declarations`                | Tagged PDF, Well-Tagged PDF, Declarations               |
| **App Note**       | `an001` – `an003`                                    | BPC, Associated Files, Object Metadata                  |

## Directory Structure

```
src/
├── index.ts                      # MCP server entry point
├── config.ts                     # Configuration & spec patterns
├── errors.ts                     # Error hierarchy (PDFSpecError → sub-classes)
├── container.ts                  # Service container (DI wiring)
├── services/
│   ├── pdf-registry.ts           # Auto-discovery of PDF files
│   ├── pdf-loader.ts             # PDF loading with LRU cache
│   ├── pdf-service.ts            # Orchestration layer
│   ├── compare-service.ts        # Version comparison
│   ├── outline-resolver.ts       # Section index builder
│   ├── content-extractor.ts      # Structured content extraction
│   ├── search-index.ts           # Full-text search index
│   ├── requirement-extractor.ts
│   └── definition-extractor.ts
├── tools/
│   ├── definitions.ts            # MCP tool schemas
│   └── handlers.ts               # Tool implementations
└── utils/
    ├── concurrency.ts            # mapConcurrent (bounded Promise.all)
    ├── text.ts                   # Text normalization
    ├── cache.ts                  # LRU cache
    ├── validation.ts             # Input validation
    └── logger.ts                 # Structured logger
```

## Development

```bash
git clone https://github.com/shuji-bonji/pdf-spec-mcp.git
cd pdf-spec-mcp
npm install
npm run build

# Unit tests (237 tests)
npm run test

# E2E tests (212 tests — requires PDF files in ./pdf-spec/)
npm run test:e2e

# Lint & format
npm run lint
npm run format:check
```

## License

[MIT](LICENSE)
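
## Appendix: Requirement Level Filtering Sketch

The `level` filter of `get_requirements` can be pictured as a keyword classifier over sentences. The sketch below is an illustration only: `classifySentence`, `extractRequirements`, the `Level` type, and the sample sentences are hypothetical and are not taken from this repository's `requirement-extractor.ts`. It assumes a naive sentence split and checks `shall not` before `shall`, so that a prohibition is not reported as a plain requirement.

```typescript
// Hypothetical names and types; the real RequirementExtractor may differ.
type Level = "shall not" | "shall" | "must" | "may";

interface Requirement {
  level: Level;
  text: string;
}

// Longer phrases come first so "shall not" is never classified as "shall".
const LEVELS: Level[] = ["shall not", "shall", "must", "may"];

// Return the first normative keyword found in the sentence, if any.
function classifySentence(sentence: string): Level | undefined {
  const lower = sentence.toLowerCase();
  return LEVELS.find((level) => new RegExp(`\\b${level}\\b`).test(lower));
}

// Split text into sentences and keep those carrying a normative keyword.
// When `level` is given, only sentences of that exact level are returned.
function extractRequirements(text: string, level?: Level): Requirement[] {
  // Naive split: a run of non-terminators followed by optional terminators.
  const sentences = (text.match(/[^.!?]+[.!?]*/g) ?? [])
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

  const found: Requirement[] = [];
  for (const sentence of sentences) {
    const detected = classifySentence(sentence);
    if (detected !== undefined && (level === undefined || detected === level)) {
      found.push({ level: detected, text: sentence });
    }
  }
  return found;
}

// Made-up sample text, not quoted from any specification.
const sample =
  "A conforming reader shall support this filter. " +
  "The key shall not be present in a stream dictionary. " +
  "A writer may omit the entry. " +
  "This subclause is informative.";

console.log(extractRequirements(sample));
console.log(extractRequirements(sample, "shall not"));
```

A production extractor would need sturdier sentence segmentation, since section numbers such as `7.3.4` contain periods, and would likely work on the structured content returned for a section, not on raw text.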