<p align="center">
  <img src="https://raw.githubusercontent.com/cameronrye/openzim-mcp/main/website/assets/logo.svg" alt="OpenZIM MCP Logo" width="120" height="120">
</p>

<h1 align="center">OpenZIM MCP Server</h1>

<p align="center">
  <strong>Transform static ZIM archives into dynamic knowledge engines for AI models</strong>
</p>

<p align="center">
  <a href="https://github.com/cameronrye/openzim-mcp/actions/workflows/test.yml"><img src="https://github.com/cameronrye/openzim-mcp/workflows/CI/badge.svg" alt="CI"></a>
  <a href="https://codecov.io/gh/cameronrye/openzim-mcp"><img src="https://codecov.io/gh/cameronrye/openzim-mcp/branch/main/graph/badge.svg" alt="codecov"></a>
  <a href="https://github.com/cameronrye/openzim-mcp/actions/workflows/codeql.yml"><img src="https://github.com/cameronrye/openzim-mcp/workflows/CodeQL%20Security%20Analysis/badge.svg" alt="CodeQL"></a>
  <a href="https://sonarcloud.io/summary/new_code?id=cameronrye_openzim-mcp"><img src="https://sonarcloud.io/api/project_badges/measure?project=cameronrye_openzim-mcp&metric=security_rating" alt="Security Rating"></a>
</p>

<p align="center">
  <a href="https://badge.fury.io/py/openzim-mcp"><img src="https://badge.fury.io/py/openzim-mcp.svg" alt="PyPI version"></a>
  <a href="https://pypi.org/project/openzim-mcp/"><img src="https://img.shields.io/pypi/pyversions/openzim-mcp" alt="PyPI - Python Version"></a>
  <a href="https://pypi.org/project/openzim-mcp/"><img src="https://img.shields.io/pypi/dm/openzim-mcp" alt="PyPI - Downloads"></a>
  <a href="https://github.com/cameronrye/openzim-mcp/releases"><img src="https://img.shields.io/github/v/release/cameronrye/openzim-mcp" alt="GitHub release"></a>
</p>

<p align="center">
  <a href="https://github.com/psf/black"><img src="https://img.shields.io/badge/code%20style-black-000000.svg" alt="Code style: black"></a>
  <a href="https://pycqa.github.io/isort/"><img src="https://img.shields.io/badge/%20imports-isort-%231674b1?style=flat&labelColor=ef8336" alt="Imports: isort"></a>
  <a href="https://mypy-lang.org/"><img src="https://img.shields.io/badge/type%20checked-mypy-blue" alt="Type checked: mypy"></a>
  <a href="https://opensource.org/licenses/MIT"><img src="https://img.shields.io/badge/License-MIT-yellow.svg" alt="License: MIT"></a>
</p>

<p align="center">
  <a href="https://github.com/cameronrye/openzim-mcp/issues"><img src="https://img.shields.io/github/issues/cameronrye/openzim-mcp" alt="GitHub issues"></a>
  <a href="https://github.com/cameronrye/openzim-mcp/pulls"><img src="https://img.shields.io/github/issues-pr/cameronrye/openzim-mcp" alt="GitHub pull requests"></a>
  <a href="https://github.com/cameronrye/openzim-mcp/graphs/contributors"><img src="https://img.shields.io/github/contributors/cameronrye/openzim-mcp" alt="GitHub contributors"></a>
  <a href="https://github.com/cameronrye/openzim-mcp/stargazers"><img src="https://img.shields.io/github/stars/cameronrye/openzim-mcp?style=social" alt="GitHub stars"></a>
</p>

---

> **NEW: Article Summaries & Table of Contents!** Extract concise article summaries and hierarchical table of contents for quick content overview. Plus pagination cursors for seamless navigation!
[Learn more →](#get_entry_summary---get-a-concise-article-summary)

> **Dual Mode Support:** Choose between Simple mode (1 intelligent natural language tool, default) or Advanced mode (18 specialized tools) to match your LLM's capabilities.

## Built for LLM Intelligence

**OpenZIM MCP transforms static ZIM archives into dynamic knowledge engines for Large Language Models.** Unlike basic file readers, this tool provides *intelligent, structured access* that LLMs need to effectively navigate and understand vast knowledge repositories.

**Why LLMs Love OpenZIM MCP:**

- **Smart Navigation**: Browse by namespace (articles, metadata, media) instead of blind searching
- **Context-Aware Discovery**: Get article structure, relationships, and metadata for deeper understanding
- **Intelligent Search**: Advanced filtering, auto-complete suggestions, and relevance-ranked results
- **Performance Optimized**: Cached operations and pagination prevent timeouts on massive archives
- **Relationship Mapping**: Extract internal/external links to understand content connections

Whether you're building a research assistant, knowledge chatbot, or content analysis system, OpenZIM MCP gives your LLM the structured access patterns it needs to unlock the full potential of offline knowledge archives. No more fumbling through raw text dumps!

**OpenZIM MCP** is a modern, secure, and high-performance MCP (Model Context Protocol) server that enables AI models to access and search [ZIM format](https://en.wikipedia.org/wiki/ZIM_(file_format)) knowledge bases offline.

[ZIM](https://en.wikipedia.org/wiki/ZIM_(file_format)) (Zeno IMproved) is an open file format developed by the [openZIM project](https://openzim.org/), designed specifically for offline storage and access to website content.
The format supports high compression rates using Zstandard compression (default since 2021) and enables fast full-text searching, making it ideal for storing entire Wikipedia content and other large reference materials in relatively compact files. The openZIM project is sponsored by Wikimedia CH and supported by the Wikimedia Foundation, ensuring the format's continued development and adoption for offline knowledge access, especially in environments without reliable internet connectivity.

## Features

- **Dual Mode Support**: Choose between Simple mode (1 intelligent natural language tool, default) or Advanced mode (18 specialized tools)
- **Binary Content Retrieval**: Extract PDFs, images, videos, and other embedded media for multi-agent workflows
- **Security First**: Comprehensive input validation and path traversal protection
- **High Performance**: Intelligent caching and optimized ZIM file operations
- **Smart Retrieval**: Automatic fallback from direct access to search-based retrieval for reliable entry access
- **Well Tested**: 80%+ test coverage with comprehensive test suite
- **Modern Architecture**: Modular design with dependency injection
- **Type Safe**: Full type annotations throughout the codebase
- **Configurable**: Flexible configuration with validation
- **Observable**: Structured logging and health monitoring

## Quick Start

### Installation

```bash
# Install from PyPI (recommended)
pip install openzim-mcp
```

### Development Installation

For contributors and developers:

```bash
# Clone the repository
git clone https://github.com/cameronrye/openzim-mcp.git
cd openzim-mcp

# Install dependencies
uv sync

# Install development dependencies
uv sync --dev
```

### Prepare ZIM Files

Download ZIM files (e.g., Wikipedia, Wiktionary, etc.)
from the [Kiwix Library](https://browse.library.kiwix.org/) and place them in a directory:

```bash
mkdir ~/zim-files
# Download ZIM files to ~/zim-files/
```

### Running the Server

```bash
# Simple mode (default) - 1 intelligent natural language tool
openzim-mcp /path/to/zim/files
python -m openzim_mcp /path/to/zim/files

# Advanced mode - all 18 specialized tools
openzim-mcp --mode advanced /path/to/zim/files
python -m openzim_mcp --mode advanced /path/to/zim/files

# For development (from source)
uv run python -m openzim_mcp /path/to/zim/files
uv run python -m openzim_mcp --mode advanced /path/to/zim/files

# Or using make (development)
make run ZIM_DIR=/path/to/zim/files
```

### Tool Modes

OpenZIM MCP supports two modes:

- **Simple Mode** (default): Provides 1 intelligent tool (`zim_query`) that accepts natural language queries
- **Advanced Mode**: Exposes all 18 specialized MCP tools for maximum control

See [Simple Mode Guide](docs/SIMPLE_MODE_GUIDE.md) for detailed information.

### MCP Configuration

**Simple Mode (default):**

```json
{
  "openzim-mcp": {
    "command": "openzim-mcp",
    "args": ["/path/to/zim/files"]
  }
}
```

**Advanced Mode:**

```json
{
  "openzim-mcp-advanced": {
    "command": "openzim-mcp",
    "args": ["--mode", "advanced", "/path/to/zim/files"]
  }
}
```

Alternative configuration using Python module:

```json
{
  "openzim-mcp": {
    "command": "python",
    "args": [
      "-m",
      "openzim_mcp",
      "/path/to/zim/files"
    ]
  }
}
```

For development (from source):

```json
{
  "openzim-mcp": {
    "command": "uv",
    "args": [
      "--directory",
      "/path/to/openzim-mcp",
      "run",
      "python",
      "-m",
      "openzim_mcp",
      "/path/to/zim/files"
    ]
  }
}
```

## Development

### Running Tests

```bash
# Run all tests
make test

# Run tests with coverage
make test-cov

# Run specific test file
uv run pytest tests/test_security.py -v

# Run tests with ZIM test data (comprehensive testing)
make test-with-zim-data

# Run integration tests only
make test-integration

# Run tests that require ZIM test data
make test-requires-zim-data
```

### ZIM Test Data Integration

OpenZIM MCP integrates with the official [zim-testing-suite](https://github.com/openzim/zim-testing-suite) for comprehensive testing with real ZIM files:

```bash
# Download essential test files (basic testing)
make download-test-data

# Download all test files (comprehensive testing)
make download-test-data-all

# List available test files
make list-test-data

# Clean downloaded test data
make clean-test-data
```

The test data includes:

- **Basic files**: Small ZIM files for essential testing
- **Real content**: Actual Wikipedia/Wikibooks content for integration testing
- **Invalid files**: Malformed ZIM files for error handling testing
- **Special cases**: Embedded content, split files, and edge cases

Test files are automatically organized by category and priority level.

### Code Quality

```bash
# Format code
make format

# Run linting
make lint

# Type checking
make type-check

# Run all checks
make check
```

### Project Structure

```text
openzim-mcp/
├── openzim_mcp/              # Main package
│   ├── __init__.py           # Package initialization
│   ├── __main__.py           # Module entry point
│   ├── main.py               # Main entry point
│   ├── server.py             # MCP server implementation
│   ├── config.py             # Configuration management
│   ├── security.py           # Security and validation
│   ├── cache.py              # Caching functionality
│   ├── content_processor.py  # Content processing
│   ├── zim_operations.py     # ZIM file operations
│   ├── exceptions.py         # Custom exceptions
│   └── constants.py          # Application constants
├── tests/                    # Test suite
├── pyproject.toml            # Project configuration
├── Makefile                  # Development commands
└── README.md                 # This file
```

---

## API Reference

### Available Tools

### list_zim_files - List all ZIM files in allowed directories

No parameters required.

### search_zim_file - Search within ZIM file content

**Required parameters:**

- `zim_file_path` (string): Path to the ZIM file
- `query` (string): Search query term

**Optional parameters:**

- `limit` (integer, default: 10): Maximum number of results to return
- `offset` (integer, default: 0): Starting offset for results (for pagination)

### get_zim_entry - Get detailed content of a specific entry in a ZIM file

**Required parameters:**

- `zim_file_path` (string): Path to the ZIM file
- `entry_path` (string): Entry path, e.g., 'A/Some_Article'

**Optional parameters:**

- `max_content_length` (integer, default: 100000, minimum: 1000): Maximum length of returned content

**Smart Retrieval Features:**

- **Automatic Fallback**: If direct path access fails, automatically searches for the entry and uses the exact path found
- **Path Mapping Cache**: Caches successful path mappings for improved performance on repeated access
- **Enhanced Error Guidance**: Provides clear guidance when entries cannot be found, suggesting alternative approaches
- **Transparent Operation**: Works seamlessly regardless of path encoding differences (spaces vs underscores, URL encoding, etc.)

### get_zim_metadata - Get ZIM file metadata from M namespace entries

**Required parameters:**

- `zim_file_path` (string): Path to the ZIM file

**Returns:**
JSON string containing ZIM metadata including entry counts, archive information, and metadata entries like title, description, language, creator, etc.

### get_main_page - Get the main page entry from W namespace

**Required parameters:**

- `zim_file_path` (string): Path to the ZIM file

**Returns:**
Main page content or information about the
main page entry.

### list_namespaces - List available namespaces and their entry counts

**Required parameters:**

- `zim_file_path` (string): Path to the ZIM file

**Returns:**
JSON string containing namespace information with entry counts, descriptions, and sample entries for each namespace (C, M, W, X, etc.).

### browse_namespace - Browse entries in a specific namespace with pagination

**Required parameters:**

- `zim_file_path` (string): Path to the ZIM file
- `namespace` (string): Namespace to browse (C, M, W, X, A, I, etc.)

**Optional parameters:**

- `limit` (integer, default: 50, range: 1-200): Maximum number of entries to return
- `offset` (integer, default: 0): Starting offset for pagination

**Returns:**
JSON string containing namespace entries with titles, content previews, and pagination information.

### search_with_filters - Search within ZIM file content with advanced filters

**Required parameters:**

- `zim_file_path` (string): Path to the ZIM file
- `query` (string): Search query term

**Optional parameters:**

- `namespace` (string): Optional namespace filter (C, M, W, X, etc.)
- `content_type` (string): Optional content type filter (text/html, text/plain, etc.)
- `limit` (integer, default: 10, range: 1-100): Maximum number of results to return
- `offset` (integer, default: 0): Starting offset for pagination

**Returns:**
Filtered search results with namespace and content type information.

### get_search_suggestions - Get search suggestions and auto-complete

**Required parameters:**

- `zim_file_path` (string): Path to the ZIM file
- `partial_query` (string): Partial search query (minimum 2 characters)

**Optional parameters:**

- `limit` (integer, default: 10, range: 1-50): Maximum number of suggestions to return

**Returns:**
JSON string containing search suggestions based on article titles and content.

### get_article_structure - Extract article structure and metadata

**Required parameters:**

- `zim_file_path` (string): Path to the ZIM file
- `entry_path` (string): Entry path, e.g., 'C/Some_Article'

**Returns:**
JSON string containing article structure including headings, sections, metadata, and word count.

### extract_article_links - Extract internal and external links from an article

**Required parameters:**

- `zim_file_path` (string): Path to the ZIM file
- `entry_path` (string): Entry path, e.g., 'C/Some_Article'

**Returns:**
JSON string containing categorized links (internal, external, media) with titles and metadata.

### get_entry_summary - Get a concise article summary

**Required parameters:**

- `zim_file_path` (string): Path to the ZIM file
- `entry_path` (string): Entry path, e.g., 'C/Some_Article'

**Optional parameters:**

- `max_words` (integer, default: 200, range: 10-1000): Maximum number of words in the summary

**Returns:**
JSON string containing a concise summary extracted from the article's opening paragraphs, with metadata including title, word count, and truncation status.

**Features:**

- Extracts opening paragraphs while removing infoboxes, navigation, and sidebars
- Provides quick article overview without loading full content
- Useful for LLMs to understand article context before deciding to read more

### get_table_of_contents - Extract hierarchical table of contents

**Required parameters:**

- `zim_file_path` (string): Path to the ZIM file
- `entry_path` (string): Entry path, e.g., 'C/Some_Article'

**Returns:**
JSON string containing a hierarchical tree structure of article headings (h1-h6), suitable for navigation and content overview.

**Features:**

- Hierarchical tree structure with nested children
- Includes heading levels, text, and anchor IDs
- Provides heading count and maximum depth statistics
- Enables LLMs to
navigate directly to specific sections

### get_binary_entry - Retrieve binary content from a ZIM entry

**Required parameters:**

- `zim_file_path` (string): Path to the ZIM file
- `entry_path` (string): Entry path, e.g., 'I/image.png' or 'I/document.pdf'

**Optional parameters:**

- `max_size_bytes` (integer): Maximum size of content to return (default: 10MB). Content larger than this will return metadata only.
- `include_data` (boolean): If true (default), include base64-encoded data. Set to false to retrieve metadata only.

**Returns:**

JSON string containing:

- `path`: Entry path in ZIM file
- `title`: Entry title
- `mime_type`: Content type (e.g., "application/pdf", "image/png")
- `size`: Size in bytes
- `size_human`: Human-readable size (e.g., "1.5 MB")
- `encoding`: "base64" when data is included, null otherwise
- `data`: Base64-encoded content (if include_data=true and under size limit)
- `truncated`: Boolean indicating if content exceeded size limit

**Use Cases:**

- Retrieve PDFs for processing with PDF parsing tools
- Extract images for vision models or OCR tools
- Get video/audio files for transcription services
- Enable multi-agent workflows with specialized content processors

---

## Examples

### Listing ZIM files

```json
{
  "name": "list_zim_files"
}
```

Response:

```plain
Found 1 ZIM files in 1 directories:

[
  {
    "name": "wikipedia_en_100_2025-08.zim",
    "path": "C:\\zim\\wikipedia_en_100_2025-08.zim",
    "directory": "C:\\zim",
    "size": "310.77 MB",
    "modified": "2025-09-11T10:20:50.148427"
  }
]
```

### Searching ZIM files

```json
{
  "name": "search_zim_file",
  "arguments": {
    "zim_file_path": "C:\\zim\\wikipedia_en_100_2025-08.zim",
    "query": "biology",
    "limit": 3
  }
}
```

Response:

```plain
Found 51 matches for "biology", showing 1-3:

## 1. Taxonomy (biology)
Path: Taxonomy_(biology)
Snippet: # Taxonomy (biology) Part of a series on
---
Evolutionary biology
Darwin's finches by John Gould

 * Index
 * Introduction
 * [Main](Evolution "Evolution")
 * Outline

## 2. Protein
Path: Protein
Snippet: # Protein A representation of the 3D structure of the protein myoglobin showing turquoise α-helices. This protein was the first to have its structure solved by X-ray crystallography. Toward the right-center among the coils, a prosthetic group called a heme group (shown in gray) with a bound oxygen molecule (red).

## 3. Ant
Path: Ant
Snippet: # Ant Ants
Temporal range: Late Aptian – Present
---
Fire ants
[Scientific classification](Taxonomy_\(biology\) "Taxonomy \(biology\)")
Kingdom: | [Animalia](Animal "Animal")
Phylum: | [Arthropoda](Arthropod "Arthropod")
Class: | [Insecta](Insect "Insect")
Order: | Hymenoptera
Infraorder: | Aculeata
Superfamily: |
Latreille, 1809[1]
Family: |
Latreille, 1809
```

### Getting ZIM entries

```json
{
  "name": "get_zim_entry",
  "arguments": {
    "zim_file_path": "C:\\zim\\wikipedia_en_100_2025-08.zim",
    "entry_path": "Protein"
  }
}
```

Response:

```plain
# Protein

Path: Protein
Type: text/html
## Content

# Protein

A representation of the 3D structure of the protein myoglobin showing turquoise α-helices. This protein was the first to have its structure solved by X-ray crystallography. Toward the right-center among the coils, a prosthetic group called a heme group (shown in gray) with a bound oxygen molecule (red).

**Proteins** are large biomolecules and macromolecules that comprise one or more long chains of amino acid residues. Proteins perform a vast array of functions within organisms, including catalysing metabolic reactions, DNA replication, responding to stimuli, providing structure to cells and organisms, and transporting molecules from one location to another. Proteins differ from one another primarily in their sequence of amino acids, which is dictated by the nucleotide sequence of their genes, and which usually results in protein folding into a specific 3D structure that determines its activity.

A linear chain of amino acid residues is called a polypeptide. A protein contains at least one long polypeptide. Short polypeptides, containing less than 20–30 residues, are rarely considered to be proteins and are commonly called peptides.

... [Content truncated, total of 56,202 characters, only showing first 1,500 characters] ...
```

### Smart Retrieval in Action

**Example: Automatic path resolution**

```json
{
  "name": "get_zim_entry",
  "arguments": {
    "zim_file_path": "C:\\zim\\wikipedia_en_100_2025-08.zim",
    "entry_path": "A/Test Article"
  }
}
```

Response (showing smart retrieval working):

```plain
# Test Article

Requested Path: A/Test Article
Actual Path: A/Test_Article
Type: text/html

## Content

# Test Article

This article demonstrates the smart retrieval system automatically handling
path encoding differences. The system tried "A/Test Article" directly,
then automatically searched and found "A/Test_Article".

... [Content continues] ...
```

### get_server_health - Get server health and statistics

No parameters required.

**Returns:**

- Server status and performance metrics
- Cache statistics
- Configuration information
- Instance tracking information
- Conflict detection results

**Example Response:**

```json
{
  "status": "healthy",
  "server_name": "openzim-mcp",
  "allowed_directories": 1,
  "cache": {
    "enabled": true,
    "size": 1,
    "max_size": 100,
    "ttl_seconds": 3600
  },
  "instance_tracking": {
    "active_instances": 1,
    "conflicts_detected": 0
  }
}
```

### get_server_configuration - Get detailed server configuration

No parameters required.

**Returns:**
Comprehensive server configuration including diagnostics, validation results, and conflict detection.

**Example Response:**

```json
{
  "configuration": {
    "server_name": "openzim-mcp",
    "allowed_directories": ["/path/to/zim/files"],
    "cache_enabled": true,
    "config_hash": "abc123...",
    "server_pid": 12345
  },
  "diagnostics": {
    "validation_status": "healthy",
    "conflicts_detected": [],
    "warnings": [],
    "recommendations": []
  }
}
```

### diagnose_server_state - Comprehensive server diagnostics

No parameters required.

**Returns:**
Detailed diagnostic information including instance conflicts, configuration validation, file accessibility checks, and actionable recommendations.

**Example Response:**

```json
{
  "status": "healthy",
  "server_info": {
    "pid": 12345,
    "server_name": "openzim-mcp",
    "config_hash": "abc123..."
  },
  "conflicts": [],
  "issues": [],
  "recommendations": ["Server appears to be running normally"],
  "environment_checks": {
    "directories_accessible": true,
    "cache_functional": true
  }
}
```

### resolve_server_conflicts - Identify and resolve server conflicts

No parameters required.

**Returns:**
Results of
conflict resolution including cleanup actions and recommendations.

**Example Response:**

```json
{
  "status": "success",
  "cleanup_results": {
    "stale_instances_removed": 2
  },
  "conflicts_found": [],
  "actions_taken": ["Removed 2 stale instance files"],
  "recommendations": ["No active conflicts detected"]
}
```

### Additional Search Examples

**Computer-related search:**

```json
{
  "name": "search_zim_file",
  "arguments": {
    "zim_file_path": "C:\\zim\\wikipedia_en_100_2025-08.zim",
    "query": "computer",
    "limit": 2
  }
}
```

Response:

```plain
Found 39 matches for "computer", showing 1-2:

## 1. Video game
Path: Video_game
Snippet: # Video game First-generation _Pong_ console at the Computerspielemuseum Berlin
---
Platforms

## 2. Protein
Path: Protein
Snippet: # Protein A representation of the 3D structure of the protein myoglobin showing turquoise α-helices. This protein was the first to have its structure solved by X-ray crystallography. Toward the right-center among the coils, a prosthetic group called a heme group (shown in gray) with a bound oxygen molecule (red).
```

**Getting detailed content:**

```json
{
  "name": "get_zim_entry",
  "arguments": {
    "zim_file_path": "C:\\zim\\wikipedia_en_100_2025-08.zim",
    "entry_path": "Evolution",
    "max_content_length": 1500
  }
}
```

Response:

```plain
# Evolution

Path: Evolution
Type: text/html
## Content

# Evolution

Part of the Biology series on
---
****
Mechanisms and processes

 * Adaptation
 * Genetic drift
 * Gene flow
 * History of life
 * Maladaptation
 * Mutation
 * Natural selection
 * Neutral theory
 * Population genetics
 * Speciation

... [Content truncated, total of 110,237 characters, only showing first 1,500 characters] ...
```

### Advanced Knowledge Retrieval Examples

**Getting ZIM metadata:**

```json
{
  "name": "get_zim_metadata",
  "arguments": {
    "zim_file_path": "C:\\zim\\wikipedia_en_100_2025-08.zim"
  }
}
```

Response:

```json
{
  "entry_count": 100000,
  "all_entry_count": 120000,
  "article_count": 80000,
  "media_count": 20000,
  "metadata_entries": {
    "Title": "Wikipedia (English)",
    "Description": "Wikipedia articles in English",
    "Language": "eng",
    "Creator": "Kiwix",
    "Date": "2025-08-15"
  }
}
```

**Browsing a namespace:**

```json
{
  "name": "browse_namespace",
  "arguments": {
    "zim_file_path": "C:\\zim\\wikipedia_en_100_2025-08.zim",
    "namespace": "C",
    "limit": 5,
    "offset": 0
  }
}
```

Response:

```json
{
  "namespace": "C",
  "total_in_namespace": 80000,
  "offset": 0,
  "limit": 5,
  "returned_count": 5,
  "has_more": true,
  "entries": [
    {
      "path": "C/Biology",
      "title": "Biology",
      "content_type": "text/html",
      "preview": "Biology is the scientific study of life..."
    }
  ]
}
```

**Filtered search:**

```json
{
  "name": "search_with_filters",
  "arguments": {
    "zim_file_path": "C:\\zim\\wikipedia_en_100_2025-08.zim",
    "query": "evolution",
    "namespace": "C",
    "content_type": "text/html",
    "limit": 3
  }
}
```

**Getting article structure:**

```json
{
  "name": "get_article_structure",
  "arguments": {
    "zim_file_path": "C:\\zim\\wikipedia_en_100_2025-08.zim",
    "entry_path": "C/Evolution"
  }
}
```

Response:

```json
{
  "title": "Evolution",
  "path": "C/Evolution",
  "content_type": "text/html",
  "headings": [
    {"level": 1, "text": "Evolution", "id": "evolution"},
    {"level": 2, "text": "History", "id": "history"},
    {"level": 2, "text": "Mechanisms", "id": "mechanisms"}
  ],
  "sections": [
    {
      "title": "Evolution",
      "level": 1,
      "content_preview": "Evolution is the change in heritable traits...",
      "word_count": 150
    }
  ],
  "word_count": 5000
}
```

**Getting article summary:**

```json
{
  "name": "get_entry_summary",
  "arguments": {
    "zim_file_path": "C:\\zim\\wikipedia_en_100_2025-08.zim",
    "entry_path": "C/Evolution",
    "max_words": 100
  }
}
```

Response:

```json
{
  "title": "Evolution",
  "path": "C/Evolution",
  "content_type": "text/html",
  "summary": "Evolution is the change in heritable characteristics of biological populations over successive generations. These characteristics are the expressions of genes, which are passed from parent to offspring during reproduction...",
  "word_count": 100,
  "is_truncated": true
}
```

**Getting table of contents:**

```json
{
  "name": "get_table_of_contents",
  "arguments": {
    "zim_file_path": "C:\\zim\\wikipedia_en_100_2025-08.zim",
    "entry_path": "C/Evolution"
  }
}
```

Response:

```json
{
  "title": "Evolution",
  "path": "C/Evolution",
  "content_type": "text/html",
  "toc": [
    {
      "level": 1,
      "text": "Evolution",
      "id": "evolution",
      "children": [
        {
          "level": 2,
          "text": "History of evolutionary thought",
          "id": "history",
          "children": []
        },
        {
          "level": 2,
          "text": "Mechanisms",
          "id": "mechanisms",
          "children": []
        }
      ]
    }
  ],
  "heading_count": 15,
  "max_depth": 4
}
```

**Getting search suggestions:**

```json
{
  "name": "get_search_suggestions",
  "arguments": {
    "zim_file_path": "C:\\zim\\wikipedia_en_100_2025-08.zim",
    "partial_query": "bio",
    "limit": 5
  }
}
```

Response:

```json
{
  "partial_query": "bio",
  "suggestions": [
    {"text": "Biology", "path": "C/Biology", "type": "title_start_match"},
    {"text": "Biochemistry", "path": "C/Biochemistry", "type": "title_start_match"},
    {"text": "Biodiversity", "path": "C/Biodiversity", "type": "title_start_match"}
  ],
  "count": 3
}
```

### Server Management and Diagnostics Examples

**Getting server health:**

```json
{
  "name": "get_server_health"
}
```

Response:

```json
{
  "status": "healthy",
  "server_name": "openzim-mcp",
  "uptime_info": {
    "process_id": 12345,
    "started_at": "2025-09-14T10:30:00"
  },
  "cache_performance": {
    "enabled": true,
    "size": 15,
    "max_size": 100,
    "hit_rate": 0.85
  },
  "instance_tracking": {
    "active_instances": 1,
    "conflicts_detected": 0
  }
}
```

**Diagnosing server state:**

```json
{
  "name": "diagnose_server_state"
}
```

Response:

```json
{
  "status": "healthy",
  "server_info": {
    "pid": 12345,
    "server_name": "openzim-mcp",
    "config_hash": "abc123def456..."
  },
  "conflicts": [],
  "issues": [],
  "recommendations": ["Server appears to be running normally. No issues detected."],
  "environment_checks": {
    "directories_accessible": true,
    "cache_functional": true,
    "zim_files_found": 5
  }
}
```

**Resolving server conflicts:**

```json
{
  "name": "resolve_server_conflicts"
}
```

Response:

```json
{
  "status": "success",
  "cleanup_results": {
    "stale_instances_removed": 2,
    "files_cleaned": ["/home/user/.openzim_mcp_instances/server_99999.json"]
  },
  "conflicts_found": [],
  "actions_taken": ["Removed 2 stale instance files"],
  "recommendations": ["No active conflicts detected after cleanup"]
}
```

---

## ZIM Entry Retrieval Best Practices

### Smart Retrieval System

OpenZIM MCP implements an intelligent entry retrieval system that automatically handles path encoding inconsistencies common in ZIM files:

**How It Works:**

1. **Direct Access First**: Attempts to retrieve the entry using the provided path exactly as given
2. **Automatic Fallback**: If direct access fails, automatically searches for the entry using various search terms
3. **Path Mapping Cache**: Caches successful path mappings to improve performance for repeated access
4. **Enhanced Error Guidance**: Provides clear guidance when entries cannot be found

**Benefits for LLM Users:**

- **Transparent Operation**: No need to understand ZIM path encoding complexities
- **Single Tool Call**: Eliminates the need for manual search-first methodology
- **Reliable Results**: Consistent success across different path formats (spaces vs underscores, URL encoding, etc.)
- **Performance Optimized**: Cached mappings improve repeated access speed

**Example Scenarios Handled Automatically:**

- `A/Test Article` → `A/Test_Article` (space to underscore conversion)
- `C/Café` → `C/Caf%C3%A9` (URL encoding differences)
- `A/Some-Page` → `A/Some_Page` (hyphen to underscore conversion)

### Usage Recommendations

**For Direct Entry Access:**

```json
{
  "name": "get_zim_entry",
  "arguments": {
    "zim_file_path": "/path/to/file.zim",
    "entry_path": "A/Article_Name"
  }
}
```

**When Entry Not Found:**
The system will automatically provide guidance:

```
Entry not found: 'A/Article_Name'.
The entry path may not exist in this ZIM file.
Try using search_zim_file() to find available entries,
or browse_namespace() to explore the file structure.
```

---

## Important Notes and Limitations

### Content Length Requirements

- The `max_content_length` parameter for `get_zim_entry` must be at least 1000 characters
- Content longer than the specified limit will be truncated with a note showing the total character count

### Search Behavior

- Search results may include articles that contain the search
terms in various contexts1178- Results are ranked by relevance but may not always be directly related to the primary meaning of the search term1179- Search snippets provide a preview of the content but may not show the exact location where the search term appears11801181### File Format Support11821183- Currently supports ZIM files (Zeno IMproved format)1184- Tested with Wikipedia ZIM files (e.g., `wikipedia_en_100_2025-08.zim`)1185- File paths must be properly escaped in JSON (use `\\` for Windows paths)11861187---11881189## Multi-Server Instance Management11901191OpenZIM MCP includes advanced multi-server instance tracking and conflict detection to ensure reliable operation when multiple server instances are running.11921193### Instance Tracking Features11941195- **Automatic Instance Registration**: Each server instance is automatically registered with a unique process ID and configuration hash1196- **Conflict Detection**: Detects when multiple servers with different configurations are accessing the same directories1197- **Stale Instance Cleanup**: Automatically identifies and cleans up orphaned instance files from terminated processes1198- **Configuration Validation**: Ensures all server instances use compatible configurations11991200### Conflict Types120112021. **Configuration Mismatch**: Multiple servers with different settings accessing the same directories12032. **Multiple Instances**: Multiple servers running simultaneously (may cause confusion)12043. **Stale Instances**: Orphaned instance files from terminated processes12051206### Automatic Conflict Warnings12071208OpenZIM MCP automatically includes conflict warnings in search results and file listings when issues are detected:12091210```plain1211 **Server Conflict Detected**1212 Configuration mismatch with server PID 12345. 
Search results may be inconsistent.1213 Use 'resolve_server_conflicts()' to fix these issues.1214```12151216### Best Practices12171218- Use `diagnose_server_state()` regularly to check for conflicts1219- Run `resolve_server_conflicts()` to clean up stale instances1220- Ensure all server instances use the same configuration when accessing shared directories1221- Monitor server health with `get_server_health()` for instance tracking information12221223---12241225## Configuration12261227OpenZIM MCP supports configuration through environment variables with the `OPENZIM_MCP_` prefix:12281229```bash1230# Cache configuration1231export OPENZIM_MCP_CACHE__ENABLED=true1232export OPENZIM_MCP_CACHE__MAX_SIZE=2001233export OPENZIM_MCP_CACHE__TTL_SECONDS=720012341235# Content configuration1236export OPENZIM_MCP_CONTENT__MAX_CONTENT_LENGTH=2000001237export OPENZIM_MCP_CONTENT__SNIPPET_LENGTH=20001238export OPENZIM_MCP_CONTENT__DEFAULT_SEARCH_LIMIT=2012391240# Logging configuration1241export OPENZIM_MCP_LOGGING__LEVEL=DEBUG1242export OPENZIM_MCP_LOGGING__FORMAT="%(asctime)s - %(name)s - %(levelname)s - %(message)s"12431244# Server configuration1245export OPENZIM_MCP_SERVER_NAME=my_openzim_mcp_server1246```12471248### Configuration Options12491250| Setting | Default | Description |1251|---------|---------|-------------|1252| `OPENZIM_MCP_CACHE__ENABLED` | `true` | Enable/disable caching |1253| `OPENZIM_MCP_CACHE__MAX_SIZE` | `100` | Maximum cache entries |1254| `OPENZIM_MCP_CACHE__TTL_SECONDS` | `3600` | Cache TTL in seconds |1255| `OPENZIM_MCP_CONTENT__MAX_CONTENT_LENGTH` | `100000` | Max content length |1256| `OPENZIM_MCP_CONTENT__SNIPPET_LENGTH` | `1000` | Max snippet length |1257| `OPENZIM_MCP_CONTENT__DEFAULT_SEARCH_LIMIT` | `10` | Default search result limit |1258| `OPENZIM_MCP_LOGGING__LEVEL` | `INFO` | Logging level |1259| `OPENZIM_MCP_LOGGING__FORMAT` | `%(asctime)s - %(name)s - %(levelname)s - %(message)s` | Log message format |1260| `OPENZIM_MCP_SERVER_NAME` | 
`openzim-mcp` | Server instance name |12611262---12631264## Security Features12651266- **Path Traversal Protection**: Secure path validation prevents access outside allowed directories1267- **Input Sanitization**: All user inputs are validated and sanitized1268- **Resource Management**: Proper cleanup of ZIM archive resources1269- **Error Handling**: Sanitized error messages prevent information disclosure1270- **Type Safety**: Full type annotations prevent type-related vulnerabilities12711272---12731274## Performance Features12751276- **Intelligent Caching**: LRU cache with TTL for frequently accessed content1277- **Resource Pooling**: Efficient ZIM archive management1278- **Optimized Content Processing**: Fast HTML to text conversion1279- **Lazy Loading**: Components initialized only when needed1280- **Memory Management**: Proper cleanup and resource management12811282---12831284## Testing12851286The project includes comprehensive testing with 80%+ coverage using both mock data and real ZIM files:12871288### Test Categories12891290- **Unit Tests**: Individual component testing with mocks1291- **Integration Tests**: End-to-end functionality testing with real ZIM files1292- **Security Tests**: Path traversal and input validation testing1293- **Performance Tests**: Cache and resource management testing1294- **Format Compatibility**: Testing with various ZIM file formats and versions1295- **Error Handling**: Testing with invalid and malformed ZIM files12961297### Test Infrastructure12981299OpenZIM MCP uses a hybrid testing approach:130013011. **Mock-based tests**: Fast unit tests using mocked libzim components13022. **Real ZIM file tests**: Integration tests using official zim-testing-suite files13033. 
**Automatic test data management**: Download and organize test files as needed13041305### Test Data Sources13061307- **Built-in test data**: Basic test files included in the repository1308- **zim-testing-suite integration**: Official test files from the OpenZIM project1309- **Environment variable support**: `ZIM_TEST_DATA_DIR` for custom test data locations13101311```bash1312# Run tests with coverage report1313make test-cov13141315# View coverage report1316open htmlcov/index.html13171318# Run comprehensive tests with real ZIM files1319make test-with-zim-data1320```13211322### Test Markers13231324Tests are organized with pytest markers:13251326- `@pytest.mark.requires_zim_data`: Tests requiring ZIM test data files1327- `@pytest.mark.integration`: Integration tests1328- `@pytest.mark.slow`: Long-running tests13291330---13311332## Monitoring13331334OpenZIM MCP provides built-in monitoring capabilities:13351336- **Health Checks**: Server health and status monitoring1337- **Cache Metrics**: Cache hit rates and performance statistics1338- **Structured Logging**: JSON-formatted logs for easy parsing1339- **Error Tracking**: Comprehensive error logging and tracking13401341---13421343## Versioning13441345This project uses [Semantic Versioning](https://semver.org/) with automated version management through [release-please](https://github.com/googleapis/release-please).13461347### Automated Releases13481349Version bumps and releases are automated based on [Conventional Commits](https://www.conventionalcommits.org/):13501351- **`feat:`** - New features (minor version bump)1352- **`fix:`** - Bug fixes (patch version bump)1353- **`feat!:`** or **`BREAKING CHANGE:`** - Breaking changes (major version bump)1354- **`perf:`** - Performance improvements (patch version bump)1355- **`docs:`**, **`style:`**, **`refactor:`**, **`test:`**, **`chore:`** - No version bump13561357### Release Process13581359The project uses an **improved, consolidated release system** with automatic 
validation:136013611. **Automatic** (Recommended): Push conventional commits โ Release Please creates PR โ Merge PR โ Automatic release13622. **Manual**: Use GitHub Actions UI for direct control over releases13633. **Emergency**: Push tags directly for critical fixes13641365**Key Features:**13661367- **Zero-touch releases** from main branch1368- **Automatic version synchronization** validation1369- **Comprehensive testing** before every release1370- **Improved error handling** and rollback capabilities1371- **Branch protection** prevents broken releases13721373For detailed instructions, see [Release Process Guide](docs/RELEASE_PROCESS_GUIDE.md).13741375### Commit Message Format13761377```1378<type>[optional scope]: <description>13791380[optional body]13811382[optional footer(s)]1383```13841385**Examples:**13861387```bash1388feat: add search suggestions endpoint1389fix: resolve path traversal vulnerability1390feat!: change API response format1391docs: update installation instructions1392```13931394---13951396## Contributing139713981. Fork the repository13992. Create a feature branch (`git checkout -b feature/amazing-feature`)14003. Make your changes14014. Run tests (`make check`)14025. **Use conventional commit messages** (`git commit -m 'feat: add amazing feature'`)14036. Push to the branch (`git push origin feature/amazing-feature`)14047. 
Open a Pull Request14051406### Development Guidelines14071408- Follow PEP 8 style guidelines1409- Add type hints to all functions1410- Write tests for new functionality1411- Update documentation as needed1412- **Use conventional commit messages** for automatic versioning1413- Ensure all tests pass before submitting14141415---14161417## License14181419This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.14201421---14221423## Acknowledgments14241425- [Kiwix](https://www.kiwix.org/) for the ZIM format and libzim library1426- [MCP](https://modelcontextprotocol.io/) for the Model Context Protocol1427- The open-source community for the excellent libraries used in this project14281429---14301431Made with โค๏ธ by [Cameron Rye](https://rye.dev)1432