<div align="center">
  <img src="https://raw.githubusercontent.com/jae-jae/fetcher-mcp/refs/heads/main/icon.svg" width="100" height="100" alt="Fetcher MCP Icon" />
</div>

[中文](https://www.readme-i18n.com/jae-jae/fetcher-mcp?lang=zh) |
[Deutsch](https://www.readme-i18n.com/jae-jae/fetcher-mcp?lang=de) |
[Español](https://www.readme-i18n.com/jae-jae/fetcher-mcp?lang=es) |
[français](https://www.readme-i18n.com/jae-jae/fetcher-mcp?lang=fr) |
[日本語](https://www.readme-i18n.com/jae-jae/fetcher-mcp?lang=ja) |
[한국어](https://www.readme-i18n.com/jae-jae/fetcher-mcp?lang=ko) |
[Português](https://www.readme-i18n.com/jae-jae/fetcher-mcp?lang=pt) |
[Русский](https://www.readme-i18n.com/jae-jae/fetcher-mcp?lang=ru)

# Fetcher MCP

MCP server for fetching web page content using a Playwright headless browser.

> 🌟 **Recommended**: [OllaMan](https://ollaman.com/) - Powerful Ollama AI Model Manager.

## Advantages

- **JavaScript Support**: Unlike traditional web scrapers, Fetcher MCP uses Playwright to execute JavaScript, so it can handle dynamic web content and modern web applications.

- **Intelligent Content Extraction**: The built-in Readability algorithm automatically extracts the main content from web pages, removing ads, navigation, and other non-essential elements.

- **Flexible Output Format**: Supports both HTML and Markdown output, making it easy to integrate with various downstream applications.

- **Parallel Processing**: The `fetch_urls` tool fetches multiple URLs concurrently, which speeds up batch operations compared with fetching pages one at a time.

- **Resource Optimization**: Blocks unnecessary resources (images, stylesheets, fonts, media) by default to reduce bandwidth usage and improve performance.

- **Robust Error Handling**: Comprehensive error handling and logging keep operation reliable even when dealing with problematic web pages.

- **Configurable Parameters**: Fine-grained control over timeouts, content extraction, and
  output formatting to suit different use cases.

## Quick Start

Run directly with npx:

```bash
npx -y fetcher-mcp
```

First-time setup: install the required browser by running the following command in your terminal:

```bash
npx playwright install chromium
```

### HTTP and SSE Transport

Use the `--transport=http` parameter to start both the Streamable HTTP endpoint and the SSE endpoint at the same time:

```bash
npx -y fetcher-mcp --log --transport=http --host=0.0.0.0 --port=3000
```

After startup, the server provides the following endpoints:

- `/mcp` - Streamable HTTP endpoint (modern MCP protocol)
- `/sse` - SSE endpoint (legacy MCP protocol)

Clients can connect through whichever endpoint their MCP implementation supports.

### Debug Mode

Run with the `--debug` option to show the browser window for debugging:

```bash
npx -y fetcher-mcp --debug
```

## MCP Configuration

Configure this MCP server in Claude Desktop:

On macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`

On Windows: `%APPDATA%/Claude/claude_desktop_config.json`

```json
{
  "mcpServers": {
    "fetcher": {
      "command": "npx",
      "args": ["-y", "fetcher-mcp"]
    }
  }
}
```

## Docker Deployment

### Running with Docker

```bash
docker run -p 3000:3000 ghcr.io/jae-jae/fetcher-mcp:latest
```

### Deploying with Docker Compose

Create a `docker-compose.yml` file:

```yaml
version: "3.8"

services:
  fetcher-mcp:
    image: ghcr.io/jae-jae/fetcher-mcp:latest
    container_name: fetcher-mcp
    restart: unless-stopped
    ports:
      - "3000:3000"
    environment:
      - NODE_ENV=production
    # On Linux hosts, host network mode can improve browser access efficiency
    # network_mode: "host"
    volumes:
      # Playwright may need access to certain system paths
      - /tmp:/tmp
    # Health check
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:3000"]
      interval: 30s
      timeout: 10s
      retries: 3
```

Then run:

```bash
docker-compose up -d
```

## Features

- `fetch_url` - Retrieve web page content from a specified URL

  - Uses a Playwright headless browser to execute JavaScript
  - Supports intelligent extraction of the main content and conversion to Markdown
  - Supports the following parameters:
    - `url`: The URL of the web page to fetch (required)
    - `timeout`: Page loading timeout in milliseconds, default is 30000 (30 seconds)
    - `waitUntil`: When navigation is considered complete; one of `'load'`, `'domcontentloaded'`, `'networkidle'`, `'commit'`; default is `'load'`
    - `extractContent`: Whether to intelligently extract the main content, default is true
    - `maxLength`: Maximum length of returned content (in characters), default is no limit
    - `returnHtml`: Whether to return HTML content instead of Markdown, default is false
    - `waitForNavigation`: Whether to wait for additional navigation after the initial page load (useful for sites with anti-bot verification), default is false
    - `navigationTimeout`: Maximum time to wait for additional navigation in milliseconds, default is 10000 (10 seconds)
    - `disableMedia`: Whether to block media resources (images, stylesheets, fonts, media), default is true
    - `debug`: Whether to enable debug mode (showing the browser window); if specified, overrides the `--debug` command-line flag

- `fetch_urls` - Batch retrieve web page content from multiple URLs in parallel
  - Uses multi-tab parallel fetching for improved performance
  - Returns combined results with clear separation between web pages
  - Supports the following parameters:
    - `urls`: Array of URLs to fetch (required)
    - Other parameters are the same as `fetch_url`

- `browser_install` - Install the Playwright Chromium browser binary automatically

  - Installs the required Chromium browser binary when it is not available
  - Automatically suggested when browser installation errors occur
  - Supports the following parameters:
    - `withDeps`: Install system dependencies required by Chromium, default is false
    - `force`: Force installation even if Chromium is already installed, default is false

## Tips

### Handling Special Website Scenarios

#### Dealing with Anti-Crawler Mechanisms

- **Wait for Complete Loading**: For websites using CAPTCHA, redirects, or other verification mechanisms, include this in your prompt:

  ```
  Please wait for the page to fully load
  ```

  This sets `waitForNavigation: true`.

- **Increase Timeout Duration**: For websites that load slowly:
  ```
  Please set the page loading timeout to 60 seconds
  ```
  This adjusts both the `timeout` and `navigationTimeout` parameters accordingly.

#### Content Retrieval Adjustments

- **Preserve Original HTML Structure**: When content extraction might fail:

  ```
  Please preserve the original HTML content
  ```

  Sets `extractContent: false` and `returnHtml: true`.

- **Fetch Complete Page Content**: When the extracted content is too limited:

  ```
  Please fetch the complete webpage content instead of just the main content
  ```

  Sets `extractContent: false`.

- **Return Content as HTML**: When HTML is needed instead of the default Markdown:
  ```
  Please return the content in HTML format
  ```
  Sets `returnHtml: true`.

### Debugging and Authentication

#### Enabling Debug Mode

- **Dynamic Debug Activation**: To display the browser window during a specific fetch operation:
  ```
  Please enable debug mode for this fetch operation
  ```
  This sets `debug: true` even if the server was started without the `--debug` flag.

#### Using Custom Cookies for Authentication

- **Manual Login**: To log in using your own credentials:

  ```
  Please run in debug mode so I can manually log in to the website
  ```

  Sets `debug: true` or uses the `--debug` flag, keeping
  the browser window open for manual login.

- **Interacting with Debug Browser**: When debug mode is enabled:

  1. The browser window remains open
  2. You can manually log in to the website using your credentials
  3. After login is complete, content is fetched with your authenticated session

- **Enable Debug for Specific Requests**: Even if the server is already running, you can enable debug mode for a specific request:
  ```
  Please enable debug mode for this authentication step
  ```
  Sets `debug: true` for this specific request only, opening the browser window for manual login.

## Development

### Install Dependencies

```bash
npm install
```

### Install Playwright Browser

Install the browsers needed for Playwright:

```bash
npm run install-browser
```

### Build the Server

```bash
npm run build
```

## Debugging

Use MCP Inspector for debugging:

```bash
npm run inspector
```

You can also enable visible browser mode for debugging:

```bash
node build/index.js --debug
```

## Related Projects

- [g-search-mcp](https://github.com/jae-jae/g-search-mcp): A powerful MCP server for Google search that enables parallel searching with multiple keywords simultaneously. Perfect for batch search operations and data collection.

## License

Licensed under the [MIT License](https://choosealicense.com/licenses/mit/).

[Powered by DartNode](https://dartnode.com "Powered by DartNode - Free VPS for Open Source")
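## Example: Building a `fetch_url` Tool Call

If you call the server from your own MCP client rather than through Claude Desktop, each tool invocation is sent as a JSON-RPC `tools/call` request. The TypeScript sketch below builds such a request for `fetch_url` and fills in the defaults listed under Features. `FetchUrlArgs`, `FETCH_URL_DEFAULTS`, and `buildFetchUrlCall` are hypothetical client-side helpers. They are not part of Fetcher MCP, and the server's own input schema remains authoritative. Mapping the "60 seconds" tip to `60000` ms for both timeouts is also an assumption.

```typescript
// Sketch: build an MCP JSON-RPC "tools/call" request for fetch_url.
// FetchUrlArgs and buildFetchUrlCall are hypothetical client-side helpers,
// not part of Fetcher MCP; the server defines its own input schema.

type WaitUntil = "load" | "domcontentloaded" | "networkidle" | "commit";

interface FetchUrlArgs {
  url: string;
  timeout?: number; // milliseconds
  waitUntil?: WaitUntil;
  extractContent?: boolean;
  maxLength?: number; // characters; unset means no limit
  returnHtml?: boolean;
  waitForNavigation?: boolean;
  navigationTimeout?: number; // milliseconds
  disableMedia?: boolean;
  debug?: boolean;
}

// Defaults as listed under Features (maxLength and debug are left unset).
const FETCH_URL_DEFAULTS = {
  timeout: 30000,
  waitUntil: "load" as WaitUntil,
  extractContent: true,
  returnHtml: false,
  waitForNavigation: false,
  navigationTimeout: 10000,
  disableMedia: true,
};

interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function buildFetchUrlCall(id: number, args: FetchUrlArgs): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    // Caller-supplied values override the documented defaults.
    params: { name: "fetch_url", arguments: { ...FETCH_URL_DEFAULTS, ...args } },
  };
}

// Combines the "wait for the page to fully load" tip with the 60-second
// timeout tip (assumed here to mean 60000 ms for both timeouts).
const request = buildFetchUrlCall(1, {
  url: "https://example.com",
  waitForNavigation: true,
  timeout: 60000,
  navigationTimeout: 60000,
});

console.log(JSON.stringify(request, null, 2));
```

The server applies its own defaults, so sending them explicitly is optional. The sketch includes them so that each request records the full set of options it was made with. `fetch_urls` follows the same pattern with `name: "fetch_urls"` and a `urls` array alongside the other `fetch_url` parameters.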