# Web Analyzer MCP

<a href="https://glama.ai/mcp/servers/@kimdonghwi94/web-analyzer-mcp">
  <img width="380" height="200" src="https://glama.ai/mcp/servers/@kimdonghwi94/web-analyzer-mcp/badge" alt="WebAnalyzer MCP server" />
</a>

A powerful MCP (Model Context Protocol) server for intelligent web content analysis and summarization. Built with FastMCP, this server provides smart web scraping, content extraction, and AI-powered question-answering capabilities.

## Features

### Core Tools

1. **`url_to_markdown`** - Extract and summarize key web page content
   - Analyzes content importance using custom algorithms
   - Removes ads, navigation, and irrelevant content
   - Keeps only essential information (tables, images, key text)
   - Outputs structured markdown optimized for analysis

2. **`web_content_qna`** - AI-powered Q&A about web content
   - Extracts relevant content sections from web pages
   - Uses intelligent chunking and relevance matching
   - Answers questions using OpenAI GPT models

### Key Features

- **Smart Content Ranking**: Algorithm-based content importance scoring
- **Essential Content Only**: Removes clutter, keeps what matters
- **Multi-IDE Support**: Works with Claude Desktop, Cursor, VS Code, PyCharm
- **Flexible Models**: Choose from GPT-3.5, GPT-4, GPT-4 Turbo, or GPT-5

## Installation

### Prerequisites
- [uv](https://docs.astral.sh/uv/getting-started/installation/) (Python package manager)
- Chrome/Chromium browser (for Selenium)
- OpenAI API key (for Q&A functionality)

### Quick Start with uv (Recommended)

```bash
# Clone the repository
git clone https://github.com/kimdonghwi94/web-analyzer-mcp.git
cd web-analyzer-mcp

# Run directly with uv (auto-installs dependencies)
uv run mcp-webanalyzer
```

### Installing via Smithery

To install web-analyzer-mcp for Claude Desktop automatically via [Smithery](https://smithery.ai/server/@kimdonghwi94/web-analyzer-mcp):

```bash
npx -y @smithery/cli install @kimdonghwi94/web-analyzer-mcp --client claude
```

## IDE/Editor Integration

<details>
<summary><b>Install Claude Desktop</b></summary>

Add the following to your `claude_desktop_config.json` file. See the [Claude Desktop MCP documentation](https://modelcontextprotocol.io/quickstart/user) for more details.

```json
{
  "mcpServers": {
    "web-analyzer": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/web-analyzer-mcp",
        "run",
        "mcp-webanalyzer"
      ],
      "env": {
        "OPENAI_API_KEY": "your_openai_api_key_here",
        "OPENAI_MODEL": "gpt-4"
      }
    }
  }
}
```

</details>

<details>
<summary><b>Install Claude Code (VS Code Extension)</b></summary>

Add the server using the Claude Code CLI:

```bash
claude mcp add web-analyzer -e OPENAI_API_KEY=your_api_key_here -e OPENAI_MODEL=gpt-4 -- uv --directory /path/to/web-analyzer-mcp run mcp-webanalyzer
```
</details>

<details>
<summary><b>Install Cursor IDE</b></summary>

Add to your Cursor settings (`File > Preferences > Settings > Extensions > MCP`):

```json
{
  "mcpServers": {
    "web-analyzer": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/web-analyzer-mcp",
        "run",
        "mcp-webanalyzer"
      ],
      "env": {
        "OPENAI_API_KEY": "your_openai_api_key_here",
        "OPENAI_MODEL": "gpt-4"
      }
    }
  }
}
```
</details>

<details>
<summary><b>Install JetBrains AI Assistant</b></summary>

See the [JetBrains AI Assistant documentation](https://www.jetbrains.com/help/idea/ai-assistant.html) for more details.

1. In JetBrains IDEs, go to **Settings** → **Tools** → **AI Assistant** → **Model Context Protocol (MCP)**
2. Click **+ Add**
3. Click **Command** in the top-left corner of the dialog and select the **As JSON** option from the list
4. Add this configuration and click **OK**:

```json
{
  "mcpServers": {
    "web-analyzer": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/web-analyzer-mcp",
        "run",
        "mcp-webanalyzer"
      ],
      "env": {
        "OPENAI_API_KEY": "your_openai_api_key_here",
        "OPENAI_MODEL": "gpt-4"
      }
    }
  }
}
```
</details>

## Tool Descriptions

### `url_to_markdown`
Converts web pages to clean markdown format with essential content extraction.

**Parameters:**
- `url` (string): The web page URL to analyze

**Returns:** Clean markdown content with structured data preservation

### `web_content_qna`
Answers questions about web page content using intelligent content analysis.

**Parameters:**
- `url` (string): The web page URL to analyze
- `question` (string): Question about the page content

**Returns:** AI-generated answer based on page content

## Architecture

### Content Extraction Pipeline

1. **URL Validation** - Ensures proper URL format
2. **HTML Fetching** - Uses Selenium for dynamic content
3. **Content Parsing** - BeautifulSoup for HTML processing
4. **Element Scoring** - Custom algorithm ranks content importance
5. **Content Filtering** - Removes duplicates and low-value content
6. **Markdown Conversion** - Structured output generation

### Q&A Processing Pipeline

1. **Content Chunking** - Intelligent text segmentation
2. **Relevance Scoring** - Matches content to questions
3. **Context Selection** - Picks most relevant chunks
4. **Answer Generation** - OpenAI GPT integration

## Project Structure

```
web-analyzer-mcp/
├── web_analyzer_mcp/          # Main Python package
│   ├── __init__.py            # Package initialization
│   ├── server.py              # FastMCP server with tools
│   ├── web_extractor.py       # Web content extraction engine
│   └── rag_processor.py       # RAG-based Q&A processor
├── scripts/                   # Build and utility scripts
│   └── build.js               # Node.js build script
├── README.md                  # English documentation
├── README.ko.md               # Korean documentation
├── package.json               # npm configuration and scripts
├── pyproject.toml             # Python package configuration
├── .env.example               # Environment variables template
└── dist-info.json             # Build information (generated)
```

## Development

### Modern Development with uv

```bash
# Clone repository
git clone https://github.com/kimdonghwi94/web-analyzer-mcp.git
cd web-analyzer-mcp

# Development commands
uv run mcp-webanalyzer       # Start development server
uv run python -m pytest      # Run tests
uv run ruff check .          # Lint code
uv run ruff format .         # Format code
uv sync                      # Sync dependencies

# Install development dependencies
uv add --dev pytest ruff mypy

# Create production build
npm run build
```

### Alternative: Traditional Python Development

```bash
# Set up the Python environment (if not using uv)
pip install -e ".[dev]"

# Development commands
python -m web_analyzer_mcp.server    # Start server
python -m pytest tests/              # Run tests
python -m ruff check .               # Lint code
python -m ruff format .              # Format code
python -m mypy web_analyzer_mcp/     # Type checking
```

## Contributing

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

## Roadmap

- [ ] Support for more content types (PDFs, videos)
- [ ] Multi-language content extraction
- [ ] Custom extraction rules
- [ ] Caching for frequently accessed content
- [ ] Webhook support for real-time updates

## Limitations

- Requires Chrome/Chromium for JavaScript-heavy sites
- OpenAI API key needed for Q&A functionality
- Rate limited to prevent abuse
- Some sites may block automated access

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Support

- Create an issue for bug reports or feature requests
- Contribute to discussions in the GitHub repository
- Check the [documentation](https://github.com/kimdonghwi94/web-analyzer-mcp) for detailed guides

## Acknowledgments

- Built with [FastMCP](https://github.com/jlowin/fastmcp) framework
- Inspired by [HtmlRAG](https://github.com/plageon/HtmlRAG) techniques for web content processing
- Thanks to the MCP community for feedback and contributions

---

**Made with ❤️ for the MCP community**
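Step 1 of the Content Extraction Pipeline (URL validation) can be illustrated with a short sketch. It assumes that only `http`/`https` URLs with a host are acceptable. `validate_url` is a hypothetical helper for illustration, not a function exported by this project.

```python
from urllib.parse import urlparse


def validate_url(url: str) -> str:
    """Hypothetical helper: return the trimmed URL, or raise ValueError if it is not http(s)."""
    candidate = url.strip()
    parsed = urlparse(candidate)
    # Only web URLs make sense for a browser-based fetcher.
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"unsupported or missing scheme in {url!r}")
    # A string like "https://" parses without a host, so reject it early.
    if not parsed.netloc:
        raise ValueError(f"missing host in {url!r}")
    return candidate
```

Failing fast at this step avoids starting a Selenium browser session for input that can never load.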
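Steps 3 to 6 of the Content Extraction Pipeline (parsing, element scoring, filtering, markdown conversion) can be approximated using only the standard library. This is a minimal sketch and not the project's algorithm. It makes several assumptions:

- The HTML has already been fetched, which is Selenium's job in step 2.
- `html.parser.HTMLParser` stands in for BeautifulSoup.
- Blocks are scored with a made-up heuristic: text length discounted by link density, plus a bonus for headings.

`BlockCollector`, `score_block`, `html_to_markdown`, the skipped-tag list, and the `min_score` threshold are all hypothetical. Tables and images, which the real tool preserves, are left out for brevity.

```python
from __future__ import annotations

from html.parser import HTMLParser

# Hypothetical choices: regions treated as boilerplate, and tags treated as content blocks.
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside", "form"}
BLOCK_TAGS = {"h1", "h2", "h3", "p", "li"}


class BlockCollector(HTMLParser):
    """Collect (tag, text, link_chars) for content blocks outside boilerplate regions."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: list[tuple[str, str, int]] = []
        self._skip_depth = 0  # > 0 while inside a SKIP_TAGS element
        self._current: str | None = None  # block tag currently being read
        self._parts: list[str] = []
        self._link_chars = 0
        self._in_link = False

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS and self._skip_depth == 0:
            self._current, self._parts, self._link_chars = tag, [], 0
        elif tag == "a":
            self._in_link = True

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == self._current:
            text = " ".join("".join(self._parts).split())  # collapse whitespace
            if text:
                self.blocks.append((tag, text, self._link_chars))
            self._current = None
        elif tag == "a":
            self._in_link = False

    def handle_data(self, data):
        if self._current is not None and self._skip_depth == 0:
            self._parts.append(data)
            if self._in_link:
                self._link_chars += len(data.strip())


def score_block(tag: str, text: str, link_chars: int) -> float:
    """Made-up heuristic: text length discounted by link density, plus a heading bonus."""
    link_density = min(link_chars / len(text), 1.0)
    score = len(text) * (1.0 - link_density)
    if tag in ("h1", "h2", "h3"):
        score += 50.0  # headings are short but carry document structure
    return score


def html_to_markdown(html: str, min_score: float = 20.0) -> str:
    """Keep high-scoring, non-duplicate blocks and emit them as markdown."""
    collector = BlockCollector()
    collector.feed(html)
    collector.close()
    seen: set[str] = set()
    lines: list[str] = []
    for tag, text, link_chars in collector.blocks:
        if text in seen or score_block(tag, text, link_chars) < min_score:
            continue  # filtering step: drop duplicates and low-value blocks
        seen.add(text)
        if tag.startswith("h"):
            lines.append("#" * int(tag[1]) + " " + text)
        elif tag == "li":
            lines.append("- " + text)
        else:
            lines.append(text)
    return "\n\n".join(lines)
```

The link-density discount handles menus and "click here" rows that slip past the skipped regions. The heading bonus keeps section titles that would otherwise fall under the length threshold.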
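Steps 1 to 3 of the Q&A Processing Pipeline (chunking, relevance scoring, context selection) can be sketched in the same way. The README does not say how `rag_processor.py` chunks or scores text, so this sketch substitutes two common techniques:

- Fixed-size word windows with overlap, for chunking.
- TF-IDF-weighted term overlap, for relevance scoring.

`chunk_text`, `rank_chunks`, `build_prompt`, and their default parameters are hypothetical. The OpenAI call in step 4 is omitted; the sketch only assembles the prompt that would be sent.

```python
from __future__ import annotations

import math
import re
from collections import Counter

_WORD = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; punctuation is discarded."""
    return _WORD.findall(text.lower())


def chunk_text(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words; consecutive windows share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


def rank_chunks(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return up to top_k chunks with a positive TF-IDF overlap score, in page order."""
    docs = [Counter(tokenize(chunk)) for chunk in chunks]
    n = len(docs)

    def idf(term: str) -> float:
        # Smoothed inverse document frequency: rarer terms weigh more.
        df = sum(1 for doc in docs if term in doc)
        return math.log((n + 1) / (df + 1)) + 1.0

    q_terms = set(tokenize(question))
    scores = [sum(doc[t] * idf(t) for t in q_terms if t in doc) for doc in docs]
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    best = [i for i in ranked[:top_k] if scores[i] > 0]
    # Restore original order so the selected context reads naturally.
    return [chunks[i] for i in sorted(best)]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble the text that would be sent to the chat model (the API call is not shown)."""
    joined = "\n---\n".join(context)
    return f"Answer using only this page content:\n{joined}\n\nQuestion: {question}"
```

Overlapping windows reduce the chance that an answer-bearing sentence is split across two chunks. Returning the selected chunks in page order, rather than score order, is purely a readability choice.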