Scrape, crawl, search, and extract web data. Converts any website to LLM-ready markdown. Local-first with smart escalation (HTTP → browser → stealth). No API key needed.
Add this skill
npx mdskills install webpeel/webpeel

Comprehensive web scraping with smart escalation, MCP integration, and extensive multi-platform support
---
name: webpeel
description: Scrape, crawl, search, and extract web data. Converts any website to LLM-ready markdown. Local-first with smart escalation (HTTP → browser → stealth). No API key needed.
---

# WebPeel: Web Fetching for AI Agents

WebPeel converts any website into clean, LLM-ready markdown. It handles JavaScript rendering, anti-bot protection, and content extraction automatically.

## When to Use

Use WebPeel when you need to:
- Fetch a web page and get clean markdown content
- Search the web and get full page content from results
- Crawl an entire site or discover all URLs
- Extract structured data from pages
- Get screenshots of web pages
- Track changes on a page over time
- Extract branding/design system from a site

## Quick Reference

### CLI (installed globally or via npx)

```bash
# Install
npm install -g webpeel

# Scrape a page (default: markdown output)
npx webpeel https://example.com

# Search the web
npx webpeel search "latest AI news"

# Crawl a site (up to 10 pages)
npx webpeel crawl https://example.com --limit 10

# Discover all URLs on a site
npx webpeel map https://example.com

# Extract structured data
npx webpeel https://example.com --extract '{"title": "string", "price": "number"}'

# Use browser rendering for JS-heavy sites
npx webpeel https://example.com --render

# Use stealth mode for protected sites
npx webpeel https://example.com --stealth

# Get screenshot
npx webpeel https://example.com --screenshot

# AI-powered research agent
npx webpeel agent "Find the pricing of Notion" --llm-key sk-...

# Filter content by HTML tags
npx webpeel https://example.com --include-tags article,main --exclude-tags nav,footer

# Extract images
npx webpeel https://example.com --images

# Limit token output
npx webpeel https://example.com --max-tokens 4000

# Get branding/design info
npx webpeel brand https://example.com

# Track changes over time
npx webpeel track https://example.com
```

### Node.js Library

```typescript
import { peel, crawl, mapDomain, extractBranding, runAgent } from 'webpeel';

// Scrape a page
const result = await peel('https://example.com');
console.log(result.content);   // Clean markdown
console.log(result.metadata);  // { title, description, language, ... }

// Scrape with options
const result2 = await peel('https://example.com', {
  render: true,           // Use browser for JS sites
  stealth: true,          // Anti-bot stealth mode
  screenshot: true,       // Capture screenshot
  format: 'markdown',     // 'markdown' | 'text' | 'html'
  selector: 'article',    // CSS selector for content
  includeTags: ['main'],  // Only include these HTML tags
  excludeTags: ['nav'],   // Remove these HTML tags
  maxTokens: 4000,        // Limit output tokens
  images: true,           // Extract image URLs
});

// Search the web
import { search } from 'webpeel'; // Note: search is a CLI/API feature
// Use the API: GET https://api.webpeel.dev/v1/search?q=query

// Crawl a site
const pages = await crawl('https://example.com', {
  limit: 20,
  maxDepth: 3,
  onProgress: (p) => console.log(`${p.completed}/${p.total}`),
});

// Discover URLs
const urls = await mapDomain('https://example.com', { limit: 100 });

// Extract branding
const brand = await extractBranding('https://example.com');
console.log(brand.colors, brand.fonts, brand.logo);

// AI agent research
const research = await runAgent({
  prompt: 'Find Notion pricing plans',
  llmApiKey: 'sk-...',
  llmModel: 'gpt-4o',
});
```

### Python SDK

```python
from webpeel import WebPeel

client = WebPeel(api_key="wp_...")  # Or no key for local usage

# Scrape
result = client.scrape("https://example.com")
print(result.markdown)

# Search
results = client.search("AI frameworks comparison")

# Crawl
pages = client.crawl("https://example.com", limit=10)
```

### MCP Server

WebPeel includes a built-in MCP server with 7 tools:

```bash
# Start MCP server
npx webpeel mcp
```

**Tools available:**
- `webpeel_fetch`: Fetch a URL and get markdown/text/HTML
- `webpeel_search`: Search the web via DuckDuckGo
- `webpeel_crawl`: Crawl a website and get all pages
- `webpeel_map`: Discover all URLs on a domain
- `webpeel_extract`: Extract structured data from a page
- `webpeel_batch`: Batch scrape multiple URLs
- `webpeel_agent`: AI-powered web research agent

**MCP configuration for Claude Desktop / other clients:**
```json
{
  "mcpServers": {
    "webpeel": {
      "command": "npx",
      "args": ["webpeel", "mcp"]
    }
  }
}
```

### Hosted API

Free tier available at `https://api.webpeel.dev`:

```bash
# Scrape (no auth for basic requests)
curl "https://api.webpeel.dev/v1/fetch?url=https://example.com"

# Search
curl "https://api.webpeel.dev/v1/search?q=latest+news"

# With API key for higher limits
curl -H "Authorization: Bearer wp_..." "https://api.webpeel.dev/v1/fetch?url=https://example.com&render=true"
```

## Key Features

- **Smart Escalation**: Automatically tries HTTP first, then browser, then stealth mode
- **No API Key Needed**: Works locally without any configuration
- **Token Efficient**: Smart content extraction saves ~96% tokens vs raw HTML
- **Stealth Mode**: Bypasses anti-bot protection on protected sites
- **Screenshot**: Full-page or viewport screenshots
- **Structured Extraction**: Extract JSON data using CSS selectors or AI
- **Change Tracking**: Track page changes over time with diffs
- **Branding Extraction**: Get colors, fonts, logos from any site

## Tips

- Use `--render` only when needed (JS-heavy sites). Simple HTTP is 5-10x faster.
- Use `--stealth` for sites that block bots (Cloudflare, etc.)
- Use `--max-tokens 4000` to keep output within context limits
- Use `--include-tags article,main` to extract only relevant content
- For batch operations, use `npx webpeel batch urls.txt`
- The MCP server is the easiest way to integrate with AI agents

## Links

- **GitHub**: https://github.com/webpeel/webpeel
- **npm**: https://www.npmjs.com/package/webpeel
- **PyPI**: https://pypi.org/project/webpeel/
- **Docs**: https://webpeel.dev/docs/
- **API**: https://api.webpeel.dev
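
## Example: Escalation as a Fallback Chain

The smart escalation order listed under Key Features (HTTP first, then browser, then stealth) can be pictured as a simple fallback chain. What follows is a minimal Python sketch of that idea, not WebPeel's actual implementation: `fetch_with_escalation`, the `Fetcher` type, and the three stand-in fetchers are hypothetical names introduced purely for illustration, and a real fetcher would need its own rules for deciding that a response was blocked or empty.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical type: a fetcher takes a URL and returns page text,
# or None when the request fails or is blocked.
Fetcher = Callable[[str], Optional[str]]


def fetch_with_escalation(
    url: str, tiers: Sequence[Tuple[str, Fetcher]]
) -> Tuple[str, str]:
    """Try each tier in order; return (tier_name, content) from the first success."""
    for name, fetch in tiers:
        try:
            content = fetch(url)
        except Exception:
            content = None  # An error is treated like a block: move to the next tier.
        if content:
            return name, content
    raise RuntimeError(f"all tiers failed for {url}")


# Stand-in fetchers (assumptions for the demo, no network access):
# plain HTTP is "blocked", the browser tier succeeds, stealth is never reached.
def plain_http(url: str) -> Optional[str]:
    return None


def browser(url: str) -> Optional[str]:
    return f"# Rendered {url}"


def stealth(url: str) -> Optional[str]:
    return f"# Stealth {url}"


tiers = [("http", plain_http), ("browser", browser), ("stealth", stealth)]
tier, content = fetch_with_escalation("https://example.com", tiers)
print(tier)
```

Ordering the tiers from cheapest to most expensive matches the tip that simple HTTP is much faster than browser rendering: the costlier tiers only run when the cheaper ones return nothing.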