Scrapes content based on a preset URL list, filters high-quality technical information, and generates daily Markdown reports.
Add this skill
```
npx mdskills install sickn33/daily-news-report
```

Sophisticated multi-agent orchestration for curating technical news with smart caching and fallback modes.
---
name: daily-news-report
description: Scrapes content based on a preset URL list, filters high-quality technical information, and generates daily Markdown reports.
argument-hint: [optional: date]
disable-model-invocation: false
user-invocable: true
allowed-tools: Task, WebFetch, Read, Write, Bash(mkdir*), Bash(date*), Bash(ls*), mcp__chrome-devtools__*
---

# Daily News Report v3.0

> **Architecture Upgrade**: Main Agent Orchestration + SubAgent Execution + Browser Scraping + Smart Caching

## Core Architecture

```
┌───────────────────────────────────────────────────────────────────────┐
│                       Main Agent (Orchestrator)                       │
│    Role: Scheduling, Monitoring, Evaluation, Decision, Aggregation    │
├───────────────────────────────────────────────────────────────────────┤
│                                                                       │
│ ┌─────────────┐   ┌─────────────┐   ┌─────────────┐   ┌─────────────┐ │
│ │ 1. Init     │ → │ 2. Dispatch │ → │ 3. Monitor  │ → │ 4. Evaluate │ │
│ │ Read Config │   │ Assign Tasks│   │ Collect Res │   │ Filter/Sort │ │
│ └─────────────┘   └─────────────┘   └─────────────┘   └─────────────┘ │
│        │                 │                 │                 │        │
│        ▼                 ▼                 ▼                 ▼        │
│ ┌─────────────┐   ┌─────────────┐   ┌─────────────┐   ┌─────────────┐ │
│ │ 5. Decision │ ← │ Enough 20?  │   │ 6. Generate │ → │ 7. Update   │ │
│ │ Cont/Stop   │   │ Y/N         │   │ Report File │   │ Cache Stats │ │
│ └─────────────┘   └─────────────┘   └─────────────┘   └─────────────┘ │
│                                                                       │
└───────────────────────────────────────────────────────────────────────┘
              ↓ Dispatch                        ↑ Return Results
┌───────────────────────────────────────────────────────────────────────┐
│                       SubAgent Execution Layer                        │
├───────────────────────────────────────────────────────────────────────┤
│                                                                       │
│        ┌─────────────┐      ┌─────────────┐      ┌─────────────┐      │
│        │ Worker A    │      │ Worker B    │      │ Browser     │      │
│        │ (WebFetch)  │      │ (WebFetch)  │      │ (Headless)  │      │
│        │ Tier1 Batch │      │ Tier2 Batch │      │ JS Render   │      │
│        └─────────────┘      └─────────────┘      └─────────────┘      │
│               ↓                    ↓                    ↓             │
│      ┌─────────────────────────────────────────────────────────┐      │
│      │                Structured Result Return                 │      │
│      │ { status, data: [...], errors: [...], metadata: {...} } │      │
│      └─────────────────────────────────────────────────────────┘      │
│                                                                       │
└───────────────────────────────────────────────────────────────────────┘
```

## Configuration Files

This skill uses the following configuration files:

| File | Purpose |
|------|---------|
| `sources.json` | Source configuration, priorities, scrape methods |
| `cache.json` | Cached data, historical stats, deduplication fingerprints |

## Execution Process Details

### Phase 1: Initialization

```yaml
Steps:
  1. Determine date (user argument or current date)
  2. Read sources.json for source configurations
  3. Read cache.json for historical data
  4. Create output directory NewsReport/
  5. Check if a partial report exists for today (append mode)
```

### Phase 2: Dispatch SubAgents

**Strategy**: Parallel dispatch, batch execution, early-stopping mechanism

```yaml
Wave 1 (Parallel):
  - Worker A: Tier1 Batch A (HN, HuggingFace Papers)
  - Worker B: Tier1 Batch B (OneUsefulThing, Paul Graham)

Wait for results → Evaluate count

If < 15 high-quality items:
  Wave 2 (Parallel):
    - Worker C: Tier2 Batch A (James Clear, FS Blog)
    - Worker D: Tier2 Batch B (HackerNoon, Scott Young)

If still < 20 items:
  Wave 3 (Browser):
    - Browser Worker: ProductHunt, Latent Space (require JS rendering)
```

### Phase 3: SubAgent Task Format

Task format received by each SubAgent:

```yaml
task: fetch_and_extract
sources:
  - id: hn
    url: https://news.ycombinator.com
    extract: top_10
  - id: hf_papers
    url: https://huggingface.co/papers
    extract: top_voted

output_schema:
  items:
    - source_id: string      # Source identifier
      title: string          # Title
      summary: string        # 2-4 sentence summary
      key_points: string[]   # Max 3 key points
      url: string            # Original URL
      keywords: string[]     # Keywords
      quality_score: 1-5     # Quality score

constraints:
  filter: "Cutting-edge Tech / Deep Tech / Productivity / Practical Info"
  exclude: "General Science / Marketing Puff / Overly Academic / Job Posts"
  max_items_per_source: 10
  skip_on_error: true

return_format: JSON
```

### Phase 4: Main Agent Monitoring & Feedback

Main Agent responsibilities:

```yaml
Monitoring:
  - Check SubAgent return status (success/partial/failed)
  - Count collected items
  - Record success rate per source

Feedback Loop:
  - If a SubAgent fails, decide whether to retry or skip
  - If a source fails persistently, mark it as disabled
  - Dynamically adjust source selection for subsequent batches

Decision:
  - Items >= 25 AND HighQuality >= 20 → Stop scraping
  - Items < 15 → Continue to next batch
  - All batches done but < 20 → Generate with available content (quality over quantity)
```

### Phase 5: Evaluation & Filtering

```yaml
Deduplication:
  - Exact URL match
  - Title similarity (>80% considered duplicate)
  - Check cache.json to avoid duplicates from previous runs

Score Calibration:
  - Unify scoring standards across SubAgents
  - Adjust weights based on source credibility
  - Bonus points for manually curated high-quality sources

Sorting:
  - Descending order by quality_score
  - Sort by source priority if scores are equal
  - Take top 20
```

### Phase 6: Browser Scraping (MCP Chrome DevTools)

For pages requiring JS rendering, use a headless browser:

```yaml
Process:
  1. Call mcp__chrome-devtools__new_page to open the page
  2. Call mcp__chrome-devtools__wait_for to wait for content to load
  3. Call mcp__chrome-devtools__take_snapshot to get the page structure
  4. Parse the snapshot to extract the required content
  5. Call mcp__chrome-devtools__close_page to close the page

Applicable Scenarios:
  - ProductHunt (403 on WebFetch)
  - Latent Space (Substack JS rendering)
  - Other SPA applications
```

### Phase 7: Generate Report

```yaml
Output:
  - Directory: NewsReport/
  - Filename: YYYY-MM-DD-news-report.md
  - Format: Standard Markdown

Content Structure:
  - Title + date
  - Statistical summary (source count, items collected)
  - 20 high-quality items (template based)
  - Generation info (version, timestamps)
```

### Phase 8: Update Cache

```yaml
Update cache.json:
  - last_run: Record this run's info
  - source_stats: Update stats per source
  - url_cache: Add processed URLs
  - content_hashes: Add content fingerprints
  - article_history: Record included articles
```

## SubAgent Call Examples

### Using the general-purpose Agent

Since custom agents require a session restart to be discovered, use general-purpose and inject worker prompts:

```
Task Call:
  subagent_type: general-purpose
  model: haiku
  prompt: |
    You are a stateless execution unit. Only do the assigned task and return structured JSON.

    Task: Scrape the following URLs and extract content

    URLs:
    - https://news.ycombinator.com (Extract top 10)
    - https://huggingface.co/papers (Extract top voted papers)

    Output Format:
    {
      "status": "success" | "partial" | "failed",
      "data": [
        {
          "source_id": "hn",
          "title": "...",
          "summary": "...",
          "key_points": ["...", "...", "..."],
          "url": "...",
          "keywords": ["...", "..."],
          "quality_score": 4
        }
      ],
      "errors": [],
      "metadata": { "processed": 2, "failed": 0 }
    }

    Filter Criteria:
    - Keep: Cutting-edge Tech / Deep Tech / Productivity / Practical Info
    - Exclude: General Science / Marketing Puff / Overly Academic / Job Posts

    Return JSON directly, no explanation.
```

### Using the worker Agent (requires session restart)

```
Task Call:
  subagent_type: worker
  prompt: |
    task: fetch_and_extract
    input:
      urls:
        - https://news.ycombinator.com
        - https://huggingface.co/papers
    output_schema:
      - source_id: string
      - title: string
      - summary: string
      - key_points: string[]
      - url: string
      - keywords: string[]
      - quality_score: 1-5
    constraints:
      filter: Cutting-edge Tech / Deep Tech / Productivity / Practical Info
      exclude: General Science / Marketing Puff / Overly Academic
```

## Output Template

```markdown
# Daily News Report (YYYY-MM-DD)

> Curated from N sources today, containing 20 high-quality items
> Generation Time: X min | Version: v3.0
>
> **Warning**: Sub-agent 'worker' not detected. Running in generic mode (serial execution). Performance may be degraded.

---

## 1. Title

- **Summary**: 2-4 line overview
- **Key Points**:
  1. Point one
  2. Point two
  3. Point three
- **Source**: [Link](URL)
- **Keywords**: `keyword1` `keyword2` `keyword3`
- **Score**: ⭐⭐⭐⭐⭐ (5/5)

---

## 2. Title
...

---

*Generated by Daily News Report v3.0*
*Sources: HN, HuggingFace, OneUsefulThing, ...*
```

## Constraints & Principles

1. **Quality over Quantity**: Low-quality content does not enter the report.
2. **Early Stop**: Stop scraping once 20 high-quality items are reached.
3. **Parallel First**: SubAgents in the same batch execute in parallel.
4. **Fault Tolerance**: Failure of a single source does not affect the whole process.
5. **Cache Reuse**: Avoid re-scraping the same content.
6. **Main Agent Control**: All decisions are made by the Main Agent.
7. **Fallback Awareness**: Detect sub-agent availability; degrade gracefully if unavailable.

## Expected Performance

| Scenario | Expected Time | Note |
|---|---|---|
| Optimal | ~2 min | Tier1 sufficient, no browser needed |
| Normal | ~3-4 min | Requires Tier2 supplement |
| Browser Needed | ~5-6 min | Includes JS-rendered pages |

## Error Handling

| Error Type | Handling |
|---|---|
| SubAgent timeout | Log error, continue to next |
| Source 403/404 | Mark disabled, update sources.json |
| Extraction failed | Return raw content, Main Agent decides |
| Browser crash | Skip source, log entry |

## Compatibility & Fallback

To ensure usability across different Agent environments, the following checks must be performed:

1. **Environment Check**:
   - In Phase 1 initialization, attempt to detect whether the `worker` sub-agent exists.
   - If it does not exist (or the plugin is not installed), automatically switch to **Serial Execution Mode**.

2. **Serial Execution Mode**:
   - Do not use the parallel block.
   - The Main Agent executes scraping tasks for each source sequentially.
   - Slower, but guarantees basic functionality.

3. **User Alert**:
   - MUST include a clear warning in the generated report header indicating the current degraded mode.
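The skill reads `sources.json` but never shows its shape. A plausible structure, consistent with the tiers, priorities, scrape methods, and disable-on-failure behavior described above, might look like the following; every field name here is an assumption, not the skill's actual schema.

```json
{
  "sources": [
    {
      "id": "hn",
      "name": "Hacker News",
      "url": "https://news.ycombinator.com",
      "tier": 1,
      "priority": 1,
      "method": "webfetch",
      "extract": "top_10",
      "disabled": false
    },
    {
      "id": "producthunt",
      "name": "ProductHunt",
      "url": "https://www.producthunt.com",
      "tier": 3,
      "priority": 5,
      "method": "browser",
      "extract": "top_voted",
      "disabled": false
    }
  ]
}
```

Under this shape, the Phase 4 feedback loop would flip `disabled` to `true` for persistently failing sources, and the dispatcher would group enabled sources by `tier` into waves.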
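The Phase 5 deduplication and sorting rules can be sketched in Python. This is a minimal illustration, not the skill's actual implementation: the item fields mirror the `output_schema` above, the 0.8 similarity threshold matches the ">80% considered duplicate" rule, and the `evaluate`/`is_duplicate` function names are hypothetical.

```python
from difflib import SequenceMatcher


def is_duplicate(item, seen_urls, seen_titles, threshold=0.8):
    """Duplicate if the URL matches exactly or a title is >80% similar."""
    if item["url"] in seen_urls:
        return True
    return any(
        SequenceMatcher(None, item["title"].lower(), t.lower()).ratio() > threshold
        for t in seen_titles
    )


def evaluate(items, source_priority, top_n=20):
    """Dedup, sort by quality_score desc (source priority breaks ties), take top N."""
    unique, urls, titles = [], set(), []
    for item in items:
        if not is_duplicate(item, urls, titles):
            unique.append(item)
            urls.add(item["url"])
            titles.append(item["title"])
    unique.sort(key=lambda i: (-i["quality_score"],
                               source_priority.get(i["source_id"], 99)))
    return unique[:top_n]
```

In a real run, `seen_urls` and title fingerprints would also be seeded from `cache.json` so items from previous days are rejected as well.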
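The Phase 8 cache update can likewise be sketched as a small merge routine. The field names (`last_run`, `source_stats`, `url_cache`, `content_hashes`, `article_history`) follow the Phase 8 list above; everything else (the hashing choice, the stats shape, the `update_cache` name) is an assumption for illustration.

```python
import hashlib
import json


def update_cache(cache_path, run_info, report_items):
    """Merge this run's results into cache.json (field names follow Phase 8)."""
    try:
        with open(cache_path) as f:
            cache = json.load(f)
    except FileNotFoundError:
        cache = {"url_cache": [], "content_hashes": [],
                 "article_history": [], "source_stats": {}}

    cache["last_run"] = run_info
    for item in report_items:
        cache["url_cache"].append(item["url"])
        # Title hash serves as the content fingerprint for dedup on later runs.
        digest = hashlib.sha256(item["title"].encode("utf-8")).hexdigest()
        cache["content_hashes"].append(digest)
        cache["article_history"].append(
            {"title": item["title"], "url": item["url"]})
        stats = cache["source_stats"].setdefault(
            item["source_id"], {"included": 0})
        stats["included"] += 1

    with open(cache_path, "w") as f:
        json.dump(cache, f, ensure_ascii=False, indent=2)
    return cache
```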
Full transparency — inspect the skill content before installing.