---
name: finding-duplicate-functions
description: Use when auditing a codebase for semantic duplication - functions that do the same thing but have different names or implementations. Especially useful for LLM-generated codebases where new functions are often created rather than reusing existing ones.
---

# Finding Duplicate-Intent Functions

## Overview

LLM-generated codebases accumulate semantic duplicates: functions that serve the same purpose but were implemented independently. Classical copy-paste detectors (jscpd) find syntactic duplicates but miss "same intent, different implementation."

This skill uses a two-stage approach: classical extraction followed by LLM-powered intent clustering.

## When to Use

- Codebase has grown organically with multiple contributors (human or LLM)
- You suspect utility functions have been reimplemented multiple times
- Before major refactoring, to identify consolidation opportunities
- After jscpd has been run and syntactic duplicates are already handled

## Quick Reference

| Phase | Tool | Model | Output |
|-------|------|-------|--------|
| 1. Extract | `scripts/extract-functions.sh` | - | `catalog.json` |
| 2. Categorize | `scripts/categorize-prompt.md` | haiku | `categorized.json` |
| 3. Split | `scripts/prepare-category-analysis.sh` | - | `categories/*.json` |
| 4. Detect | `scripts/find-duplicates-prompt.md` | opus | `duplicates/*.json` |
| 5. Report | `scripts/generate-report.sh` | - | `report.md` |

## Process

```dot
digraph duplicate_detection {
  rankdir=TB;
  node [shape=box];

  extract [label="1. Extract function catalog\n./scripts/extract-functions.sh"];
  categorize [label="2. Categorize by domain\n(haiku subagent)"];
  split [label="3. Split into categories\n./scripts/prepare-category-analysis.sh"];
  detect [label="4. Find duplicates per category\n(opus subagent per category)"];
  report [label="5. Generate report\n./scripts/generate-report.sh"];
  review [label="6. Human review & consolidate"];

  extract -> categorize -> split -> detect -> report -> review;
}
```

### Phase 1: Extract Function Catalog

```bash
./scripts/extract-functions.sh src/ -o catalog.json
```

Options:

- `-o FILE`: Output file (default: stdout)
- `-c N`: Lines of context to capture (default: 15)
- `-t GLOB`: File types (default: `*.ts,*.tsx,*.js,*.jsx`)
- `--include-tests`: Include test files (excluded by default)

Test files (`*.test.*`, `*.spec.*`, `__tests__/**`) are excluded by default since test utilities are less likely to be consolidation candidates.

### Phase 2: Categorize by Domain

Dispatch a **haiku** subagent using the prompt in `scripts/categorize-prompt.md`.

Insert the contents of `catalog.json` where indicated in the prompt template. Save the output as `categorized.json`.

### Phase 3: Split into Categories

```bash
./scripts/prepare-category-analysis.sh categorized.json ./categories
```

Creates one JSON file per category. Only categories with 3+ functions are worth analyzing.

### Phase 4: Find Duplicates (Per Category)

For each category file in `./categories/`, dispatch an **opus** subagent using the prompt in `scripts/find-duplicates-prompt.md`.

Save each output as `./duplicates/{category}.json`.

### Phase 5: Generate Report

```bash
./scripts/generate-report.sh ./duplicates ./duplicates-report.md
```

Produces a prioritized markdown report grouped by confidence level.

### Phase 6: Human Review

Review the report. For HIGH-confidence duplicates:

1. Verify the recommended survivor has tests
2. Update callers to use the survivor
3. Delete the duplicates
4. Run the tests

## High-Risk Duplicate Zones

Focus extraction on these areas first - they accumulate duplicates fastest:

| Zone | Common Duplicates |
|------|-------------------|
| `utils/`, `helpers/`, `lib/` | General utilities reimplemented |
| Validation code | Same checks written multiple ways |
| Error formatting | Error-to-string conversions |
| Path manipulation | Joining, resolving, normalizing paths |
| String formatting | Case conversion, truncation, escaping |
| Date formatting | Same formats implemented repeatedly |
| API response shaping | Similar transformations for different endpoints |

## Common Mistakes

**Extracting too much**: Focus on exported functions and public methods. Internal helpers are less likely to be duplicated across files.

**Skipping the categorization step**: Going straight to duplicate detection on the full catalog produces noise. Categories focus the comparison.

**Using haiku for duplicate detection**: Haiku is cost-effective for categorization but misses subtle semantic duplicates. Use opus for the actual duplicate analysis.

**Consolidating without tests**: Before deleting duplicates, ensure the survivor has tests covering all use cases of the deleted functions.
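To make the target concrete, here is the kind of pair the detection phase is meant to surface. This is a hypothetical example (both functions, and the paths in the comments, are invented): same intent, different names and implementations, so a copy-paste detector like jscpd sees no overlap.

```typescript
// Hypothetical semantic duplicates: same intent ("truncate a string with an
// ellipsis"), different names and implementations. There is no syntactic
// copy-paste here, but an analysis pass over the "string formatting" category
// should flag them as HIGH-confidence duplicates.

// Version A: lives in src/utils/strings.ts (hypothetical path)
export function truncate(s: string, max: number): string {
  if (s.length <= max) return s;
  return s.slice(0, max - 1) + "…";
}

// Version B: reimplemented later in src/components/labels.ts (hypothetical path)
export function shortenLabel(label: string, limit = 30): string {
  return label.length > limit ? `${label.substring(0, limit - 1)}…` : label;
}
```

Consolidating this pair would follow Phase 6: keep `truncate` as the survivor, first verifying its tests cover the default-limit behavior that `shortenLabel`'s callers rely on, then update those callers and delete `shortenLabel`.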