---
name: finding-duplicate-functions
description: Use when auditing a codebase for semantic duplication - functions that do the same thing but have different names or implementations. Especially useful for LLM-generated codebases where new functions are often created rather than reusing existing ones.
---

# Finding Duplicate-Intent Functions

## Overview

LLM-generated codebases accumulate semantic duplicates: functions that serve the same purpose but were implemented independently. Classical copy-paste detectors (jscpd) find syntactic duplicates but miss "same intent, different implementation."

This skill uses a two-stage approach: classical extraction followed by LLM-powered intent clustering.

## When to Use

- Codebase has grown organically with multiple contributors (human or LLM)
- You suspect utility functions have been reimplemented multiple times
- Before major refactoring, to identify consolidation opportunities
- After jscpd has been run and syntactic duplicates are already handled

## Quick Reference

| Phase | Tool | Model | Output |
|-------|------|-------|--------|
| 1. Extract | `scripts/extract-functions.sh` | - | `catalog.json` |
| 2. Categorize | `scripts/categorize-prompt.md` | haiku | `categorized.json` |
| 3. Split | `scripts/prepare-category-analysis.sh` | - | `categories/*.json` |
| 4. Detect | `scripts/find-duplicates-prompt.md` | opus | `duplicates/*.json` |
| 5. Report | `scripts/generate-report.sh` | - | `report.md` |

## Process

```dot
digraph duplicate_detection {
  rankdir=TB;
  node [shape=box];

  extract [label="1. Extract function catalog\n./scripts/extract-functions.sh"];
  categorize [label="2. Categorize by domain\n(haiku subagent)"];
  split [label="3. Split into categories\n./scripts/prepare-category-analysis.sh"];
  detect [label="4. Find duplicates per category\n(opus subagent per category)"];
  report [label="5. Generate report\n./scripts/generate-report.sh"];
  review [label="6. Human review & consolidate"];

  extract -> categorize -> split -> detect -> report -> review;
}
```

### Phase 1: Extract Function Catalog

```bash
./scripts/extract-functions.sh src/ -o catalog.json
```

Options:

- `-o FILE`: Output file (default: stdout)
- `-c N`: Lines of context to capture (default: 15)
- `-t GLOB`: File types (default: `*.ts,*.tsx,*.js,*.jsx`)
- `--include-tests`: Include test files (excluded by default)

Test files (`*.test.*`, `*.spec.*`, `__tests__/**`) are excluded by default since test utilities are less likely to be consolidation candidates.

### Phase 2: Categorize by Domain

Dispatch a **haiku** subagent using the prompt in `scripts/categorize-prompt.md`.

Insert the contents of `catalog.json` where indicated in the prompt template. Save the output as `categorized.json`.

### Phase 3: Split into Categories

```bash
./scripts/prepare-category-analysis.sh categorized.json ./categories
```

Creates one JSON file per category. Only categories with 3+ functions are worth analyzing.

### Phase 4: Find Duplicates (Per Category)

For each category file in `./categories/`, dispatch an **opus** subagent using the prompt in `scripts/find-duplicates-prompt.md`.

Save each output as `./duplicates/{category}.json`.

### Phase 5: Generate Report

```bash
./scripts/generate-report.sh ./duplicates ./duplicates-report.md
```

Produces a prioritized markdown report grouped by confidence level.

### Phase 6: Human Review

Review the report. For HIGH-confidence duplicates:

1. Verify the recommended survivor has tests
2. Update callers to use the survivor
3. Delete the duplicates
4. Run the tests

## High-Risk Duplicate Zones

Focus extraction on these areas first - they accumulate duplicates fastest:

| Zone | Common Duplicates |
|------|-------------------|
| `utils/`, `helpers/`, `lib/` | General utilities reimplemented |
| Validation code | Same checks written multiple ways |
| Error formatting | Error-to-string conversions |
| Path manipulation | Joining, resolving, normalizing paths |
| String formatting | Case conversion, truncation, escaping |
| Date formatting | Same formats implemented repeatedly |
| API response shaping | Similar transformations for different endpoints |

## Common Mistakes

**Extracting too much**: Focus on exported functions and public methods. Internal helpers are less likely to be duplicated across files.

**Skipping the categorization step**: Going straight to duplicate detection on the full catalog produces noise. Categories focus the comparison.

**Using haiku for duplicate detection**: Haiku is cost-effective for categorization but misses subtle semantic duplicates. Use opus for the actual duplicate analysis.

**Consolidating without tests**: Before deleting duplicates, ensure the survivor has tests covering all use cases of the deleted functions.
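To make the target concrete, here is the kind of pair the detection phase is meant to surface. This is a hypothetical example (both functions, and the paths in the comments, are invented): same intent, different names and implementations, so a copy-paste detector like jscpd sees no overlap.

```typescript
// Hypothetical semantic duplicates: same intent ("truncate a string with an
// ellipsis"), different names and implementations. There is no syntactic
// copy-paste here, but an analysis pass over the "string formatting" category
// should flag them as HIGH-confidence duplicates.

// Version A: lives in src/utils/strings.ts (hypothetical path)
export function truncate(s: string, max: number): string {
  if (s.length <= max) return s;
  return s.slice(0, max - 1) + "…";
}

// Version B: reimplemented later in src/components/labels.ts (hypothetical path)
export function shortenLabel(label: string, limit = 30): string {
  return label.length > limit ? `${label.substring(0, limit - 1)}…` : label;
}
```

Consolidating this pair would follow Phase 6: keep `truncate` as the survivor, first verifying its tests cover the default-limit behavior that `shortenLabel`'s callers rely on, then update those callers and delete `shortenLabel`.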