Comprehensive engineering framework enforcing 55+ production practices across security, testing, and architecture with auto-activation.

Install via mdskills: `npx mdskills install Vimalk0703/shipworthy`
# Shipworthy

> Vibe coding is how you start; engineering is what keeps it alive.

---

## The Problem

AI-assisted coding is transforming how software gets built. But the data tells a sobering story:

- **1.7x more bugs** in AI-generated code compared to human-written code (CodeRabbit, 2025)
- **40% of AI-generated code** contains security vulnerabilities
- **2026 is "the year of technical debt from vibe coding"** (Forrester)

The root cause is not the AI. It is the **absence of engineering discipline between sessions.** The AI forgets your architecture the moment the session ends. It skips tests because you did not ask for them. It introduces security holes because nobody told it your auth strategy. Every new session starts from zero.

You end up building the same feature three times: once to get it working, once to fix what it broke, and once more when you realize the fix broke something else.

## The Solution

**Shipworthy** is a Claude Code plugin that auto-activates every session and enforces production engineering practices with full transparency. It detects your project type, generates an architecture spec, and maintains it across sessions. You vibe code at full speed -- the plugin handles TDD, security, quality gates, and 55 engineering skills while showing you exactly what it's doing.
No configuration, no ceremony, no workflow changes.

## Install

```bash
# Any AI agent (CLI setup: hooks + skills + quality gates)
npx shipworthy init

# Specific agent
npx shipworthy init --agent cursor
npx shipworthy init --agent copilot
npx shipworthy init --agent codex
npx shipworthy init --agent windsurf
npx shipworthy init --agent gemini
```

## Supported AI Agents

| Agent | Setup | Hooks | Skills | Quality Gates |
|-------|-------|-------|--------|--------------|
| **Claude Code** | `npx shipworthy init` | Full | Full (55) | Automated |
| **Cursor** | `npx shipworthy init --agent cursor` | Rules | Full | Manual |
| **GitHub Copilot** | `npx shipworthy init --agent copilot` | Rules | Full | Manual |
| **OpenAI Codex** | `npx shipworthy init --agent codex` | Rules | Full | Manual |
| **Windsurf** | `npx shipworthy init --agent windsurf` | Rules | Full | Manual |
| **Gemini CLI** | `npx shipworthy init --agent gemini` | Rules | Full | Manual |

## What Happens In Your First Session

That is the only setup. Here is what happens next:

1. **You open Claude Code on your project.** The plugin fires its session-start hook automatically.
2. **It detects your tech stack.** Next.js? Express? FastAPI? Go? React? Python? It knows.
3. **It generates an architecture spec.** A file at `.shipworthy/architecture.md` captures your project's conventions, mandatory rules, and structure.
4. **From now on, every session enforces those rules.** Claude remembers your architecture, your naming conventions, your patterns -- permanently.
5. **You build features normally.** Say "add a payment endpoint" and Claude automatically applies API design standards, security-first development, TDD, and quality gates. You never asked it to. It just does.
6. **Before completing, it verifies.** Tests pass, no secrets leaked, no regressions, build is clean. Evidence, not claims.

## Five Pillars

### 1. Invisible Discipline
Engineering guardrails activate automatically based on what you are doing. Writing a new feature triggers brainstorming, then planning, then TDD. Creating an API endpoint activates API design standards and security. You never invoke these manually -- they fire when relevant and stay silent when they are not.

### 2. Full Transparency
Every Shipworthy action is visible. Hooks log color-coded activity to your terminal in real time: security scans, compliance checks, and push validation. Skills announce themselves before activating. Commands, agents, templates, and adapters all identify when they're contributing. You always know what Shipworthy is doing and why.

```
┌─ ⚓ shipworthy ─────────────────────────────┐
│ Tier: ENGINEER │ Health: all passed │
│ Skills: 55 │ Hooks: 6 active │
└──────────────────────────────────────────────┘
⚓ shipworthy 14:32:05 pre-tool-use › Scanning: service.ts
⚓ shipworthy 14:32:05 pre-tool-use › All checks passed ✓
```

> ⚓ **shipworthy** › skill: `api-design-standards` + `security-first-development` — designing secure endpoint

Toggle off with `SHIPWORTHY_TRANSPARENCY=0` or `"transparency": false` in `.shipworthy/config.json`.

### 3. Architecture as Memory
The architecture spec is Claude's long-term memory for your project. Mandatory rules, directory conventions, naming patterns, tech choices -- all persisted and enforced. Session 5 knows everything session 1 decided. No more "Claude forgot we use Prisma" or "it put the route in the wrong directory again."

### 4. Cross-Session Memory
Inspired by production agent memory architectures, Shipworthy manages a `.shipworthy/` directory as persistent project memory:

- **INDEX.md** -- auto-generated index of all project memory, refreshed every session. Survives context compaction so Claude can rediscover what the project knows mid-conversation.
- **Learnings with frontmatter** -- retrospective findings are saved with `description` and `last_updated` fields. The description feeds into INDEX.md for one-line scanning without reading full files.
- **Dedup guard** -- before writing a new learning, the retrospective checks existing files. Same topic = update, not duplicate.
- **Memory consolidation** -- when learnings exceed 5 files or sessions exceed 10, `/retro` offers to merge duplicates, prune stale entries, fix relative dates, and remove facts contradicted by current code.
- **Session pruning** -- keeps the 10 most recent session summaries, deletes older ones. Valuable patterns from old sessions should already be captured in learnings via retrospectives.
- **Absolute dates everywhere** -- all timestamps use `YYYY-MM-DD`, never "yesterday" or "last week". Relative dates become meaningless across sessions.

### 5. Graduated Rigor
A weekend prototype should not face the same ceremony as an enterprise platform. The plugin scales its enforcement: lightweight checks for small projects, full quality gates as your codebase grows. You start fast and the guardrails tighten as complexity demands it.

## User Experience Tiers

| Tier | Who | Experience |
|------|-----|-----------|
| **Builder** | Non-technical, prototyping | Guardrails are silent. Tests happen invisibly. Plain language feedback when something needs attention. |
| **Maker** | Some experience, growing project | Moderate ceremony. Explains why tests matter. Offers choices on architecture decisions. |
| **Engineer** | Production codebase, CI/CD | Full TDD, quality gates, architecture enforcement. Every PR is verified before completion. |
## Skills (55)

### Core (3)
| Skill | What It Does |
|-------|-------------|
| **using-shipworthy** | Master router -- loaded every session, dispatches to relevant skills |
| **architecture-awareness** | Auto-detects project type, generates and enforces architecture spec |
| **intent-to-spec** | Converts vague requests into detailed specs (invisible for Builder, shown for Engineer) |

### Planning (5)
| Skill | What It Does |
|-------|-------------|
| **brainstorming** | 5-step design discovery with HARD-GATE approval before proceeding |
| **writing-plans** | Breaks work into bite-sized TDD implementation plans with HARD-GATE |
| **executing-plans** | Systematic task execution with verification at each step |
| **design-documents** | Creates Architecture Decision Records (ADRs) |
| **decision-frameworks** | Structured decision-making for trade-offs |

### Quality (5)
| Skill | What It Does |
|-------|-------------|
| **test-driven-development** | RED-GREEN-REFACTOR discipline for every feature |
| **quality-gates** | Graduated pre-commit checks that scale with project size |
| **verification-before-completion** | Requires evidence (passing tests, clean build) before marking work done |
| **error-handling-patterns** | Structured errors, recovery strategies, and user-facing messages |
| **code-complexity** | Identifies and refactors complex code |

### Security (11)
| Skill | What It Does |
|-------|-------------|
| **security-first-development** | OWASP-aware coding -- input validation, auth, secrets management |
| **adaptive-security** | Auto-detects app type (web/API/GraphQL/mobile/CLI/IoT/desktop/IaC/container) and applies type-specific security profiles |
| **secrets-management** | Comprehensive lifecycle: rotation, vault integration, leak detection |
| **dependency-management** | Vet, audit, and pin packages before adding them |
| **supply-chain-security** | Lock file integrity, typosquatting detection, SBOM, license compliance |
| **pii-detection** | Identifies and protects personally identifiable data |
| **threat-modeling** | Structured threat analysis |
| **compliance-awareness** | HIPAA, PCI-DSS, SOC2, GDPR guidance |
| **container-security** | Docker/container-specific hardening |

### Architecture (9)
| Skill | What It Does |
|-------|-------------|
| **api-design-standards** | REST conventions, type-safe contracts, consistent error responses |
| **database-design** | Schemas, migrations, indexing, N+1 prevention |
| **performance-budgets** | Bundle size limits, response time targets, query count caps |
| **observability-by-default** | Structured logging, tracing, health checks from day one |
| **resilience-patterns** | Circuit breakers, bulkheads, retries, timeouts, graceful degradation |
| **twelve-factor-app** | Stateless design, env config, backing services |
| **distributed-systems** | Multi-service coordination, eventual consistency |
| **api-versioning** | Breaking change management |
| **api-backward-compatibility** | Non-breaking API evolution |

### Collaboration (4)
| Skill | What It Does |
|-------|-------------|
| **subagent-driven-development** | Dispatch specialized agents with 2-stage review |
| **dispatching-parallel-agents** | Run independent tasks concurrently for speed |
| **requesting-code-review** | Structured review via the code-reviewer agent |
| **receiving-code-review** | Technical verification over performative agreement |

### Operations (12)
| Skill | What It Does |
|-------|-------------|
| **using-git-worktrees** | Isolated workspaces for parallel development branches |
| **finishing-a-development-branch** | 5-step completion workflow: tests, cleanup, docs, PR, verify |
| **ci-cd-awareness** | Pipeline design, rollback strategies, feature flags |
| **tech-debt-tracking** | Document shortcuts so they get fixed, not forgotten |
| **session-memory** | Cross-session persistence via `.shipworthy/` with INDEX.md, pruning, and consolidation |
| **production-readiness** | Pre-deployment checklist |
| **migration-strategies** | Database migration safety |
| **zero-downtime-migrations** | Gradual migration patterns |
| **environment-setup** | Local, staging, production configuration |
| **feature-flag-discipline** | Gradual rollout, kill switches |
| **incident-response** | Outage response procedures |
| **slo-sli-definition** | Service level objectives and indicators |

### Frontend (2)
| Skill | What It Does |
|-------|-------------|
| **accessibility** | WCAG 2.1 AA baseline for every UI component |
| **frontend-standards** | Component patterns, state management, rendering best practices |

### Documentation (1)
| Skill | What It Does |
|-------|-------------|
| **documentation-as-code** | JSDoc, README sync, ADRs, changelog -- documentation that stays current |

### Debugging (1)
| Skill | What It Does |
|-------|-------------|
| **systematic-debugging** | 4-phase root cause investigation: reproduce, isolate, fix, verify |

### Meta (2)
| Skill | What It Does |
|-------|-------------|
| **writing-skills** | TDD for documentation -- create new skills using the RED-GREEN-REFACTOR process |
| **retrospective** | Self-improving loop -- extracts signals from each session, saves learnings, consolidates memory |

## Graduated Quality Gates

| Level | Threshold | What Gets Checked |
|-------|-----------|-------------------|
| **0** | Any project | Build runs, no obvious errors (Builder-friendly) |
| **1** | Always | Tests pass, build clean, no hardcoded secrets |
| **2** | 10+ files | Coverage > 70%, no untracked TODOs, lint clean |
| **3** | 50+ files | Bundle budgets enforced, no circular imports, API contracts validated |
| **4** | 100+ files | Performance benchmarks, accessibility audit, security scan, dependency audit |
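
The file-count thresholds in the Graduated Quality Gates table can be read as "apply the highest level whose threshold the project meets." The POSIX shell sketch below illustrates that reading under stated assumptions: `gate_level` is a hypothetical helper, not a Shipworthy command; counting tracked files with `git ls-files` is an assumption, since this README does not say how files are counted; and Level 0 (the Builder-friendly baseline) is left out because the table does not tie it to a file count.

```bash
#!/bin/sh
# Hypothetical sketch, not Shipworthy's implementation: map a project's
# file count to the highest quality-gate level from the table.
# Level 1 ("Always") is the floor; Level 0 (Builder-friendly) is not modeled.
gate_level() {
  if [ "$1" -ge 100 ]; then
    echo 4   # 100+ files: benchmarks, accessibility audit, security and dependency scans
  elif [ "$1" -ge 50 ]; then
    echo 3   # 50+ files: bundle budgets, no circular imports, API contracts
  elif [ "$1" -ge 10 ]; then
    echo 2   # 10+ files: coverage > 70%, no untracked TODOs, lint clean
  else
    echo 1   # any project: tests pass, build clean, no hardcoded secrets
  fi
}

# Usage (assumes a git repository; counting tracked files is an assumption):
# gate_level "$(git ls-files | wc -l | tr -d ' ')"
```

For example, `gate_level 42` prints `2` and `gate_level 120` prints `4`. Ordering the branches from the highest threshold down means the first match wins, so adding a level only requires inserting one branch in the right place.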
## Architecture Templates (8)

Pre-built architecture specs for common stacks. The plugin selects the right one automatically, or you can run `/scaffold` to choose.

| Template | Stack |
|----------|-------|
| `nextjs.md` | Next.js (App Router, Server Components, API Routes) |
| `express.md` | Express.js (REST API, middleware patterns) |
| `fastapi.md` | FastAPI (Python async API, Pydantic models) |
| `go-service.md` | Go (standard library HTTP, clean architecture) |
| `react-spa.md` | React SPA (client-side routing, state management) |
| `generic-typescript.md` | TypeScript (general-purpose, library or CLI) |
| `generic-python.md` | Python (general-purpose, scripts or packages) |
| `monorepo.md` | Monorepo (multi-package, shared dependencies) |

## Agents (6)

Specialized AI personas dispatched by skills for focused review:

| Agent | Role |
|-------|------|
| **code-reviewer** | Line-by-line review for correctness, style, and maintainability |
| **architecture-analyzer** | Validates structural decisions against the architecture spec |
| **security-auditor** | Scans for vulnerabilities, secrets, auth gaps, injection risks |
| **test-strategist** | Evaluates test coverage, suggests missing test cases, reviews test quality |
| **project-doctor** | Infrastructure gap analysis with auto-fix recommendations |
| **pre-push-validator** | Runs 7-check validation suite (hooks, frontmatter, CSO, routing, cross-refs, quality, structure) |

## Commands

| Command | What It Does |
|---------|-------------|
| `/scaffold` | Generate or regenerate the architecture specification for your project |
| `/audit` | Run a full quality audit across all dimensions (tests, security, architecture, performance) |
| `/health` | Quick project health dashboard -- see where you stand at a glance |
| `/diagnose` | Infrastructure gap analysis with auto-fix options via project-doctor agent |
| `/retro` | Run a retrospective -- extract signals, save learnings, consolidate memory |
| `/validate` | Pre-push validation gate -- runs the full 7-check suite before pushing |

## Before and After

**Without Shipworthy:**
- Session 1: Build auth. Works great.
- Session 2: Build payments. Breaks auth. Claude forgot the auth middleware pattern.
- Session 3: Fix auth. Break payments. No tests to catch the regression.
- Ship: Security vulnerabilities, no tests, hardcoded secrets, inconsistent API responses.

**With Shipworthy:**
- Session 1: Build auth. Architecture spec generated. Tests written automatically. Auth patterns documented.
- Session 2: Build payments. Architecture rules prevent breaking auth. Security skill catches missing input validation.
- Session 3: Add features. Quality gates catch issues before you see them. Tech debt is tracked, not hidden.
- Ship: Tested, secure, documented, production-ready.

## Benchmark Results

We tested the plugin with an unbiased benchmark: same prompt, same starter project, scored by 15 automated checks. The only variable is whether the plugin is loaded.

**Task 01 -- Build a REST API with CRUD (Express + TypeScript):**

| | With Plugin | Without Plugin |
|---|---|---|
| **Score** | **22/25 (A)** | **12/25 (C)** |
| Tests | 22 tests, all passing | 0 tests |
| Input validation | Zod schemas | Manual if/else |
| Error handling | 3 structured error types | 1 basic class |
| Architecture | 8 files, separated concerns | 5 files, simpler |

**+83% score improvement** (from 12/25 to 22/25, a relative gain of (22 − 12) / 12 ≈ 83%). The plugin's TDD skill drove test creation, the security skill enforced Zod validation, and the API design skill produced proper status codes and error formatting.

Full methodology, all 10 task definitions, and reproducible benchmark scripts: [BENCHMARKS.md](BENCHMARKS.md)

```bash
# Run benchmarks yourself
cd benchmarks && ./run-benchmark.sh --task 1 --both
```

## Contributing

We welcome contributions. See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines on writing new skills, adding templates, proposing agents, and submitting pull requests.

**Good first contributions:** add a new architecture template, improve a skill's edge case coverage, or add code examples to existing skills.

---

If this plugin helps you ship production-quality code, consider giving it a star.

## License

[MIT](LICENSE)