AI-driven video remix generator — semantic video search + LLM planning + Remotion rendering.
Requires ShotAI — local video asset management and footage search for Mac.
Chinese documentation: README.zh.md
Generate styled video compositions from your local video footage library using natural language. ShotAI handles shot-level indexing and semantic search; this tool handles the planning, music, and rendering.
Hong Kong Cyberpunk Night — generated from local video footage with ShotAI + Remotion, no manual editing.
This repo ships a ready-to-install Claude Agent Skill in the skill/ directory.
Install in Claude Code:
/plugin install ai-video-remix@abu-ShotAI/ai-video-remix#skill
Or point Claude Code settings to the local skill/ folder.
Once installed, just describe what you want:
"Make a travel vlog from my library" "Create a cyberpunk city highlight reel" "Sports highlight from last weekend's footage"
| Tool | Purpose | Install |
|---|---|---|
| ShotAI | AI video asset management + semantic footage search — provides the MCP server this tool queries | Download for Mac |
| ffmpeg | Clip extraction and keyframe analysis | brew install ffmpeg |
| yt-dlp | Auto background music from YouTube | brew install yt-dlp |
| Node.js 18+ | Runtime | brew install node |
Launch ShotAI and copy its local API URL (http://127.0.0.1:23817) and MCP Token, then:
git clone https://github.com/abu-ShotAI/ai-video-remix.git
cd ai-video-remix
npm install
cp .env.example .env # fill in SHOTAI_URL, SHOTAI_TOKEN, and optionally AGENT_PROVIDER
# Nature documentary (no LLM required)
AGENT_PROVIDER=none npx tsx src/skill/cli.ts "nature wildlife documentary" --composition NatureWild
# Sports highlight reel (generic — works with any sport footage)
npx tsx src/skill/cli.ts "World Cup football highlight remix" --composition SportsHighlight
# Travel vlog with English captions
npx tsx src/skill/cli.ts "Japan and Paris travel highlights" --composition TravelVlog --lang en
# Cyberpunk city night cuts
npx tsx src/skill/cli.ts "Hong Kong cyberpunk night remix" --composition CyberpunkCity
# With local music file
npx tsx src/skill/cli.ts "scenic alpine journey" --composition SwitzerlandScenic --bgm ./music/alpine.mp3
How AI Video Remix turns a text prompt into a finished video:
User prompt
│
▼
1. parseIntent — LLM extracts theme, selects composition, optionally overrides music style
2. refineQueries — LLM rewrites per-slot search terms to match library content
3. pickShots — ShotAI semantic search across your video footage library; scored by similarity + duration + mood
4. resolveMusic — yt-dlp YouTube search+download, or local --bgm file
5. extractClip — ffmpeg trims each shot to an independent .mp4
6. annotateClips — LLM assigns per-clip visual params (tone, kenBurns, dramatic, caption)
7. File Server — HTTP server serves clips to the Remotion renderer
8. Remotion render — Final MP4 composed and rendered
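In code, the first three steps of this flow can be sketched roughly as below. The function names mirror the step names above, but the signatures and the stub bodies are illustrative assumptions, not the project's actual API:

```typescript
// Minimal sketch of the pipeline's planning/search phase (steps 1–3).
// Steps 4–8 (music, ffmpeg extraction, annotation, file server,
// Remotion render) are omitted here.
interface Plan { theme: string; composition: string; queries: string[] }
interface Shot { videoId: string; start: number; end: number; score: number }

async function remix(prompt: string): Promise<{ plan: Plan; shots: Shot[] }> {
  const plan = await parseIntent(prompt);     // 1. theme + composition
  plan.queries = await refineQueries(plan);   // 2. per-slot search terms
  const shots = await pickShots(plan.queries); // 3. ShotAI semantic search
  return { plan, shots };
}

// Stub implementations so the sketch runs standalone:
async function parseIntent(prompt: string): Promise<Plan> {
  return { theme: prompt, composition: "TravelVlog", queries: [] };
}
async function refineQueries(plan: Plan): Promise<string[]> {
  return [`${plan.theme} wide shot`, `${plan.theme} close-up`];
}
async function pickShots(queries: string[]): Promise<Shot[]> {
  return queries.map((_, i) => ({ videoId: `v${i}`, start: 0, end: 3, score: 0.8 }));
}
```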
npx tsx src/skill/cli.ts "<prompt>" [options]
Options:
--composition <id>    Force a specific composition (skip LLM selection)
--bgm <path>          Local MP3 path (skip YouTube search)
--lang <zh|en>        Caption language: zh (default) or en
--output <dir>        Output directory (default: ./output)
--probe               Scan the library first; the LLM plans slots from actual content
| ID | Style | Best For |
|---|---|---|
| CyberpunkCity | Cyberpunk night | Neon city, night scenes, sci-fi |
| TravelVlog | Travel vlog | Multi-city travel with location cards |
| MoodDriven | Mood-driven cuts | Emotional fast/slow montage |
| NatureWild | BBC nature doc | Wildlife, landscapes, nature footage |
| SwitzerlandScenic | Alpine scenic | Mountain travel with elegant captions |
| SportsHighlight | ESPN sports | Goal/action highlights with captions |
Standard mode (default) — LLM picks the composition and generates search queries from registry templates.
Probe mode (--probe) — Scans the library first (video names, shot samples, mood/scene tags), then the LLM builds custom slots tailored to what actually exists. Use this when the registry's default queries don't match your library's actual content.
Edit .env (copy from .env.example):
# ── LLM Agent ────────────────────────────────────────────────────────────────
AGENT_PROVIDER=claude # claude | openai | openai-compat | none
ANTHROPIC_API_KEY=sk-ant-... # required when AGENT_PROVIDER=claude
OPENAI_API_KEY=sk-... # required when AGENT_PROVIDER=openai
OPENAI_COMPAT_BASE_URL=https://... # required when AGENT_PROVIDER=openai-compat
OPENAI_COMPAT_API_KEY=sk-...
AGENT_MODEL=claude-sonnet-4-6 # override default model
# ── ShotAI ───────────────────────────────────────────────────────────────────
SHOTAI_URL=http://127.0.0.1:23817
SHOTAI_TOKEN=
# ── Music ────────────────────────────────────────────────────────────────────
BGM_PATH=/path/to/music.mp3 # permanent local BGM default
# ── Quality ──────────────────────────────────────────────────────────────────
MIN_SCORE=0.5 # shot quality threshold 0–1 (recommended: 0.5)
Claude (Anthropic)
AGENT_PROVIDER=claude
ANTHROPIC_API_KEY=sk-ant-...
AGENT_MODEL=claude-sonnet-4-6
OpenAI
AGENT_PROVIDER=openai
OPENAI_API_KEY=sk-...
AGENT_MODEL=gpt-4o
OpenRouter (recommended for multi-provider access)
AGENT_PROVIDER=openai-compat
OPENAI_COMPAT_BASE_URL=https://openrouter.ai/api/v1
OPENAI_COMPAT_API_KEY=sk-or-v1-...
AGENT_MODEL=deepseek/deepseek-chat-v3-0324
Ollama (local, no API key needed)
AGENT_PROVIDER=openai-compat
OPENAI_COMPAT_BASE_URL=http://localhost:11434/v1
OPENAI_COMPAT_API_KEY=ollama
AGENT_MODEL=llama3.1
DeepSeek (direct)
AGENT_PROVIDER=openai-compat
OPENAI_COMPAT_BASE_URL=https://api.deepseek.com/v1
OPENAI_COMPAT_API_KEY=sk-...
AGENT_MODEL=deepseek-chat
No LLM (heuristic fallback)
AGENT_PROVIDER=none
Keyword-based composition selection + registry default queries. No API key required.
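The heuristic fallback might look like the sketch below; the keyword table is illustrative, not the project's actual mapping:

```typescript
// Hypothetical keyword-based composition selection (AGENT_PROVIDER=none).
const KEYWORDS: Record<string, string[]> = {
  CyberpunkCity: ["cyberpunk", "neon", "night"],
  TravelVlog: ["travel", "vlog", "trip"],
  SportsHighlight: ["sports", "goal", "match"],
  NatureWild: ["nature", "wildlife", "documentary"],
};

function pickComposition(prompt: string): string {
  const p = prompt.toLowerCase();
  for (const [id, words] of Object.entries(KEYWORDS)) {
    if (words.some((w) => p.includes(w))) return id;
  }
  return "MoodDriven"; // default when nothing matches
}

console.log(pickComposition("neon city at night")); // CyberpunkCity
```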
An 80ms head/tail trim is applied automatically (TRIM = 0.08) to remove artifacts at clip boundaries. If artifacts persist, increase TRIM to 0.12 or 0.15 in src/skill/orchestrator.ts.
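The trim amounts to shifting each clip's in/out points inward. A rough sketch (the real constant lives in src/skill/orchestrator.ts; this helper is illustrative only):

```typescript
// Apply a symmetric head/tail trim to a shot's time range.
const TRIM = 0.08; // seconds trimmed from each end of a clip

function trimmedRange(start: number, end: number, trim = TRIM) {
  const s = start + trim;
  const e = Math.max(s, end - trim); // never produce a negative-length clip
  return { start: s, end: e };
}
```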
GlitchFlicker triggers on very short clips. Set MIN_SCORE=0.5 in .env to keep short clips out of the pipeline.
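The threshold behaves like a simple filter over scored shots. A minimal sketch, assuming field names that may differ from the project's actual shot schema (in the real pipeline the threshold comes from MIN_SCORE in .env, not a parameter):

```typescript
// Drop shots below a quality threshold before clip extraction.
interface ScoredShot { score: number; durationSec: number }

function filterByScore(shots: ScoredShot[], minScore = 0.5): ScoredShot[] {
  return shots.filter((s) => s.score >= minScore);
}

const candidates: ScoredShot[] = [
  { score: 0.82, durationSec: 4.1 },
  { score: 0.34, durationSec: 0.6 }, // low-score, very short clip
];
console.log(filterByScore(candidates).length); // 1
```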
If selected shots don't match the theme:
- Raise MIN_SCORE (try 0.5 → 0.7)
- Use --probe mode so the LLM sees your actual library before picking queries
- Force --composition to a composition whose slots match your content
If background music download fails, update yt-dlp:
pip install -U yt-dlp   # requires 2026.03.03+
# or use a local file:
npx tsx src/skill/cli.ts "..." --bgm /path/to/music.mp3
If yt-dlp reports "n challenge solving failed", it needs the EJS remote solver. This is handled automatically via --remote-components ejs:github (already set in the code).
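The music fallback order described above (--bgm flag, then BGM_PATH from .env, else a yt-dlp YouTube search) can be sketched as follows; the function name and query shape are illustrative assumptions:

```typescript
// Resolve the background-music source in priority order.
type MusicSource =
  | { kind: "local"; path: string }
  | { kind: "youtube"; query: string };

function resolveMusic(
  bgmFlag: string | undefined,   // --bgm CLI flag
  envBgmPath: string | undefined, // BGM_PATH from .env
  theme: string,
): MusicSource {
  const local = bgmFlag ?? envBgmPath;
  if (local) return { kind: "local", path: local };
  return { kind: "youtube", query: `${theme} background music` };
}

console.log(resolveMusic(undefined, undefined, "alpine journey").kind); // youtube
```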
| Step | Typical time (M-series Mac) |
|---|---|
| Remotion render (60s video) | 30–90s |
| ShotAI search per slot | 1–3s |
| ffmpeg clip extraction | ~0.5s per clip |
See references/composition-guide.md for step-by-step instructions on adding a new Remotion visual style + registry entry.
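A registry entry might take a shape like the sketch below; the actual schema is defined by the project (see references/composition-guide.md) and may differ:

```typescript
// Hypothetical shape of a composition registry entry.
interface CompositionEntry {
  id: string;              // matches the --composition CLI value
  style: string;           // human-readable style description
  defaultQueries: string[]; // per-slot search terms used in standard mode
  musicStyle?: string;     // hint for resolveMusic when no --bgm is given
}

const natureWild: CompositionEntry = {
  id: "NatureWild",
  style: "BBC nature doc",
  defaultQueries: ["wildlife close-up", "sweeping landscape", "animal motion"],
  musicStyle: "orchestral ambient",
};

console.log(natureWild.id); // NatureWild
```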
Install via CLI
npx mdskills install abu-ShotAI/ai-video-remix
This downloads the skill files into your project; your AI agent picks them up automatically.
AI Video Remix works with Claude Code, Claude Desktop, Cursor, VS Code Copilot, Windsurf, Continue.dev, Gemini CLI, Amp, Roo Code, and Goose. Skills use the open SKILL.md format, which is compatible with any AI coding agent that reads markdown instructions.