Use when user provides a topic and wants an automated video podcast created, OR when user wants to learn/analyze video design patterns from reference videos — handles research, script writing, TTS audio synthesis, Remotion video creation, and final MP4 output with background music. Also supports design learning from reference videos (learn command), style profile management, and design reference library.
Add this skill:

npx mdskills install Agents365-ai/video-podcast-maker

A comprehensive video podcast automation pipeline with strong platform support, multiple TTS options, and visual design learning.
Automated pipeline to create professional video podcasts from a topic. Supports Bilibili, YouTube, Xiaohongshu, Douyin, and WeChat Channels with multi-language output (zh-CN, en-US). Combines research, script generation, multi-engine TTS (Edge/Azure/Doubao/CosyVoice), Remotion video rendering, and FFmpeg audio mixing.
Works with: Claude Code · OpenClaw (ClawHub) · OpenCode · Codex — any coding agent that supports SKILL.md
Publish to: Bilibili · YouTube · Xiaohongshu · Douyin · WeChat Channels
No coding required! Just describe your topic in plain language — the coding agent guides you through each step interactively. You make creative decisions, the agent handles all the technical details. Creating your first video podcast is easier than you think.
Note: This project is still under active development and may not be fully mature yet. We are continuously iterating and improving. Your feedback and suggestions are greatly appreciated — feel free to open an issue or reach out!
Project templates (timing.json, Video.tsx, Root.tsx, Thumbnail.tsx, podcast.txt) for quick project scaffolding.
Bilibili: chapter markers in MM:SS format for B站 chapters.
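As a hedged sketch of the MM:SS chapter format, a start time in whole seconds (such as a value taken from timing.json) can be converted in shell; `to_chapter` is a hypothetical helper for illustration, not part of the skill:

```shell
to_chapter() {
  # Convert a start time in whole seconds into the MM:SS
  # chapter format that Bilibili expects.
  printf '%02d:%02d\n' $(($1 / 60)) $(($1 % 60))
}

to_chapter 0    # -> 00:00
to_chapter 95   # -> 01:35
to_chapter 754  # -> 12:34
```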
Understanding the components:
| Component | Source | Purpose |
|---|---|---|
| Remotion Project | npx create-video | Base framework with src/, public/, package.json |
| video-podcast-maker | Claude Code skill | Workflow orchestration (this skill) |
# Step 1: Create a new Remotion project (base framework)
npx create-video@latest my-video-project
cd my-video-project
npm i # Install Remotion dependencies
# Step 2: Verify installation
npx remotion studio # Should open browser preview
If you already have a Remotion project:
cd your-existing-project
npm install remotion @remotion/cli @remotion/player zod
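Before going further, it can help to confirm that the tools the pipeline relies on are present. A minimal sketch; the exact tool list is an assumption based on the pipeline description (Node for Remotion, FFmpeg for audio mixing, Python for edge-tts):

```shell
# Report which prerequisite tools are available on PATH.
for tool in node npm ffmpeg python3; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "ok: $tool"
  else
    echo "missing: $tool"
  fi
done
```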
| Service | Purpose | Get Key |
|---|---|---|
| Azure Speech | TTS audio generation (high quality) | Azure Portal → Speech Services |
| Volcengine Doubao Speech | TTS audio generation (alternative backend) | Volcengine Console |
| Aliyun CosyVoice | TTS audio generation (alternative backend) | Aliyun Bailian |
| Edge TTS | TTS audio generation (default, free, no key needed) | pip install edge-tts |
| ElevenLabs | TTS audio generation (highest quality English) | ElevenLabs |
| Google Cloud TTS | TTS audio generation (wide language support) | Google Cloud Console |
| OpenAI | TTS audio generation (simple API) | OpenAI Platform |
| Google Gemini | AI thumbnail generation (optional) | AI Studio |
| Aliyun Dashscope | AI thumbnail - Chinese optimized (optional) | Aliyun Bailian |
Add to ~/.zshrc or ~/.bashrc:
# TTS Backend: edge (default, free), azure, doubao, cosyvoice, elevenlabs, google, openai
export TTS_BACKEND="edge" # Default (free), or "azure" / "doubao" / "cosyvoice" / "elevenlabs" / "google" / "openai"
# Azure TTS (high quality)
export AZURE_SPEECH_KEY="your-azure-speech-key"
export AZURE_SPEECH_REGION="eastasia"
# Volcengine Doubao TTS (alternative backend)
export VOLCENGINE_APPID="your-volcengine-appid"
export VOLCENGINE_ACCESS_TOKEN="your-volcengine-access-token"
export VOLCENGINE_CLUSTER="volcano_tts" # Default cluster, adjust per console config
export VOLCENGINE_VOICE_TYPE="BV001_streaming" # Adjust per console voice options
# Aliyun CosyVoice TTS (alternative backend) + AI thumbnails
export DASHSCOPE_API_KEY="your-dashscope-api-key"
# Optional: Edge TTS voice override
export EDGE_TTS_VOICE="zh-CN-XiaoxiaoNeural"
# ElevenLabs TTS
export ELEVENLABS_API_KEY="your-elevenlabs-api-key"
# Google Cloud TTS
export GOOGLE_TTS_API_KEY="your-google-tts-api-key"
# OpenAI TTS
export OPENAI_API_KEY="your-openai-api-key"
# Optional: Google Gemini for AI thumbnails
export GEMINI_API_KEY="your-gemini-api-key"
Then reload: source ~/.zshrc
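As a sketch of how these variables fit together (assumed logic for illustration, not the skill's actual code): default to the free Edge backend when `TTS_BACKEND` is unset, and fail fast when a paid backend is selected without its key:

```shell
backend="${TTS_BACKEND:-edge}"   # edge is the documented default

case "$backend" in
  azure)      : "${AZURE_SPEECH_KEY:?set AZURE_SPEECH_KEY for the azure backend}" ;;
  doubao)     : "${VOLCENGINE_ACCESS_TOKEN:?set VOLCENGINE_ACCESS_TOKEN for the doubao backend}" ;;
  cosyvoice)  : "${DASHSCOPE_API_KEY:?set DASHSCOPE_API_KEY for the cosyvoice backend}" ;;
  elevenlabs) : "${ELEVENLABS_API_KEY:?set ELEVENLABS_API_KEY for the elevenlabs backend}" ;;
  edge)       ;;  # free, no key needed
esac

echo "Using TTS backend: $backend"
```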
This skill is designed for use with Claude Code or OpenCode. Simply tell Claude:
"Create a video podcast about [your topic]"
Claude will guide you through the entire workflow automatically.
Tips: The quality of first-generation output heavily depends on the model's intelligence and capabilities — the smarter and more advanced the model, the better the results. In our testing, both Codex and Claude Code produce excellent videos on the first try, and OpenCode paired with GLM-5 also delivers solid results. If the initial output isn't perfect, you can preview it in Remotion Studio and ask the coding agent to keep refining until you're satisfied.
Before rendering the final video, use Remotion Studio to preview and visually edit styles:
npx remotion studio src/remotion/index.ts
This opens a browser-based editor where you can:
| Category | Properties |
|---|---|
| Colors | Primary color, background, text color, accent |
| Typography | Title size (72-120), subtitle size, body size |
| Progress Bar | Show/hide, height, font size, active color |
| Audio | BGM volume (0-0.3) |
| Animation | Enable/disable entrance animations |
| File | Scope | Purpose |
|---|---|---|
| phonemes.json | Global | Chinese polyphone dictionary shared across all projects. Edit to add/fix pronunciations (e.g., 行 háng vs xíng). Per-project overrides go in videos/{name}/phonemes.json |
| user_prefs.template.json | Global | Default preferences template. Copied to user_prefs.json on first run, which auto-evolves as the skill learns your style |
| prefs_schema.json | Global | JSON Schema for preference validation. Do not edit manually |
| tsconfig.json | Global | TypeScript config for Remotion templates |
videos/{video-name}/
├── topic_definition.md # Topic direction
├── topic_research.md # Research notes
├── podcast.txt # Narration script
├── phonemes.json # (Optional) Project-specific pronunciation overrides
├── podcast_audio.wav # TTS audio
├── podcast_audio.srt # Subtitles
├── timing.json # Section timing for sync
├── thumbnail_*.png # Video thumbnails
├── publish_info.md # Title, tags, description
├── part_*.wav # TTS segments (temp, cleanup via Step 14)
├── output.mp4 # Raw render (temp)
├── video_with_bgm.mp4 # With BGM (temp)
└── final_video.mp4 # Final output
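The BGM mixing step that produces video_with_bgm.mp4 can be sketched with FFmpeg. The filenames follow the layout above, the 0.2 volume matches the documented 0-0.3 BGM range, and the filter graph is an illustrative assumption, not the skill's exact command:

```shell
mix_bgm() {
  # Duck the BGM to 20% volume, pad it to the narration length, and mix it
  # under the original audio track, copying the video stream untouched.
  ffmpeg -y -i output.mp4 -i assets/perfect-beauty-191271.mp3 \
    -filter_complex "[1:a]volume=0.2,apad[bgm];[0:a][bgm]amix=inputs=2:duration=first[mixed]" \
    -map 0:v -map "[mixed]" -c:v copy video_with_bgm.mp4
}
```

Run `mix_bgm` from the project directory once output.mp4 exists, adjusting the assets/ path as needed.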
Included tracks in assets/:
- perfect-beauty-191271.mp3 - Upbeat, positive
- snow-stevekaldes-piano-397491.mp3 - Calm piano

Recent additions: --resume flag, --dry-run for duration estimation, references/ layered docs, ${CLAUDE_SKILL_DIR} variable, and argument-hint/effort/allowed-tools frontmatter fields.

License: MIT
If this project helps you, consider supporting the author via WeChat Pay, Alipay, or Buy Me a Coffee.
Video Podcast Maker is a free, open-source AI agent skill by Agents365-ai.
Install Video Podcast Maker with a single command:
npx mdskills install Agents365-ai/video-podcast-maker

This downloads the skill files into your project, and your AI agent picks them up automatically.
Video Podcast Maker works with Claude Code, Claude Desktop, Cursor, VS Code Copilot, Windsurf, Continue, Codex, Gemini CLI, Amp, Roo Code, Goose, OpenCode, Trae, Qodo, and Command Code. Skills use the open SKILL.md format, which is compatible with any AI coding agent that reads markdown instructions.