Use when user provides a topic and wants an automated video podcast created, OR when user wants to learn/analyze video design patterns from reference videos — handles research, script writing, TTS audio synthesis, Remotion video creation, and final MP4 output with background music. Also supports design learning from reference videos (learn command), style profile management, and design reference library.
Add this skill:

npx mdskills install Agents365-ai/video-podcast-maker

A comprehensive video podcast automation pipeline with strong platform support, multiple TTS options, and visual design learning.
Automated pipeline to create professional video podcasts from a topic. Supports Bilibili, YouTube, Xiaohongshu, Douyin, and WeChat Channels with multi-language output (zh-CN, en-US). Combines research, script generation, multi-engine TTS (Edge/Azure/Doubao/CosyVoice), Remotion video rendering, and FFmpeg audio mixing.
Works with: Claude Code · OpenClaw (ClawHub) · OpenCode · Codex — any coding agent that supports SKILL.md
Publish to: Bilibili · YouTube · Xiaohongshu · Douyin · WeChat Channels
No coding required! Just describe your topic in plain language — the coding agent guides you through each step interactively. You make creative decisions, the agent handles all the technical details. Creating your first video podcast is easier than you think.
Note: This project is still under active development and may not be fully mature yet. We are continuously iterating and improving. Your feedback and suggestions are greatly appreciated — feel free to open an issue or reach out!
Project templates (timing.json, Video.tsx, Root.tsx, Thumbnail.tsx, podcast.txt) for quick project scaffolding.
Bilibili: chapter markers in MM:SS format for B站 chapters.
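As a hedged sketch of the MM:SS chapter format, a start time in whole seconds (such as a value taken from timing.json) can be converted in shell; `to_chapter` is a hypothetical helper for illustration, not part of the skill:

```shell
to_chapter() {
  # Convert a start time in whole seconds into the MM:SS
  # chapter format that Bilibili expects.
  printf '%02d:%02d\n' $(($1 / 60)) $(($1 % 60))
}

to_chapter 0    # -> 00:00
to_chapter 95   # -> 01:35
to_chapter 754  # -> 12:34
```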
Understanding the components:
| Component | Source | Purpose |
|---|---|---|
| Remotion Project | npx create-video | Base framework with src/, public/, package.json |
| video-podcast-maker | Claude Code skill | Workflow orchestration (this skill) |
# Step 1: Create a new Remotion project (base framework)
npx create-video@latest my-video-project
cd my-video-project
npm i # Install Remotion dependencies
# Step 2: Verify installation
npx remotion studio # Should open browser preview
If you already have a Remotion project:
cd your-existing-project
npm install remotion @remotion/cli @remotion/player zod
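Before going further, it can help to confirm that the tools the pipeline relies on are present. A minimal sketch; the exact tool list is an assumption based on the pipeline description (Node for Remotion, FFmpeg for audio mixing, Python for edge-tts):

```shell
# Report which prerequisite tools are available on PATH.
for tool in node npm ffmpeg python3; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "ok: $tool"
  else
    echo "missing: $tool"
  fi
done
```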
| Service | Purpose | Get Key |
|---|---|---|
| Azure Speech | TTS audio generation (high quality) | Azure Portal → Speech Services |
| Volcengine Doubao Speech | TTS audio generation (alternative backend) | Volcengine Console |
| Aliyun CosyVoice | TTS audio generation (alternative backend) | Aliyun Bailian |
| Edge TTS | TTS audio generation (default, free, no key needed) | pip install edge-tts |
| ElevenLabs | TTS audio generation (highest quality English) | ElevenLabs |
| Google Cloud TTS | TTS audio generation (wide language support) | Google Cloud Console |
| OpenAI | TTS audio generation (simple API) | OpenAI Platform |
| Google Gemini | AI thumbnail generation (optional) | AI Studio |
| Aliyun Dashscope | AI thumbnail - Chinese optimized (optional) | Aliyun Bailian |
Add to ~/.zshrc or ~/.bashrc:
# TTS Backend: edge (default, free), azure, doubao, cosyvoice, elevenlabs, google, openai
export TTS_BACKEND="edge" # Default (free), or "azure" / "doubao" / "cosyvoice" / "elevenlabs" / "google" / "openai"
# Azure TTS (high quality)
export AZURE_SPEECH_KEY="your-azure-speech-key"
export AZURE_SPEECH_REGION="eastasia"
# Volcengine Doubao TTS (alternative backend)
export VOLCENGINE_APPID="your-volcengine-appid"
export VOLCENGINE_ACCESS_TOKEN="your-volcengine-access-token"
export VOLCENGINE_CLUSTER="volcano_tts" # Default cluster, adjust per console config
export VOLCENGINE_VOICE_TYPE="BV001_streaming" # Adjust per console voice options
# Aliyun CosyVoice TTS (alternative backend) + AI thumbnails
export DASHSCOPE_API_KEY="your-dashscope-api-key"
# Optional: Edge TTS voice override
export EDGE_TTS_VOICE="zh-CN-XiaoxiaoNeural"
# ElevenLabs TTS
export ELEVENLABS_API_KEY="your-elevenlabs-api-key"
# Google Cloud TTS
export GOOGLE_TTS_API_KEY="your-google-tts-api-key"
# OpenAI TTS
export OPENAI_API_KEY="your-openai-api-key"
# Optional: Google Gemini for AI thumbnails
export GEMINI_API_KEY="your-gemini-api-key"
Then reload: source ~/.zshrc
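As a sketch of how these variables fit together (assumed logic for illustration, not the skill's actual code): default to the free Edge backend when `TTS_BACKEND` is unset, and fail fast when a paid backend is selected without its key:

```shell
backend="${TTS_BACKEND:-edge}"   # edge is the documented default

case "$backend" in
  azure)      : "${AZURE_SPEECH_KEY:?set AZURE_SPEECH_KEY for the azure backend}" ;;
  doubao)     : "${VOLCENGINE_ACCESS_TOKEN:?set VOLCENGINE_ACCESS_TOKEN for the doubao backend}" ;;
  cosyvoice)  : "${DASHSCOPE_API_KEY:?set DASHSCOPE_API_KEY for the cosyvoice backend}" ;;
  elevenlabs) : "${ELEVENLABS_API_KEY:?set ELEVENLABS_API_KEY for the elevenlabs backend}" ;;
  edge)       ;;  # free, no key needed
esac

echo "Using TTS backend: $backend"
```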
This skill is designed for use with Claude Code or OpenCode. Simply tell Claude:
"Create a video podcast about [your topic]"
Claude will guide you through the entire workflow automatically.
Tips: The quality of first-generation output heavily depends on the model's intelligence and capabilities — the smarter and more advanced the model, the better the results. In our testing, both Codex and Claude Code produce excellent videos on the first try, and OpenCode paired with GLM-5 also delivers solid results. If the initial output isn't perfect, you can preview it in Remotion Studio and ask the coding agent to keep refining until you're satisfied.
Before rendering the final video, use Remotion Studio to preview and visually edit styles:
npx remotion studio src/remotion/index.ts
This opens a browser-based editor where you can:
| Category | Properties |
|---|---|
| Colors | Primary color, background, text color, accent |
| Typography | Title size (72-120), subtitle size, body size |
| Progress Bar | Show/hide, height, font size, active color |
| Audio | BGM volume (0-0.3) |
| Animation | Enable/disable entrance animations |
| File | Scope | Purpose |
|---|---|---|
| phonemes.json | Global | Chinese polyphone dictionary shared across all projects. Edit to add/fix pronunciations (e.g., 行 háng vs xíng). Per-project overrides go in videos/{name}/phonemes.json |
| user_prefs.template.json | Global | Default preferences template. Copied to user_prefs.json on first run, which auto-evolves as the skill learns your style |
| prefs_schema.json | Global | JSON Schema for preference validation. Do not edit manually |
| tsconfig.json | Global | TypeScript config for Remotion templates |
videos/{video-name}/
├── topic_definition.md # Topic direction
├── topic_research.md # Research notes
├── podcast.txt # Narration script
├── phonemes.json # (Optional) Project-specific pronunciation overrides
├── podcast_audio.wav # TTS audio
├── podcast_audio.srt # Subtitles
├── timing.json # Section timing for sync
├── thumbnail_*.png # Video thumbnails
├── publish_info.md # Title, tags, description
├── part_*.wav # TTS segments (temp, cleanup via Step 14)
├── output.mp4 # Raw render (temp)
├── video_with_bgm.mp4 # With BGM (temp)
└── final_video.mp4 # Final output
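The BGM mixing step that produces video_with_bgm.mp4 can be sketched with FFmpeg. The filenames follow the layout above, the 0.2 volume matches the documented 0-0.3 BGM range, and the filter graph is an illustrative assumption, not the skill's exact command:

```shell
mix_bgm() {
  # Duck the BGM to 20% volume, pad it to the narration length, and mix it
  # under the original audio track, copying the video stream untouched.
  ffmpeg -y -i output.mp4 -i assets/perfect-beauty-191271.mp3 \
    -filter_complex "[1:a]volume=0.2,apad[bgm];[0:a][bgm]amix=inputs=2:duration=first[mixed]" \
    -map 0:v -map "[mixed]" -c:v copy video_with_bgm.mp4
}
```

Run `mix_bgm` from the project directory once output.mp4 exists, adjusting the assets/ path as needed.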
Included tracks in assets/:
- perfect-beauty-191271.mp3 - Upbeat, positive
- snow-stevekaldes-piano-397491.mp3 - Calm piano

Recent additions: --resume flag, --dry-run for duration estimation, references/ layered docs, ${CLAUDE_SKILL_DIR} variable, and argument-hint/effort/allowed-tools frontmatter fields.

License: MIT
If this project helps you, consider supporting the author via WeChat Pay, Alipay, or Buy Me a Coffee.
Video Podcast Maker is a free, open-source AI agent skill by Agents365-ai.
Install Video Podcast Maker with a single command:
npx mdskills install Agents365-ai/video-podcast-maker

This downloads the skill files into your project, and your AI agent picks them up automatically.
Video Podcast Maker works with Claude Code, Claude Desktop, Cursor, VS Code Copilot, Windsurf, Continue, Codex, Gemini CLI, Amp, Roo Code, Goose, OpenCode, Trae, Qodo, and Command Code. Skills use the open SKILL.md format, which is compatible with any AI coding agent that reads markdown instructions.