Generate AI-powered podcast-style audio narratives using Azure OpenAI's GPT Realtime Mini model via WebSocket. Use when building text-to-speech features, audio narrative generation, podcast creation from content, or integrating with Azure OpenAI Realtime API for real audio output. Covers full-stack implementation from React frontend to Python FastAPI backend with WebSocket streaming.
Add this skill: `npx mdskills install microsoft/podcast-generation`

Provides clear WebSocket integration code with Azure OpenAI Realtime API for audio generation.
---
name: podcast-generation
description: Generate AI-powered podcast-style audio narratives using Azure OpenAI's GPT Realtime Mini model via WebSocket. Use when building text-to-speech features, audio narrative generation, podcast creation from content, or integrating with Azure OpenAI Realtime API for real audio output. Covers full-stack implementation from React frontend to Python FastAPI backend with WebSocket streaming.
---

# Podcast Generation with GPT Realtime Mini

Generate real audio narratives from text content using Azure OpenAI's Realtime API.

## Quick Start

1. Configure environment variables for Realtime API
2. Connect via WebSocket to Azure OpenAI Realtime endpoint
3. Send text prompt, collect PCM audio chunks + transcript
4. Convert PCM to WAV format
5. Return base64-encoded audio to frontend for playback

## Environment Configuration

```env
AZURE_OPENAI_AUDIO_API_KEY=your_realtime_api_key
AZURE_OPENAI_AUDIO_ENDPOINT=https://your-resource.cognitiveservices.azure.com
AZURE_OPENAI_AUDIO_DEPLOYMENT=gpt-realtime-mini
```

**Note**: Endpoint should NOT include `/openai/v1/` - just the base URL.

## Core Workflow

### Backend Audio Generation

```python
from openai import AsyncOpenAI
import base64

# Convert HTTPS endpoint to WebSocket URL
ws_url = endpoint.replace("https://", "wss://") + "/openai/v1"

client = AsyncOpenAI(
    websocket_base_url=ws_url,
    api_key=api_key
)

audio_chunks = []
transcript_parts = []

async with client.realtime.connect(model="gpt-realtime-mini") as conn:
    # Configure for audio-only output
    await conn.session.update(session={
        "output_modalities": ["audio"],
        "instructions": "You are a narrator. Speak naturally."
    })

    # Send text to narrate
    await conn.conversation.item.create(item={
        "type": "message",
        "role": "user",
        "content": [{"type": "input_text", "text": prompt}]
    })

    await conn.response.create()

    # Collect streaming events
    async for event in conn:
        if event.type == "response.output_audio.delta":
            audio_chunks.append(base64.b64decode(event.delta))
        elif event.type == "response.output_audio_transcript.delta":
            transcript_parts.append(event.delta)
        elif event.type == "response.done":
            break

# Convert PCM to WAV (see scripts/pcm_to_wav.py)
pcm_audio = b''.join(audio_chunks)
wav_audio = pcm_to_wav(pcm_audio, sample_rate=24000)
```

### Frontend Audio Playback

```javascript
// Convert base64 WAV to playable blob
const base64ToBlob = (base64, mimeType) => {
  const bytes = atob(base64);
  const arr = new Uint8Array(bytes.length);
  for (let i = 0; i < bytes.length; i++) arr[i] = bytes.charCodeAt(i);
  return new Blob([arr], { type: mimeType });
};

const audioBlob = base64ToBlob(response.audio_data, 'audio/wav');
const audioUrl = URL.createObjectURL(audioBlob);
new Audio(audioUrl).play();
```

## Voice Options

| Voice | Character |
|---------|------------|
| alloy | Neutral |
| echo | Warm |
| fable | Expressive |
| onyx | Deep |
| nova | Friendly |
| shimmer | Clear |

## Realtime API Events

- `response.output_audio.delta` - Base64 audio chunk
- `response.output_audio_transcript.delta` - Transcript text
- `response.done` - Generation complete
- `error` - Handle with `event.error.message`

## Audio Format

- **Input**: Text prompt
- **Output**: PCM audio (24kHz, 16-bit, mono)
- **Storage**: Base64-encoded WAV

## References

- **Full architecture**: See [references/architecture.md](references/architecture.md) for complete stack design
- **Code examples**: See [references/code-examples.md](references/code-examples.md) for production patterns
- **PCM conversion**: Use [scripts/pcm_to_wav.py](scripts/pcm_to_wav.py) for audio format conversion
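The backend workflow calls a `pcm_to_wav` helper (provided by `scripts/pcm_to_wav.py`) without showing it. A minimal sketch of what such a helper might look like, assuming it simply wraps the raw PCM in a WAV container using the standard-library `wave` module and matches the 24 kHz, 16-bit, mono format stated above (the function name and defaults here are illustrative, not the script's actual contents):

```python
import base64
import io
import wave


def pcm_to_wav(pcm_data: bytes, sample_rate: int = 24000,
               channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM bytes in a WAV container, entirely in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav_file:
        wav_file.setnchannels(channels)      # mono
        wav_file.setsampwidth(sample_width)  # 16-bit samples = 2 bytes
        wav_file.setframerate(sample_rate)   # 24 kHz Realtime output
        wav_file.writeframes(pcm_data)
    return buf.getvalue()


# Example: one second of silence as stand-in PCM, then base64 for the
# JSON payload the frontend decodes with base64ToBlob
pcm_audio = b"\x00\x00" * 24000
wav_audio = pcm_to_wav(pcm_audio)
audio_data = base64.b64encode(wav_audio).decode("ascii")
```

Doing the conversion in memory avoids temp files, and base64-encoding the result lets the backend return the audio in an ordinary JSON response rather than a separate binary endpoint.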