Generate AI-powered podcast-style audio narratives using Azure OpenAI's GPT Realtime Mini model via WebSocket. Use when building text-to-speech features, audio narrative generation, podcast creation from content, or integrating with Azure OpenAI Realtime API for real audio output. Covers full-stack implementation from React frontend to Python FastAPI backend with WebSocket streaming.
Add this skill: `npx mdskills install microsoft/podcast-generation`

Provides clear WebSocket integration code with Azure OpenAI Realtime API for audio generation.
---
name: podcast-generation
description: Generate AI-powered podcast-style audio narratives using Azure OpenAI's GPT Realtime Mini model via WebSocket. Use when building text-to-speech features, audio narrative generation, podcast creation from content, or integrating with Azure OpenAI Realtime API for real audio output. Covers full-stack implementation from React frontend to Python FastAPI backend with WebSocket streaming.
---

# Podcast Generation with GPT Realtime Mini

Generate real audio narratives from text content using Azure OpenAI's Realtime API.

## Quick Start

1. Configure environment variables for Realtime API
2. Connect via WebSocket to Azure OpenAI Realtime endpoint
3. Send text prompt, collect PCM audio chunks + transcript
4. Convert PCM to WAV format
5. Return base64-encoded audio to frontend for playback

## Environment Configuration

```env
AZURE_OPENAI_AUDIO_API_KEY=your_realtime_api_key
AZURE_OPENAI_AUDIO_ENDPOINT=https://your-resource.cognitiveservices.azure.com
AZURE_OPENAI_AUDIO_DEPLOYMENT=gpt-realtime-mini
```

**Note**: Endpoint should NOT include `/openai/v1/` - just the base URL.

## Core Workflow

### Backend Audio Generation

```python
from openai import AsyncOpenAI
import base64

# Convert HTTPS endpoint to WebSocket URL
ws_url = endpoint.replace("https://", "wss://") + "/openai/v1"

client = AsyncOpenAI(
    websocket_base_url=ws_url,
    api_key=api_key
)

audio_chunks = []
transcript_parts = []

async with client.realtime.connect(model="gpt-realtime-mini") as conn:
    # Configure for audio-only output
    await conn.session.update(session={
        "output_modalities": ["audio"],
        "instructions": "You are a narrator. Speak naturally."
    })

    # Send text to narrate
    await conn.conversation.item.create(item={
        "type": "message",
        "role": "user",
        "content": [{"type": "input_text", "text": prompt}]
    })

    await conn.response.create()

    # Collect streaming events
    async for event in conn:
        if event.type == "response.output_audio.delta":
            audio_chunks.append(base64.b64decode(event.delta))
        elif event.type == "response.output_audio_transcript.delta":
            transcript_parts.append(event.delta)
        elif event.type == "response.done":
            break

# Convert PCM to WAV (see scripts/pcm_to_wav.py)
pcm_audio = b''.join(audio_chunks)
wav_audio = pcm_to_wav(pcm_audio, sample_rate=24000)
```

### Frontend Audio Playback

```javascript
// Convert base64 WAV to playable blob
const base64ToBlob = (base64, mimeType) => {
  const bytes = atob(base64);
  const arr = new Uint8Array(bytes.length);
  for (let i = 0; i < bytes.length; i++) arr[i] = bytes.charCodeAt(i);
  return new Blob([arr], { type: mimeType });
};

const audioBlob = base64ToBlob(response.audio_data, 'audio/wav');
const audioUrl = URL.createObjectURL(audioBlob);
new Audio(audioUrl).play();
```

## Voice Options

| Voice | Character |
|---------|------------|
| alloy | Neutral |
| echo | Warm |
| fable | Expressive |
| onyx | Deep |
| nova | Friendly |
| shimmer | Clear |

## Realtime API Events

- `response.output_audio.delta` - Base64 audio chunk
- `response.output_audio_transcript.delta` - Transcript text
- `response.done` - Generation complete
- `error` - Handle with `event.error.message`

## Audio Format

- **Input**: Text prompt
- **Output**: PCM audio (24kHz, 16-bit, mono)
- **Storage**: Base64-encoded WAV

## References

- **Full architecture**: See [references/architecture.md](references/architecture.md) for complete stack design
- **Code examples**: See [references/code-examples.md](references/code-examples.md) for production patterns
- **PCM conversion**: Use [scripts/pcm_to_wav.py](scripts/pcm_to_wav.py) for audio format conversion
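The backend workflow calls a `pcm_to_wav` helper (provided by `scripts/pcm_to_wav.py`) without showing it. A minimal sketch of what such a helper might look like, assuming it simply wraps the raw PCM in a WAV container using the standard-library `wave` module and matches the 24 kHz, 16-bit, mono format stated above (the function name and defaults here are illustrative, not the script's actual contents):

```python
import base64
import io
import wave


def pcm_to_wav(pcm_data: bytes, sample_rate: int = 24000,
               channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM bytes in a WAV container, entirely in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav_file:
        wav_file.setnchannels(channels)      # mono
        wav_file.setsampwidth(sample_width)  # 16-bit samples = 2 bytes
        wav_file.setframerate(sample_rate)   # 24 kHz Realtime output
        wav_file.writeframes(pcm_data)
    return buf.getvalue()


# Example: one second of silence as stand-in PCM, then base64 for the
# JSON payload the frontend decodes with base64ToBlob
pcm_audio = b"\x00\x00" * 24000
wav_audio = pcm_to_wav(pcm_audio)
audio_data = base64.b64encode(wav_audio).decode("ascii")
```

Doing the conversion in memory avoids temp files, and base64-encoding the result lets the backend return the audio in an ordinary JSON response rather than a separate binary endpoint.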