---
name: azure-ai-voicelive-py
description: Build real-time voice AI applications using Azure AI Voice Live SDK (azure-ai-voicelive). Use this skill when creating Python applications that need real-time bidirectional audio communication with Azure AI, including voice assistants, voice-enabled chatbots, real-time speech-to-speech translation, voice-driven avatars, or any WebSocket-based audio streaming with AI models. Supports Server VAD (Voice Activity Detection), turn-based conversation, function calling, MCP tools, avatar integration, and transcription.
package: azure-ai-voicelive
---

# Azure AI Voice Live SDK

Build real-time voice AI applications with bidirectional WebSocket communication.

## Installation

```bash
pip install azure-ai-voicelive aiohttp azure-identity
```

## Environment Variables

```bash
AZURE_COGNITIVE_SERVICES_ENDPOINT=https://<region>.api.cognitive.microsoft.com
# For API key auth (not recommended for production)
AZURE_COGNITIVE_SERVICES_KEY=<api-key>
```

## Authentication

**DefaultAzureCredential (preferred)**:
```python
import os

from azure.ai.voicelive.aio import connect
from azure.identity.aio import DefaultAzureCredential

async with connect(
    endpoint=os.environ["AZURE_COGNITIVE_SERVICES_ENDPOINT"],
    credential=DefaultAzureCredential(),
    model="gpt-4o-realtime-preview",
    credential_scopes=["https://cognitiveservices.azure.com/.default"]
) as conn:
    ...
```

**API Key**:
```python
import os

from azure.ai.voicelive.aio import connect
from azure.core.credentials import AzureKeyCredential

async with connect(
    endpoint=os.environ["AZURE_COGNITIVE_SERVICES_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_COGNITIVE_SERVICES_KEY"]),
    model="gpt-4o-realtime-preview"
) as conn:
    ...
```

## Quick Start

```python
import asyncio
import os

from azure.ai.voicelive.aio import connect
from azure.identity.aio import DefaultAzureCredential

async def main():
    async with connect(
        endpoint=os.environ["AZURE_COGNITIVE_SERVICES_ENDPOINT"],
        credential=DefaultAzureCredential(),
        model="gpt-4o-realtime-preview",
        credential_scopes=["https://cognitiveservices.azure.com/.default"]
    ) as conn:
        # Update session with instructions
        await conn.session.update(session={
            "instructions": "You are a helpful assistant.",
            "modalities": ["text", "audio"],
            "voice": "alloy"
        })

        # Listen for events
        async for event in conn:
            print(f"Event: {event.type}")
            if event.type == "response.audio_transcript.done":
                print(f"Transcript: {event.transcript}")
            elif event.type == "response.done":
                break

asyncio.run(main())
```

## Core Architecture

### Connection Resources

The `VoiceLiveConnection` exposes these resources:

| Resource | Purpose | Key Methods |
|----------|---------|-------------|
| `conn.session` | Session configuration | `update(session=...)` |
| `conn.response` | Model responses | `create()`, `cancel()` |
| `conn.input_audio_buffer` | Audio input | `append()`, `commit()`, `clear()` |
| `conn.output_audio_buffer` | Audio output | `clear()` |
| `conn.conversation` | Conversation state | `item.create()`, `item.delete()`, `item.truncate()` |
| `conn.transcription_session` | Transcription config | `update(session=...)` |

## Session Configuration

```python
from azure.ai.voicelive.models import RequestSession, FunctionTool

await conn.session.update(session=RequestSession(
    instructions="You are a helpful voice assistant.",
    modalities=["text", "audio"],
    voice="alloy",  # or "echo", "shimmer", "sage", etc.
    input_audio_format="pcm16",
    output_audio_format="pcm16",
    turn_detection={
        "type": "server_vad",
        "threshold": 0.5,
        "prefix_padding_ms": 300,
        "silence_duration_ms": 500
    },
    tools=[
        FunctionTool(
            type="function",
            name="get_weather",
            description="Get current weather",
            parameters={
                "type": "object",
                "properties": {
                    "location": {"type": "string"}
                },
                "required": ["location"]
            }
        )
    ]
))
```

## Audio Streaming

### Send Audio (Base64 PCM16)

```python
import base64

# Read audio chunk (16-bit PCM, 24kHz mono); read_audio_from_microphone()
# is a placeholder for your audio-capture code
audio_chunk = await read_audio_from_microphone()
b64_audio = base64.b64encode(audio_chunk).decode()

await conn.input_audio_buffer.append(audio=b64_audio)
```

### Receive Audio

```python
import base64

async for event in conn:
    if event.type == "response.audio.delta":
        audio_bytes = base64.b64decode(event.delta)
        await play_audio(audio_bytes)  # play_audio() is a placeholder
    elif event.type == "response.audio.done":
        print("Audio complete")
```

## Event Handling

```python
import base64
import json

async for event in conn:
    match event.type:
        # Session events
        case "session.created":
            print(f"Session: {event.session}")
        case "session.updated":
            print("Session updated")

        # Audio input events
        case "input_audio_buffer.speech_started":
            print(f"Speech started at {event.audio_start_ms}ms")
        case "input_audio_buffer.speech_stopped":
            print(f"Speech stopped at {event.audio_end_ms}ms")

        # Transcription events
        case "conversation.item.input_audio_transcription.completed":
            print(f"User said: {event.transcript}")
        case "conversation.item.input_audio_transcription.delta":
            print(f"Partial: {event.delta}")

        # Response events
        case "response.created":
            print(f"Response started: {event.response.id}")
        case "response.audio_transcript.delta":
            print(event.delta, end="", flush=True)
        case "response.audio.delta":
            audio = base64.b64decode(event.delta)
        case "response.done":
            print(f"Response complete: {event.response.status}")

        # Function calls
        case "response.function_call_arguments.done":
            result = handle_function(event.name, event.arguments)
            await conn.conversation.item.create(item={
                "type": "function_call_output",
                "call_id": event.call_id,
                "output": json.dumps(result)
            })
            await conn.response.create()

        # Errors
        case "error":
            print(f"Error: {event.error.message}")
```

## Common Patterns

### Manual Turn Mode (No VAD)

```python
await conn.session.update(session={"turn_detection": None})

# Manually control turns
await conn.input_audio_buffer.append(audio=b64_audio)
await conn.input_audio_buffer.commit()  # End of user turn
await conn.response.create()  # Trigger response
```

### Interrupt Handling

```python
async for event in conn:
    if event.type == "input_audio_buffer.speech_started":
        # User interrupted - cancel current response
        await conn.response.cancel()
        await conn.output_audio_buffer.clear()
```

### Conversation History

```python
# Add system message
await conn.conversation.item.create(item={
    "type": "message",
    "role": "system",
    "content": [{"type": "input_text", "text": "Be concise."}]
})

# Add user message
await conn.conversation.item.create(item={
    "type": "message",
    "role": "user",
    "content": [{"type": "input_text", "text": "Hello!"}]
})

await conn.response.create()
```

## Voice Options

| Voice | Description |
|-------|-------------|
| `alloy` | Neutral, balanced |
| `echo` | Warm, conversational |
| `shimmer` | Clear, professional |
| `sage` | Calm, authoritative |
| `coral` | Friendly, upbeat |
| `ash` | Deep, measured |
| `ballad` | Expressive |
| `verse` | Storytelling |

Azure voices: Use `AzureStandardVoice`, `AzureCustomVoice`, or `AzurePersonalVoice` models.

## Audio Formats

| Format | Sample Rate | Use Case |
|--------|-------------|----------|
| `pcm16` | 24kHz | Default, high quality |
| `pcm16-8000hz` | 8kHz | Telephony |
| `pcm16-16000hz` | 16kHz | Voice assistants |
| `g711_ulaw` | 8kHz | Telephony (US) |
| `g711_alaw` | 8kHz | Telephony (EU) |

## Turn Detection Options

```python
# Server VAD (default)
{"type": "server_vad", "threshold": 0.5, "silence_duration_ms": 500}

# Azure Semantic VAD (smarter detection)
{"type": "azure_semantic_vad"}
{"type": "azure_semantic_vad_en"}  # English optimized
{"type": "azure_semantic_vad_multilingual"}
```

## Error Handling

```python
from azure.ai.voicelive.aio import ConnectionError, ConnectionClosed

try:
    async with connect(...) as conn:
        async for event in conn:
            if event.type == "error":
                print(f"API Error: {event.error.code} - {event.error.message}")
except ConnectionClosed as e:
    print(f"Connection closed: {e.code} - {e.reason}")
except ConnectionError as e:
    print(f"Connection error: {e}")
```

## References

- **Detailed API Reference**: See [references/api-reference.md](references/api-reference.md)
- **Complete Examples**: See [references/examples.md](references/examples.md)
- **All Models & Types**: See [references/models.md](references/models.md)
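As a supplement to the audio-streaming patterns above: `input_audio_buffer.append` takes base64-encoded PCM16, so a small pure-Python helper for chunking captured audio (and decoding `response.audio.delta` payloads back to raw bytes) can be tested independently of any connection. This is a sketch; the function names and the 4800-byte (100 ms at 24 kHz mono) default are illustrative choices, not part of the SDK.

```python
import base64

def pcm16_to_base64_chunks(pcm: bytes, chunk_bytes: int = 4800) -> list[str]:
    """Split raw 16-bit PCM audio into base64 strings for
    input_audio_buffer.append(audio=...). chunk_bytes must be even so a
    2-byte sample is never split across chunks."""
    if chunk_bytes % 2 != 0:
        raise ValueError("chunk_bytes must be even for 16-bit samples")
    return [
        base64.b64encode(pcm[i:i + chunk_bytes]).decode("ascii")
        for i in range(0, len(pcm), chunk_bytes)
    ]

def base64_to_pcm16(b64_delta: str) -> bytes:
    """Decode a response.audio.delta payload back to raw PCM bytes."""
    return base64.b64decode(b64_delta)
```

Decoding every chunk and concatenating reproduces the original PCM stream, which makes the round trip easy to verify before wiring in real microphone capture.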
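The function-call branch of the event handler above can also be factored into a connection-free helper that parses the model's JSON-encoded arguments and builds the `function_call_output` item dict. A minimal sketch, assuming a hypothetical handler registry (`HANDLERS` and the `get_weather` stub are illustrative, not SDK names):

```python
import json

# Illustrative registry mapping tool names to plain-Python handlers.
HANDLERS = {
    "get_weather": lambda location: {"location": location, "forecast": "unknown"},
}

def build_function_output(name: str, call_id: str, arguments_json: str) -> dict:
    """Build the conversation item that answers a
    response.function_call_arguments.done event."""
    args = json.loads(arguments_json)   # the model sends arguments as a JSON string
    result = HANDLERS[name](**args)     # dispatch to the registered handler
    return {
        "type": "function_call_output",
        "call_id": call_id,
        "output": json.dumps(result),   # output must itself be a JSON string
    }
```

The returned dict is what you would pass to `conn.conversation.item.create(item=...)` before calling `conn.response.create()`.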