---
name: azure-ai-voicelive-ts
description: |
  Azure AI Voice Live SDK for JavaScript/TypeScript. Build real-time voice AI applications with bidirectional WebSocket communication. Use for voice assistants, conversational AI, real-time speech-to-speech, and voice-enabled chatbots in Node.js or browser environments. Triggers: "voice live", "real-time voice", "VoiceLiveClient", "VoiceLiveSession", "voice assistant TypeScript", "bidirectional audio", "speech-to-speech JavaScript".
package: "@azure/ai-voicelive"
---

# @azure/ai-voicelive (JavaScript/TypeScript)

Real-time voice AI SDK for building bidirectional voice assistants with Azure AI in Node.js and browser environments.

## Installation

```bash
npm install @azure/ai-voicelive @azure/identity
# TypeScript users
npm install --save-dev @types/node
```

**Current Version**: 1.0.0-beta.3

**Supported Environments**:
- Node.js LTS versions (20+)
- Modern browsers (Chrome, Firefox, Safari, Edge)

## Environment Variables

```bash
AZURE_VOICELIVE_ENDPOINT=https://<resource>.cognitiveservices.azure.com
# Optional: API key if not using Entra ID
AZURE_VOICELIVE_API_KEY=<your-api-key>
# Optional: Logging
AZURE_LOG_LEVEL=info
```

## Authentication

### Microsoft Entra ID (Recommended)

```typescript
import { DefaultAzureCredential } from "@azure/identity";
import { VoiceLiveClient } from "@azure/ai-voicelive";

const credential = new DefaultAzureCredential();
const endpoint = "https://your-resource.cognitiveservices.azure.com";

const client = new VoiceLiveClient(endpoint, credential);
```

### API Key

```typescript
import { AzureKeyCredential } from "@azure/core-auth";
import { VoiceLiveClient } from "@azure/ai-voicelive";

const endpoint = "https://your-resource.cognitiveservices.azure.com";
const credential = new AzureKeyCredential("your-api-key");

const client = new VoiceLiveClient(endpoint, credential);
```

## Client Hierarchy

```
VoiceLiveClient
└── VoiceLiveSession (WebSocket connection)
    ├── updateSession() → Configure session options
    ├── subscribe() → Event handlers (Azure SDK pattern)
    ├── sendAudio() → Stream audio input
    ├── addConversationItem() → Add messages/function outputs
    └── sendEvent() → Send raw protocol events
```

## Quick Start

```typescript
import { DefaultAzureCredential } from "@azure/identity";
import { VoiceLiveClient } from "@azure/ai-voicelive";

const credential = new DefaultAzureCredential();
const endpoint = process.env.AZURE_VOICELIVE_ENDPOINT!;

// Create client and start session
const client = new VoiceLiveClient(endpoint, credential);
const session = await client.startSession("gpt-4o-mini-realtime-preview");

// Configure session
await session.updateSession({
  modalities: ["text", "audio"],
  instructions: "You are a helpful AI assistant. Respond naturally.",
  voice: {
    type: "azure-standard",
    name: "en-US-AvaNeural",
  },
  turnDetection: {
    type: "server_vad",
    threshold: 0.5,
    prefixPaddingMs: 300,
    silenceDurationMs: 500,
  },
  inputAudioFormat: "pcm16",
  outputAudioFormat: "pcm16",
});

// Subscribe to events
const subscription = session.subscribe({
  onResponseAudioDelta: async (event, context) => {
    // Handle streaming audio output (playAudioChunk is your playback routine)
    const audioData = event.delta;
    playAudioChunk(audioData);
  },
  onResponseTextDelta: async (event, context) => {
    // Handle streaming text
    process.stdout.write(event.delta);
  },
  onInputAudioTranscriptionCompleted: async (event, context) => {
    console.log("User said:", event.transcript);
  },
});

// Send audio from microphone
function sendAudioChunk(audioBuffer: ArrayBuffer) {
  session.sendAudio(audioBuffer);
}
```

## Session Configuration

```typescript
await session.updateSession({
  // Modalities
  modalities: ["audio", "text"],

  // System instructions
  instructions: "You are a customer service representative.",

  // Voice selection
  voice: {
    type: "azure-standard", // or "azure-custom", "openai"
    name: "en-US-AvaNeural",
  },

  // Turn detection (VAD)
  turnDetection: {
    type: "server_vad", // or "azure_semantic_vad"
    threshold: 0.5,
    prefixPaddingMs: 300,
    silenceDurationMs: 500,
  },

  // Audio formats
  inputAudioFormat: "pcm16",
  outputAudioFormat: "pcm16",

  // Tools (function calling)
  tools: [
    {
      type: "function",
      name: "get_weather",
      description: "Get current weather",
      parameters: {
        type: "object",
        properties: {
          location: { type: "string" }
        },
        required: ["location"]
      }
    }
  ],
  toolChoice: "auto",
});
```

## Event Handling (Azure SDK Pattern)

The SDK uses a subscription-based event handling pattern:

```typescript
const subscription = session.subscribe({
  // Connection lifecycle
  onConnected: async (args, context) => {
    console.log("Connected:", args.connectionId);
  },
  onDisconnected: async (args, context) => {
    console.log("Disconnected:", args.code, args.reason);
  },
  onError: async (args, context) => {
    console.error("Error:", args.error.message);
  },

  // Session events
  onSessionCreated: async (event, context) => {
    console.log("Session created:", context.sessionId);
  },
  onSessionUpdated: async (event, context) => {
    console.log("Session updated");
  },

  // Audio input events (VAD)
  onInputAudioBufferSpeechStarted: async (event, context) => {
    console.log("Speech started at:", event.audioStartMs);
  },
  onInputAudioBufferSpeechStopped: async (event, context) => {
    console.log("Speech stopped at:", event.audioEndMs);
  },

  // Transcription events
  onConversationItemInputAudioTranscriptionCompleted: async (event, context) => {
    console.log("User said:", event.transcript);
  },
  onConversationItemInputAudioTranscriptionDelta: async (event, context) => {
    process.stdout.write(event.delta);
  },

  // Response events
  onResponseCreated: async (event, context) => {
    console.log("Response started");
  },
  onResponseDone: async (event, context) => {
    console.log("Response complete");
  },

  // Streaming text
  onResponseTextDelta: async (event, context) => {
    process.stdout.write(event.delta);
  },
  onResponseTextDone: async (event, context) => {
    console.log("\n--- Text complete ---");
  },

  // Streaming audio
  onResponseAudioDelta: async (event, context) => {
    const audioData = event.delta;
    playAudioChunk(audioData);
  },
  onResponseAudioDone: async (event, context) => {
    console.log("Audio complete");
  },

  // Audio transcript (what the assistant said)
  onResponseAudioTranscriptDelta: async (event, context) => {
    process.stdout.write(event.delta);
  },

  // Function calling
  onResponseFunctionCallArgumentsDone: async (event, context) => {
    if (event.name === "get_weather") {
      const args = JSON.parse(event.arguments);
      const result = await getWeather(args.location);

      await session.addConversationItem({
        type: "function_call_output",
        callId: event.callId,
        output: JSON.stringify(result),
      });

      await session.sendEvent({ type: "response.create" });
    }
  },

  // Catch-all for debugging
  onServerEvent: async (event, context) => {
    console.log("Event:", event.type);
  },
});

// Clean up when done
await subscription.close();
```

## Function Calling

```typescript
// Define tools in session config
await session.updateSession({
  modalities: ["audio", "text"],
  instructions: "Help users with weather information.",
  tools: [
    {
      type: "function",
      name: "get_weather",
      description: "Get current weather for a location",
      parameters: {
        type: "object",
        properties: {
          location: {
            type: "string",
            description: "City and state or country",
          },
        },
        required: ["location"],
      },
    },
  ],
  toolChoice: "auto",
});

// Handle function calls
const subscription = session.subscribe({
  onResponseFunctionCallArgumentsDone: async (event, context) => {
    if (event.name === "get_weather") {
      const args = JSON.parse(event.arguments);
      const weatherData = await fetchWeather(args.location);

      // Send function result
      await session.addConversationItem({
        type: "function_call_output",
        callId: event.callId,
        output: JSON.stringify(weatherData),
      });

      // Trigger response generation
      await session.sendEvent({ type: "response.create" });
    }
  },
});
```

## Voice Options

| Voice Type | Config | Example |
|------------|--------|---------|
| Azure Standard | `{ type: "azure-standard", name: "..." }` | `"en-US-AvaNeural"` |
| Azure Custom | `{ type: "azure-custom", name: "...", endpointId: "..." }` | Custom voice endpoint |
| Azure Personal | `{ type: "azure-personal", speakerProfileId: "..." }` | Personal voice clone |
| OpenAI | `{ type: "openai", name: "..." }` | `"alloy"`, `"echo"`, `"shimmer"` |

## Supported Models

| Model | Description | Use Case |
|-------|-------------|----------|
| `gpt-4o-realtime-preview` | GPT-4o with real-time audio | High-quality conversational AI |
| `gpt-4o-mini-realtime-preview` | Lightweight GPT-4o | Fast, efficient interactions |
| `phi4-mm-realtime` | Phi multimodal | Cost-effective applications |

## Turn Detection Options

```typescript
// Server VAD (default)
turnDetection: {
  type: "server_vad",
  threshold: 0.5,
  prefixPaddingMs: 300,
  silenceDurationMs: 500,
}

// Azure Semantic VAD (smarter detection)
turnDetection: {
  type: "azure_semantic_vad",
}

// Azure Semantic VAD (English optimized)
turnDetection: {
  type: "azure_semantic_vad_en",
}

// Azure Semantic VAD (multilingual)
turnDetection: {
  type: "azure_semantic_vad_multilingual",
}
```

## Audio Formats

| Format | Sample Rate | Use Case |
|--------|-------------|----------|
| `pcm16` | 24kHz | Default, high quality |
| `pcm16-8000hz` | 8kHz | Telephony |
| `pcm16-16000hz` | 16kHz | Voice assistants |
| `g711_ulaw` | 8kHz | Telephony (US) |
| `g711_alaw` | 8kHz | Telephony (EU) |

## Key Types Reference

| Type | Purpose |
|------|---------|
| `VoiceLiveClient` | Main client for creating sessions |
| `VoiceLiveSession` | Active WebSocket session |
| `VoiceLiveSessionHandlers` | Event handler interface |
| `VoiceLiveSubscription` | Active event subscription |
| `ConnectionContext` | Context for connection events |
| `SessionContext` | Context for session events |
| `ServerEventUnion` | Union of all server events |

## Error Handling

```typescript
import {
  VoiceLiveError,
  VoiceLiveConnectionError,
  VoiceLiveAuthenticationError,
  VoiceLiveProtocolError,
} from "@azure/ai-voicelive";

const subscription = session.subscribe({
  onError: async (args, context) => {
    const { error } = args;

    if (error instanceof VoiceLiveConnectionError) {
      console.error("Connection error:", error.message);
    } else if (error instanceof VoiceLiveAuthenticationError) {
      console.error("Auth error:", error.message);
    } else if (error instanceof VoiceLiveProtocolError) {
      console.error("Protocol error:", error.message);
    }
  },

  onServerError: async (event, context) => {
    console.error("Server error:", event.error?.message);
  },
});
```

## Logging

```typescript
import { setLogLevel } from "@azure/logger";

// Enable verbose logging
setLogLevel("info");

// Or via environment variable
// AZURE_LOG_LEVEL=info
```

## Browser Usage

```typescript
// Browser usage requires a bundler (Vite, webpack, etc.)
import { VoiceLiveClient } from "@azure/ai-voicelive";
import { InteractiveBrowserCredential } from "@azure/identity";

// Use a browser-compatible credential
const credential = new InteractiveBrowserCredential({
  clientId: "your-client-id",
  tenantId: "your-tenant-id",
});

const client = new VoiceLiveClient(endpoint, credential);

// Request microphone access
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const audioContext = new AudioContext({ sampleRate: 24000 });

// Process audio and send to session
// ... (see samples for full implementation)
```

## Best Practices

1. **Always use `DefaultAzureCredential`** — Never hardcode API keys
2. **Set both modalities** — Include `["text", "audio"]` for voice assistants
3. **Use Azure Semantic VAD** — Better turn detection than basic server VAD
4. **Handle all error types** — Connection, auth, and protocol errors
5. **Clean up subscriptions** — Call `subscription.close()` when done
6. **Use an appropriate audio format** — PCM16 at 24kHz for best quality

## Reference Links

| Resource | URL |
|----------|-----|
| npm Package | https://www.npmjs.com/package/@azure/ai-voicelive |
| GitHub Source | https://github.com/Azure/azure-sdk-for-js/tree/main/sdk/ai/ai-voicelive |
| Samples | https://github.com/Azure/azure-sdk-for-js/tree/main/sdk/ai/ai-voicelive/samples |
| API Reference | https://learn.microsoft.com/javascript/api/@azure/ai-voicelive |