---
name: azure-ai-voicelive-dotnet
description: |
  Azure AI Voice Live SDK for .NET. Build real-time voice AI applications with bidirectional WebSocket communication. Use for voice assistants, conversational AI, real-time speech-to-speech, and voice-enabled chatbots. Triggers: "voice live", "real-time voice", "VoiceLiveClient", "VoiceLiveSession", "voice assistant .NET", "bidirectional audio", "speech-to-speech".
package: Azure.AI.VoiceLive
---

# Azure.AI.VoiceLive (.NET)

Real-time voice AI SDK for building bidirectional voice assistants with Azure AI.

## Installation

```bash
dotnet add package Azure.AI.VoiceLive
dotnet add package Azure.Identity
dotnet add package NAudio  # For audio capture/playback
```

**Current Versions**: Stable v1.0.0, Preview v1.1.0-beta.1

## Environment Variables

```bash
AZURE_VOICELIVE_ENDPOINT=https://<resource>.services.ai.azure.com/
AZURE_VOICELIVE_MODEL=gpt-4o-realtime-preview
AZURE_VOICELIVE_VOICE=en-US-AvaNeural
# Optional: API key if not using Entra ID
AZURE_VOICELIVE_API_KEY=<your-api-key>
```

## Authentication

### Microsoft Entra ID (Recommended)

```csharp
using Azure.Identity;
using Azure.AI.VoiceLive;

Uri endpoint = new Uri("https://your-resource.cognitiveservices.azure.com");
DefaultAzureCredential credential = new DefaultAzureCredential();
VoiceLiveClient client = new VoiceLiveClient(endpoint, credential);
```

**Required Role**: `Cognitive Services User` (assign in Azure Portal → Access control)

### API Key

```csharp
Uri endpoint = new Uri("https://your-resource.cognitiveservices.azure.com");
AzureKeyCredential credential = new AzureKeyCredential("your-api-key");
VoiceLiveClient client = new VoiceLiveClient(endpoint, credential);
```

## Client Hierarchy

```
VoiceLiveClient
└── VoiceLiveSession (WebSocket connection)
    ├── ConfigureSessionAsync()
    ├── GetUpdatesAsync() → SessionUpdate events
    ├── AddItemAsync() → UserMessageItem, FunctionCallOutputItem
    ├── SendAudioAsync()
    └── StartResponseAsync()
```

## Core Workflow

### 1. Start Session and Configure

```csharp
using Azure.Identity;
using Azure.AI.VoiceLive;

var endpoint = new Uri(Environment.GetEnvironmentVariable("AZURE_VOICELIVE_ENDPOINT"));
var client = new VoiceLiveClient(endpoint, new DefaultAzureCredential());

var model = "gpt-4o-mini-realtime-preview";

// Start session
using VoiceLiveSession session = await client.StartSessionAsync(model);

// Configure session
VoiceLiveSessionOptions sessionOptions = new()
{
    Model = model,
    Instructions = "You are a helpful AI assistant. Respond naturally.",
    Voice = new AzureStandardVoice("en-US-AvaNeural"),
    TurnDetection = new AzureSemanticVadTurnDetection()
    {
        Threshold = 0.5f,
        PrefixPadding = TimeSpan.FromMilliseconds(300),
        SilenceDuration = TimeSpan.FromMilliseconds(500)
    },
    InputAudioFormat = InputAudioFormat.Pcm16,
    OutputAudioFormat = OutputAudioFormat.Pcm16
};

// Set modalities (both text and audio for voice assistants)
sessionOptions.Modalities.Clear();
sessionOptions.Modalities.Add(InteractionModality.Text);
sessionOptions.Modalities.Add(InteractionModality.Audio);

await session.ConfigureSessionAsync(sessionOptions);
```
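The client hierarchy lists `SendAudioAsync()` for streaming user audio into the session, but no step demonstrates it. A minimal capture sketch using NAudio's `WaveInEvent`, assuming 24 kHz mono PCM16 input (the format recommended in the Audio Configuration section) and that `SendAudioAsync` accepts a byte buffer — verify the exact overload against the API reference, and note that `session` here is the `VoiceLiveSession` from step 1:

```csharp
using NAudio.Wave;

// Capture 24 kHz, 16-bit, mono PCM from the default microphone
var waveIn = new WaveInEvent
{
    WaveFormat = new WaveFormat(24000, 16, 1),
    BufferMilliseconds = 100
};

waveIn.DataAvailable += async (_, e) =>
{
    // Copy only the bytes actually recorded before forwarding to the session
    byte[] chunk = new byte[e.BytesRecorded];
    Array.Copy(e.Buffer, chunk, e.BytesRecorded);
    await session.SendAudioAsync(chunk);
};

waveIn.StartRecording();
// Later: waveIn.StopRecording(); waveIn.Dispose();
```

With `AzureSemanticVadTurnDetection` configured as above, the service detects end-of-turn from the streamed audio, so no explicit `StartResponseAsync()` call is needed for spoken input.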
### 2. Process Events

```csharp
await foreach (SessionUpdate serverEvent in session.GetUpdatesAsync())
{
    switch (serverEvent)
    {
        case SessionUpdateResponseAudioDelta audioDelta:
            byte[] audioData = audioDelta.Delta.ToArray();
            // Play audio via NAudio or other audio library
            break;

        case SessionUpdateResponseTextDelta textDelta:
            Console.Write(textDelta.Delta);
            break;

        case SessionUpdateResponseFunctionCallArgumentsDone functionCall:
            // Handle function call (see Function Calling section)
            break;

        case SessionUpdateError error:
            Console.WriteLine($"Error: {error.Error.Message}");
            break;

        case SessionUpdateResponseDone:
            Console.WriteLine("\n--- Response complete ---");
            break;
    }
}
```

### 3. Send User Message

```csharp
await session.AddItemAsync(new UserMessageItem("Hello, can you help me?"));
await session.StartResponseAsync();
```

### 4. Function Calling

```csharp
using System.Text.Json;  // For JsonSerializer

// Define function
var weatherFunction = new VoiceLiveFunctionDefinition("get_current_weather")
{
    Description = "Get the current weather for a given location",
    Parameters = BinaryData.FromString("""
        {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state or country"
                }
            },
            "required": ["location"]
        }
        """)
};

// Add to session options
sessionOptions.Tools.Add(weatherFunction);

// Handle function call in event loop
if (serverEvent is SessionUpdateResponseFunctionCallArgumentsDone functionCall)
{
    if (functionCall.Name == "get_current_weather")
    {
        var parameters = JsonSerializer.Deserialize<Dictionary<string, string>>(functionCall.Arguments);
        string location = parameters?["location"] ?? "";

        // Call external service
        string weatherInfo = $"The weather in {location} is sunny, 75°F.";

        // Send response
        await session.AddItemAsync(new FunctionCallOutputItem(functionCall.CallId, weatherInfo));
        await session.StartResponseAsync();
    }
}
```

## Voice Options

| Voice Type | Class | Example |
|------------|-------|---------|
| Azure Standard | `AzureStandardVoice` | `"en-US-AvaNeural"` |
| Azure HD | `AzureStandardVoice` | `"en-US-Ava:DragonHDLatestNeural"` |
| Azure Custom | `AzureCustomVoice` | Custom voice with endpoint ID |

## Supported Models

| Model | Description |
|-------|-------------|
| `gpt-4o-realtime-preview` | GPT-4o with real-time audio |
| `gpt-4o-mini-realtime-preview` | Lightweight, fast interactions |
| `phi4-mm-realtime` | Cost-effective multimodal |

## Key Types Reference

| Type | Purpose |
|------|---------|
| `VoiceLiveClient` | Main client for creating sessions |
| `VoiceLiveSession` | Active WebSocket session |
| `VoiceLiveSessionOptions` | Session configuration |
| `AzureStandardVoice` | Standard Azure voice provider |
| `AzureSemanticVadTurnDetection` | Voice activity detection |
| `VoiceLiveFunctionDefinition` | Function tool definition |
| `UserMessageItem` | User text message |
| `FunctionCallOutputItem` | Function call response |
| `SessionUpdateResponseAudioDelta` | Audio chunk event |
| `SessionUpdateResponseTextDelta` | Text chunk event |

## Best Practices

1. **Always set both modalities** — Include `Text` and `Audio` for voice assistants
2. **Use `AzureSemanticVadTurnDetection`** — Provides natural conversation flow
3. **Configure appropriate silence duration** — 500ms typical to avoid premature cutoffs
4. **Use `using` statement** — Ensures proper session disposal
5. **Handle all event types** — Check for errors, audio, text, and function calls
6. **Use DefaultAzureCredential** — Never hardcode API keys

## Error Handling

```csharp
if (serverEvent is SessionUpdateError error)
{
    if (error.Error.Message.Contains("Cancellation failed: no active response"))
    {
        // Benign error, can ignore
    }
    else
    {
        Console.WriteLine($"Error: {error.Error.Message}");
    }
}
```

## Audio Configuration

- **Input Format**: `InputAudioFormat.Pcm16` (16-bit PCM)
- **Output Format**: `OutputAudioFormat.Pcm16`
- **Sample Rate**: 24kHz recommended
- **Channels**: Mono

## Related SDKs

| SDK | Purpose | Install |
|-----|---------|---------|
| `Azure.AI.VoiceLive` | Real-time voice (this SDK) | `dotnet add package Azure.AI.VoiceLive` |
| `Microsoft.CognitiveServices.Speech` | Speech-to-text, text-to-speech | `dotnet add package Microsoft.CognitiveServices.Speech` |
| `NAudio` | Audio capture/playback | `dotnet add package NAudio` |

## Reference Links

| Resource | URL |
|----------|-----|
| NuGet Package | https://www.nuget.org/packages/Azure.AI.VoiceLive |
| API Reference | https://learn.microsoft.com/dotnet/api/azure.ai.voicelive |
| GitHub Source | https://github.com/Azure/azure-sdk-for-net/tree/main/sdk/ai/Azure.AI.VoiceLive |
| Quickstart | https://learn.microsoft.com/azure/ai-services/speech-service/voice-live-quickstart |
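As a companion to the `// Play audio via NAudio` comment in the step 2 event loop, one way to play the `SessionUpdateResponseAudioDelta` chunks is NAudio's `BufferedWaveProvider`. This is a sketch under the assumption that the session emits 24 kHz mono PCM16 output, as configured in the Audio Configuration section:

```csharp
using NAudio.Wave;

// Jitter buffer matching the session's output format (24 kHz, 16-bit, mono)
var playbackBuffer = new BufferedWaveProvider(new WaveFormat(24000, 16, 1))
{
    BufferDuration = TimeSpan.FromSeconds(30),
    DiscardOnBufferOverflow = true
};

using var waveOut = new WaveOutEvent();
waveOut.Init(playbackBuffer);
waveOut.Play();

// Then, inside the step 2 event loop, append each delta to the buffer:
// case SessionUpdateResponseAudioDelta audioDelta:
//     byte[] audioData = audioDelta.Delta.ToArray();
//     playbackBuffer.AddSamples(audioData, 0, audioData.Length);
//     break;
```

`DiscardOnBufferOverflow` keeps a stalled consumer from throwing if deltas arrive faster than they are played; for barge-in scenarios you would also clear the buffer when the user starts speaking.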