---
name: azure-ai-voicelive-java
description: |
  Azure AI VoiceLive SDK for Java. Real-time bidirectional voice conversations with AI assistants using WebSocket.
  Triggers: "VoiceLiveClient java", "voice assistant java", "real-time voice java", "audio streaming java", "voice activity detection java".
package: com.azure:azure-ai-voicelive
---

# Azure AI VoiceLive SDK for Java

Real-time, bidirectional voice conversations with AI assistants using WebSocket technology.

## Installation

```xml
<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-ai-voicelive</artifactId>
    <version>1.0.0-beta.2</version>
</dependency>
```

## Environment Variables

```bash
AZURE_VOICELIVE_ENDPOINT=https://<resource>.openai.azure.com/
AZURE_VOICELIVE_API_KEY=<your-api-key>
```

## Authentication

### API Key

```java
import com.azure.ai.voicelive.VoiceLiveAsyncClient;
import com.azure.ai.voicelive.VoiceLiveClientBuilder;
import com.azure.core.credential.AzureKeyCredential;

VoiceLiveAsyncClient client = new VoiceLiveClientBuilder()
    .endpoint(System.getenv("AZURE_VOICELIVE_ENDPOINT"))
    .credential(new AzureKeyCredential(System.getenv("AZURE_VOICELIVE_API_KEY")))
    .buildAsyncClient();
```

### DefaultAzureCredential (Recommended)

```java
import com.azure.identity.DefaultAzureCredentialBuilder;

VoiceLiveAsyncClient client = new VoiceLiveClientBuilder()
    .endpoint(System.getenv("AZURE_VOICELIVE_ENDPOINT"))
    .credential(new DefaultAzureCredentialBuilder().build())
    .buildAsyncClient();
```

## Key Concepts

| Concept | Description |
|---------|-------------|
| `VoiceLiveAsyncClient` | Main entry point for voice sessions |
| `VoiceLiveSessionAsyncClient` | Active WebSocket connection for streaming |
| `VoiceLiveSessionOptions` | Configuration for session behavior |

### Audio Requirements

- **Sample Rate**: 24 kHz (24000 Hz)
- **Bit Depth**: 16-bit PCM
- **Channels**: Mono (1 channel)
- **Format**: Signed PCM, little-endian

## Core Workflow

### 1. Start Session

```java
import reactor.core.publisher.Mono;

client.startSession("gpt-4o-realtime-preview")
    .flatMap(session -> {
        System.out.println("Session started");

        // Subscribe to events
        session.receiveEvents()
            .subscribe(
                event -> System.out.println("Event: " + event.getType()),
                error -> System.err.println("Error: " + error.getMessage())
            );

        return Mono.just(session);
    })
    .block();
```

### 2. Configure Session Options

```java
import com.azure.ai.voicelive.models.*;
import com.azure.core.util.BinaryData;
import java.util.Arrays;

ServerVadTurnDetection turnDetection = new ServerVadTurnDetection()
    .setThreshold(0.5)            // Sensitivity (0.0-1.0)
    .setPrefixPaddingMs(300)      // Audio before speech
    .setSilenceDurationMs(500)    // Silence to end turn
    .setInterruptResponse(true)   // Allow interruptions
    .setAutoTruncate(true)
    .setCreateResponse(true);

AudioInputTranscriptionOptions transcription = new AudioInputTranscriptionOptions(
    AudioInputTranscriptionOptionsModel.WHISPER_1);

VoiceLiveSessionOptions options = new VoiceLiveSessionOptions()
    .setInstructions("You are a helpful AI voice assistant.")
    .setVoice(BinaryData.fromObject(new OpenAIVoice(OpenAIVoiceName.ALLOY)))
    .setModalities(Arrays.asList(InteractionModality.TEXT, InteractionModality.AUDIO))
    .setInputAudioFormat(InputAudioFormat.PCM16)
    .setOutputAudioFormat(OutputAudioFormat.PCM16)
    .setInputAudioSamplingRate(24000)
    .setInputAudioNoiseReduction(new AudioNoiseReduction(AudioNoiseReductionType.NEAR_FIELD))
    .setInputAudioEchoCancellation(new AudioEchoCancellation())
    .setInputAudioTranscription(transcription)
    .setTurnDetection(turnDetection);

// Send configuration
ClientEventSessionUpdate updateEvent = new ClientEventSessionUpdate(options);
session.sendEvent(updateEvent).subscribe();
```

### 3. Send Audio Input

```java
byte[] audioData = readAudioChunk(); // Your PCM16 audio data
session.sendInputAudio(BinaryData.fromBytes(audioData)).subscribe();
```

### 4. Handle Events

```java
session.receiveEvents().subscribe(event -> {
    ServerEventType eventType = event.getType();

    if (ServerEventType.SESSION_CREATED.equals(eventType)) {
        System.out.println("Session created");
    } else if (ServerEventType.INPUT_AUDIO_BUFFER_SPEECH_STARTED.equals(eventType)) {
        System.out.println("User started speaking");
    } else if (ServerEventType.INPUT_AUDIO_BUFFER_SPEECH_STOPPED.equals(eventType)) {
        System.out.println("User stopped speaking");
    } else if (ServerEventType.RESPONSE_AUDIO_DELTA.equals(eventType)) {
        if (event instanceof SessionUpdateResponseAudioDelta) {
            SessionUpdateResponseAudioDelta audioEvent = (SessionUpdateResponseAudioDelta) event;
            playAudioChunk(audioEvent.getDelta());
        }
    } else if (ServerEventType.RESPONSE_DONE.equals(eventType)) {
        System.out.println("Response complete");
    } else if (ServerEventType.ERROR.equals(eventType)) {
        if (event instanceof SessionUpdateError) {
            SessionUpdateError errorEvent = (SessionUpdateError) event;
            System.err.println("Error: " + errorEvent.getError().getMessage());
        }
    }
});
```

## Voice Configuration

### OpenAI Voices

```java
// Available: ALLOY, ASH, BALLAD, CORAL, ECHO, SAGE, SHIMMER, VERSE
VoiceLiveSessionOptions options = new VoiceLiveSessionOptions()
    .setVoice(BinaryData.fromObject(new OpenAIVoice(OpenAIVoiceName.ALLOY)));
```

### Azure Voices

```java
// Azure Standard Voice
options.setVoice(BinaryData.fromObject(new AzureStandardVoice("en-US-JennyNeural")));

// Azure Custom Voice
options.setVoice(BinaryData.fromObject(new AzureCustomVoice("myVoice", "endpointId")));

// Azure Personal Voice
options.setVoice(BinaryData.fromObject(
    new AzurePersonalVoice("speakerProfileId", PersonalVoiceModels.PHOENIX_LATEST_NEURAL)));
```

## Function Calling

```java
// parametersSchema is a JSON-Schema-style description of the function's arguments
VoiceLiveFunctionDefinition weatherFunction = new VoiceLiveFunctionDefinition("get_weather")
    .setDescription("Get current weather for a location")
    .setParameters(BinaryData.fromObject(parametersSchema));

VoiceLiveSessionOptions options = new VoiceLiveSessionOptions()
    .setTools(Arrays.asList(weatherFunction))
    .setInstructions("You have access to weather information.");
```

## Best Practices

1. **Use the async client** — VoiceLive requires reactive patterns
2. **Configure turn detection** for natural conversation flow
3. **Enable noise reduction** for better speech recognition
4. **Handle interruptions** gracefully with `setInterruptResponse(true)`
5. **Use Whisper transcription** for input audio transcription
6. **Close sessions** properly when the conversation ends

## Error Handling

```java
import reactor.core.publisher.Flux;

session.receiveEvents()
    .doOnError(error -> System.err.println("Connection error: " + error.getMessage()))
    .onErrorResume(error -> {
        // Attempt reconnection or cleanup
        return Flux.empty();
    })
    .subscribe();
```

## Reference Links

| Resource | URL |
|----------|-----|
| GitHub Source | https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/ai/azure-ai-voicelive |
| Samples | https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/ai/azure-ai-voicelive/src/samples |
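## Appendix: Helper Sketches

The audio requirements listed above (24 kHz, 16-bit, mono, signed little-endian PCM) map directly onto the JDK's `javax.sound.sampled.AudioFormat`. The sketch below is illustrative — the class name and `openMicrophone` helper are not part of the SDK — but the format parameters match the VoiceLive requirements:

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.TargetDataLine;

public class VoiceLiveAudioFormat {
    // 24 kHz sample rate, 16-bit samples, 1 channel, signed PCM, little-endian
    public static final AudioFormat VOICELIVE_FORMAT =
            new AudioFormat(24000f, 16, 1, true, false);

    // Opens the default microphone in the required format,
    // provided the audio hardware supports it.
    public static TargetDataLine openMicrophone() throws Exception {
        DataLine.Info info = new DataLine.Info(TargetDataLine.class, VOICELIVE_FORMAT);
        TargetDataLine line = (TargetDataLine) AudioSystem.getLine(info);
        line.open(VOICELIVE_FORMAT);
        return line;
    }
}
```

Bytes read from such a line can be passed to `session.sendInputAudio(...)` without conversion.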
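The "Send Audio Input" step typically streams audio in small fixed-size chunks rather than one large buffer. At 24 kHz, 16-bit, mono, 20 ms of audio is 24000 × 0.020 × 2 = 960 bytes. A minimal chunking helper (the class name and chunk duration are illustrative choices, not SDK requirements):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AudioChunker {
    // 20 ms of 24 kHz, 16-bit, mono PCM: 24000 samples/s * 0.020 s * 2 bytes = 960 bytes
    public static final int CHUNK_BYTES = 960;

    // Splits a PCM buffer into fixed-size chunks; the final chunk may be shorter.
    public static List<byte[]> chunk(byte[] pcm) {
        List<byte[]> chunks = new ArrayList<>();
        for (int off = 0; off < pcm.length; off += CHUNK_BYTES) {
            int end = Math.min(off + CHUNK_BYTES, pcm.length);
            chunks.add(Arrays.copyOfRange(pcm, off, end));
        }
        return chunks;
    }
}
```

Each chunk would then be sent with `session.sendInputAudio(BinaryData.fromBytes(chunk))`, letting server-side turn detection decide when the user has finished speaking.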
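The `parametersSchema` referenced in the function-calling example is a JSON-Schema-style description of the tool's arguments. One way to build it is with plain maps, as sketched below; the field names follow the common OpenAI tool-parameter convention, and the helper itself is hypothetical:

```java
import java.util.List;
import java.util.Map;

public class WeatherSchema {
    // JSON-Schema-style parameter description for the get_weather tool
    public static Map<String, Object> parametersSchema() {
        return Map.of(
            "type", "object",
            "properties", Map.of(
                "location", Map.of(
                    "type", "string",
                    "description", "City and state, e.g. Seattle, WA")),
            "required", List.of("location"));
    }
}
```

Passing the result through `BinaryData.fromObject(...)` serializes it to JSON for the `setParameters` call.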