Build real-time conversational AI voice engines using async worker pipelines, streaming transcription, LLM agents, and TTS synthesis with interrupt handling and multi-provider support
Add this skill

npx mdskills install sickn33/voice-ai-engine-development

Exceptionally detailed async worker pipeline architecture with streaming, interrupts, and multi-provider support.
Build production-ready real-time conversational AI voice engines with async worker pipelines, streaming transcription, LLM agents, and TTS synthesis.
This skill provides comprehensive guidance for building voice AI engines that enable natural, bidirectional conversations between users and AI agents. It covers the complete architecture from audio input to audio output, including:
# Use the skill in your AI assistant
@voice-ai-engine-development I need to build a voice assistant that can handle real-time conversations with interrupts
- SKILL.md - Comprehensive guide to voice AI engine development
- complete_voice_engine.py - Full working implementation
- gemini_agent_example.py - LLM agent with proper response buffering
- interrupt_system_example.py - Interrupt handling demonstration
- base_worker_template.py - Template for creating new workers
- multi_provider_factory_template.py - Multi-provider factory pattern
- common_pitfalls.md - Common issues and solutions
- provider_comparison.md - Comparison of transcription, LLM, and TTS providers

Every voice AI engine follows this pipeline:
Audio In → Transcriber → Agent → Synthesizer → Audio Out
            (Worker 1)  (Worker 2)  (Worker 3)
Each worker:
class BaseWorker:
    async def _run_loop(self):
        # Block until the previous stage hands off an item, then process it.
        # A subclass's process() pushes its result onto the next worker's queue.
        while self.active:
            item = await self.input_queue.get()
            await self.process(item)
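To make the hand-off between stages concrete, here is a minimal runnable sketch of two chained workers. The class names (`EchoTranscriber`, `EchoAgent`) and string-tagging logic are illustrative stand-ins, not the skill's actual implementations; the point is the queue-to-queue wiring.

```python
import asyncio


class BaseWorker:
    """One pipeline stage: consumes from input_queue, produces to output_queue."""

    def __init__(self, input_queue: asyncio.Queue, output_queue: asyncio.Queue):
        self.input_queue = input_queue
        self.output_queue = output_queue
        self.active = True

    async def process(self, item):
        raise NotImplementedError

    async def _run_loop(self):
        while self.active:
            item = await self.input_queue.get()
            result = await self.process(item)
            await self.output_queue.put(result)


class EchoTranscriber(BaseWorker):
    """Stand-in for a streaming transcriber: tags audio chunks as text."""

    async def process(self, item):
        return f"transcript({item})"


class EchoAgent(BaseWorker):
    """Stand-in for an LLM agent: wraps the transcript in a reply."""

    async def process(self, item):
        return f"reply({item})"


async def demo():
    audio_q, text_q, reply_q = asyncio.Queue(), asyncio.Queue(), asyncio.Queue()
    transcriber = EchoTranscriber(audio_q, text_q)
    agent = EchoAgent(text_q, reply_q)
    tasks = [asyncio.create_task(transcriber._run_loop()),
             asyncio.create_task(agent._run_loop())]
    await audio_q.put("chunk-1")
    reply = await asyncio.wait_for(reply_q.get(), timeout=1)
    for t in tasks:
        t.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return reply
```

Because each stage only touches its two queues, swapping in a real transcriber or agent does not change the wiring.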
# User interrupts the bot mid-sentence
if stop_event.is_set():
    # Keep only the portion of the message that was actually spoken
    partial_message = get_message_up_to(seconds_spoken)
    return partial_message, True  # cut_off = True
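A runnable sketch of this interrupt pattern, under stated assumptions: `get_message_up_to` here estimates the spoken portion from a rough words-per-second rate (a real engine would use TTS timing metadata), and `speak` simulates playback that checks the stop event on every word.

```python
import asyncio


def get_message_up_to(message: str, seconds_spoken: float,
                      words_per_second: float = 2.5) -> str:
    """Estimate how much of `message` was spoken before the interrupt.

    Hypothetical helper: uses a fixed speaking rate instead of real TTS timing.
    """
    words = message.split()
    spoken = int(seconds_spoken * words_per_second)
    return " ".join(words[:spoken])


async def speak(message: str, stop_event: asyncio.Event):
    """Simulated playback loop that checks the interrupt event on each word."""
    start = asyncio.get_running_loop().time()
    for _ in message.split():
        if stop_event.is_set():
            seconds_spoken = asyncio.get_running_loop().time() - start
            return get_message_up_to(message, seconds_spoken), True  # cut off
        await asyncio.sleep(0)  # yield; real code awaits audio playback here
    return message, False  # finished without interruption
```

Returning the `(partial_message, cut_off)` pair lets the agent record what the user actually heard, so the conversation history stays accurate after an interrupt.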
factory = VoiceComponentFactory()
transcriber = factory.create_transcriber(config) # Deepgram, AssemblyAI, etc.
agent = factory.create_agent(config) # OpenAI, Gemini, etc.
synthesizer = factory.create_synthesizer(config) # ElevenLabs, Azure, etc.
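One way this factory pattern is commonly implemented is with per-component registries mapping provider names to constructors. The sketch below uses string stubs in place of real provider classes, and the `Config` dataclass is an assumed shape, not the skill's actual config:

```python
from dataclasses import dataclass


@dataclass
class Config:
    """Hypothetical config shape: one provider name per component."""
    transcriber: str = "deepgram"
    agent: str = "openai"
    synthesizer: str = "elevenlabs"


class VoiceComponentFactory:
    """Maps provider names from config to constructors (string stubs here)."""

    TRANSCRIBERS = {"deepgram": lambda: "DeepgramTranscriber",
                    "assemblyai": lambda: "AssemblyAITranscriber"}
    AGENTS = {"openai": lambda: "OpenAIAgent",
              "gemini": lambda: "GeminiAgent"}
    SYNTHESIZERS = {"elevenlabs": lambda: "ElevenLabsSynthesizer",
                    "azure": lambda: "AzureSynthesizer"}

    def create_transcriber(self, config: Config):
        return self.TRANSCRIBERS[config.transcriber]()

    def create_agent(self, config: Config):
        return self.AGENTS[config.agent]()

    def create_synthesizer(self, config: Config):
        return self.SYNTHESIZERS[config.synthesizer]()
```

Adding a new provider then means registering one more entry in the relevant dict; the rest of the engine never references concrete provider classes.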
The skill includes examples for:
See references/common_pitfalls.md for detailed solutions to:
This skill is part of the Antigravity Awesome Skills repository. Contributions are welcome!
- @websocket-patterns - WebSocket implementation
- @async-python - Asyncio patterns
- @streaming-apis - Streaming API integration
- @audio-processing - Audio format conversion

MIT License - See repository LICENSE file
Built with ❤️ for the Antigravity community
Install via CLI

Install Voice AI Engine Development with a single command:

npx mdskills install sickn33/voice-ai-engine-development

This downloads the skill files into your project, and your AI agent picks them up automatically.
Voice AI Engine Development works with Claude Code, Claude Desktop, Cursor, VS Code Copilot, Windsurf, Continue.dev, Codex, Gemini CLI, Amp, Roo Code, Goose, OpenCode, Trae, Qodo, Command Code, and Factory. Skills use the open SKILL.md format, which is compatible with any AI coding agent that reads markdown instructions.