# CallCenter.js MCP + CLI

**An MCP Server, CLI tool, and API that makes phone calls on your behalf using VoIP.**

Just tell Claude what you want to accomplish, and it will call and handle the conversation for you. This is essentially an MCP Server that bridges between OpenAI's Real-Time Voice API and your VoIP connection to call people on your behalf.

> **⚠️ Vibe-coded side project!** Please do not use this in any kind of professional context. This is a side project coded in a weekend. There are no guard rails. Your MCP client can call *any* number with this, even if you don't ask it to. In fact, it has done so during testing - it called a random number during the night "for testing" and played back scary low-pitched noises - then claimed it called MY number. So YMMV, no warranties. See [disclaimer](#️-important-disclaimer) below.

## 📞 Example: Order Pizza with Claude

**You:** "Can you call Tony's Pizza and order a large pepperoni pizza for delivery to 123 Main St? My name is John and my number is 555-0123."

**Claude automatically calls the restaurant:**

```
⏺ mcp__callcenter_js__simple_call(phone_number: "+15551234567",
                                   brief: "Call Tony's Pizza and order a large pepperoni pizza for delivery to 123 Main St. Customer name is John, phone number 555-0123",
                                   caller_name: "John")
  ⎿ # Simple Call Result

    **Status:** ✅ Success
    **Duration:** 3 minutes 24 seconds
    **Call ID:** abc123xyz

    ## Call Transcript

    [14:23:15] 🎤 HUMAN: Tony's Pizza, how can I help you?
    [14:23:15] 🤖 ASSISTANT: Hi! I'm calling on behalf of John to place a delivery order.
    [14:23:20] 🎤 HUMAN: Sure! What would you like?
    [14:23:20] 🤖 ASSISTANT: I'd like to order one large pepperoni pizza for delivery please.
    [14:23:25] 🎤 HUMAN: Large pepperoni, got it. What's the delivery address?
    [14:23:25] 🤖 ASSISTANT: The address is 123 Main Street.
    [14:23:30] 🎤 HUMAN: And your phone number?
    [14:23:30] 🤖 ASSISTANT: The phone number is 555-0123.
    [14:23:35] 🎤 HUMAN: Perfect! That'll be $18.99. We'll have it delivered in about 30 minutes.
    [14:23:40] 🤖 ASSISTANT: That sounds great! Thank you so much.
    [14:23:42] 🎤 HUMAN: You're welcome! Have a great day.
```

**Pizza ordered successfully!** 🍕

## 📚 Quick Context for the Uninitiated

**VoIP (Voice over IP)** is how you make phone calls over the internet instead of traditional phone lines. **SIP (Session Initiation Protocol)** is the language these systems speak to connect calls. Think of it as HTTP but for phone calls.

**Fritz!Box** is a popular German router/modem that happens to have a built-in phone system (PBX). If you have one, you already have everything you need to make VoIP calls - this tool just connects to it. Outside Germany, you might know similar devices from other brands, or use dedicated VoIP services like Asterisk, 3CX, or cloud providers.

**MCP (Model Context Protocol)** is Anthropic's standard for connecting AI assistants like Claude to external tools and services. It's what lets MCP clients actually *do* things instead of just talking about them.

## 🚀 What This Enables

- 🔌 **MCP Server** - Use directly in Claude Code or any MCP client (most popular usage)
- 🖥️ **CLI Tool** - Command-line interface for direct phone calls
- 📚 **TypeScript API** - Programmatic library for building voice applications

Built as a bridge between OpenAI's Real-Time Voice API and VoIP networks, with multiple codec support (G.722, G.711), and expanded SIP protocol support for broad VoIP compatibility.
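
To make the codec support above a little more concrete, here is a minimal sketch of preference-ordered codec selection (G.722 first, G.711 as fallback). It is an illustration only, not the project's negotiation code: `Codec`, `PREFERRED_CODECS`, and `pickCodec` are hypothetical names, and the payload type numbers are the static RTP assignments from RFC 3551.

```typescript
// Hypothetical sketch: not the project's actual negotiation code.
// Payload type numbers are the static RTP assignments from RFC 3551.
type CodecName = 'G722' | 'PCMU' | 'PCMA';

interface Codec {
  name: CodecName;
  payloadType: number;  // static RTP payload type
  sampleRateHz: number; // audio sampling rate
}

// Preference order: wideband G.722 first, then the two G.711 variants.
const PREFERRED_CODECS: Codec[] = [
  { name: 'G722', payloadType: 9, sampleRateHz: 16000 },
  { name: 'PCMU', payloadType: 0, sampleRateHz: 8000 }, // G.711 μ-law
  { name: 'PCMA', payloadType: 8, sampleRateHz: 8000 }, // G.711 A-law
];

// Return the most preferred codec that the remote side also offers,
// or undefined when there is no overlap at all.
function pickCodec(offeredPayloadTypes: number[]): Codec | undefined {
  const offered = new Set(offeredPayloadTypes);
  return PREFERRED_CODECS.find((codec) => offered.has(codec.payloadType));
}
```

With this ordering, a peer that offers all three payload types ends up on G.722, while a peer that only offers G.711 still gets a working narrowband call.
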
Compatible with the latest `gpt-realtime` model released August 28, 2025.

## 🏗️ System Architecture

```mermaid
graph TB
    subgraph "User Interface"
        A[Claude Code/MCP Client]
        B[CLI Tool]
        C[TypeScript API]
    end

    subgraph "CallCenter.js Core"
        D[MCP Server]
        E[VoiceAgent]
        F[Call Brief Processor<br/>o3-mini model]
    end

    subgraph "Communication Layer"
        G[SIP Client<br/>Provider Support]
        H[Audio Bridge<br/>RTP Streaming]
    end

    subgraph "Audio Processing"
        I[G.722 Codec<br/>16kHz Wideband]
        J[G.711 Codec<br/>8kHz Fallback]
    end

    subgraph "External Services"
        K[OpenAI Real-Time API<br/>gpt-realtime model]
        L[VoIP Network<br/>Fritz!Box/Asterisk/etc]
    end

    A --> D
    B --> E
    C --> E
    D --> E
    E --> F
    E --> G
    E --> H
    F --> K
    G --> L
    H --> I
    H --> J
    H --> K

    style F fill:#e1f5fe
    style K fill:#fff3e0
    style L fill:#f3e5f5
```

> **⚠️ Vibe-coded project!** Developed and tested on Fritz!Box (a German router with built-in VoIP) only. Other provider configs are research-based but untested. YMMV, no warranties. See [disclaimer](#️-important-disclaimer) below.

## 🔌 MCP Client Integration (Most Popular!)

**Perfect for when your coding agent needs to call library authors to complain about their documentation!** 😄

### Quick Setup

```bash
# Add to Claude Code with one command:
claude mcp add --env SIP_USERNAME=your_actual_extension \
  --env SIP_PASSWORD="your_actual_password" \
  --env SIP_SERVER_IP=192.168.1.1 \
  --env OPENAI_API_KEY="sk-your_actual_openai_key" \
  --env USER_NAME="Your Actual Name" \
  -- callcenter.js npx -- github:gerkensm/callcenter.js-mcp --mcp
```

**Then just ask your MCP Client to make calls:**

> "Can you call the pizza place and order a large pepperoni? My number is 555-0123."

Your MCP Client will automatically handle the entire conversation using the AI Voice Agent!
🤖📞

## ✨ Key Features

- 🎙️ **Multiple Codec Support**: G.722 wideband (16kHz) + G.711 fallback for broad compatibility
- 🧩 **Compiler-Free WASM Codec**: Ships with a prebuilt G.722 WebAssembly module so `npx` users get wideband audio without installing build tools (native addon still available for max performance)
- 🤖 **AI-Powered Conversations**: Uses OpenAI's Real-Time Voice API with the latest `gpt-realtime` model (released August 28, 2025) for actual calls, with o3-mini model for instruction generation
- 🌍 **Automatic Language Detection**: Intelligently detects conversation language from call briefs and configures transcription accordingly
- 🎭 **Auto Voice Selection**: New 'auto' mode where o3-mini selects optimal voice based on call context (formality, industry, goals)
- 🔊 **Voice Characteristics**: Full support for all 10 OpenAI Realtime API voices with gender and personality awareness
- 🌐 **Expanded SIP Support**: Configurations for common SIP providers (Fritz!Box tested, others experimental)
- 🔧 **Smart Configuration**: Auto-detects provider requirements and optimizes settings
- 📞 **Enterprise-Ready**: Supports advanced SIP features (STUN/TURN, session timers, transport fallback)
- 🔄 **Robust Connection Management**: Automatic reconnection with intelligent error handling
- ✅ **Built-in Validation**: Comprehensive configuration validation with network testing
- 🎯 **Provider Profiles**: Pre-configured settings for popular SIP systems
- 🔌 **MCP Server**: Integrate with MCP clients like Claude Code
- 📚 **TypeScript API**: Programmatic library for building voice applications
- 📝 **Call Brief Processing**: Natural language call instructions using o3-mini model with structured JSON output
- 🎵 **Optional Call Recording**: Stereo WAV recording with caller/AI separation

## 🚀 Quick Start

### Option 1: Run Instantly with npx (No Installation) ⚡

**Fastest way to try it out:**

```bash
# Set environment variables (or create .env file)
export SIP_USERNAME="your_extension"
export SIP_PASSWORD="your_password"
export SIP_SERVER_IP="192.168.1.1"
export OPENAI_API_KEY="sk-your-key-here"

# Run directly from GitHub (no installation needed!)
npx github:gerkensm/callcenter.js-mcp call "+1234567890" --brief "Call restaurant for reservation" --user-name "Your Name"
```

**Or using a .env file:**

```bash
# Create .env file
cat > .env << EOF
SIP_USERNAME=your_extension
SIP_PASSWORD=your_password
SIP_SERVER_IP=192.168.1.1
OPENAI_API_KEY=sk-your-key-here
SIP_PROVIDER=fritz-box
OPENAI_VOICE=auto
EOF

# Run from GitHub (loads .env automatically)
npx github:gerkensm/callcenter.js-mcp call "+1234567890" --brief "Call restaurant"
```

**Note**: High-quality G.722 audio ships as a prebuilt WebAssembly module, so `npx` works even on machines without compilers. If you want the faster native addon instead, run `npm run build:native` (or `npm run build:all`) after cloning.

### Option 2: Local Installation

#### Prerequisites

- Node.js 20+
- (Optional) **Python 3.x + C/C++ build tools**: only needed if you plan to rebuild the native addon instead of using the bundled WebAssembly codec
  - macOS: Xcode Command Line Tools (`xcode-select --install`)
  - Windows: Visual Studio Build Tools
  - Linux: `build-essential` package
- OpenAI API key

**Note**: The prebuilt WebAssembly codec already provides wideband G.722 audio out of the box.
Rebuilding the native addon is optional and mainly useful for squeezing out a little more performance.

#### Installation

```bash
# Clone and install
git clone https://github.com/gerkensm/callcenter.js-mcp
cd callcenter.js-mcp
npm install

# Copy example configuration
cp config.example.json config.json
```

### Configuration

Edit `config.json` with your settings:

```json
{
  "sip": {
    "username": "your_sip_username",
    "password": "your_sip_password",
    "serverIp": "192.168.1.1",
    "serverPort": 5060,
    "provider": "fritz-box"
  },
  "ai": {
    "openaiApiKey": "sk-your-openai-api-key-here",
    "voice": "alloy",
    "instructions": "You are a helpful AI assistant making phone calls on behalf of users.",
    "userName": "Your Name"
  }
}
```

## 🎯 Usage Options

### 1. MCP Server (Claude Code Integration) ⭐

**Most popular usage** - integrates with Claude Code for seamless AI-powered calling. Perfect for when your coding agent needs to call library authors to complain about their documentation!
😄

#### **Quick Setup with npx (Recommended)**

**Option 1: Using MCP Client CLI (Easiest)**

```bash
# Replace with your ACTUAL credentials before running:
claude mcp add --env SIP_USERNAME=your_actual_extension \
  --env SIP_PASSWORD="your_actual_password" \
  --env SIP_SERVER_IP=192.168.1.1 \
  --env OPENAI_API_KEY="sk-your_actual_openai_key" \
  --env USER_NAME="Your Actual Name" \
  -- callcenter.js npx -- github:gerkensm/callcenter.js-mcp --mcp
```

**⚠️ Important:** Replace the placeholder values with your actual SIP credentials and OpenAI API key, or the server will fail to connect.

**Option 2: Manual Configuration**

Configure in Claude Code's MCP settings to automatically pull from GitHub:

```json
{
  "mcpServers": {
    "callcenter.js": {
      "command": "npx",
      "args": ["github:gerkensm/callcenter.js-mcp", "--mcp"],
      "env": {
        "SIP_USERNAME": "your_extension",
        "SIP_PASSWORD": "your_password",
        "SIP_SERVER_IP": "192.168.1.1",
        "OPENAI_API_KEY": "sk-your-key-here",
        "USER_NAME": "Your Name"
      }
    }
  }
}
```

#### **Alternative: Local Installation**

For local development or if you prefer local installation:

```bash
npm start --mcp
```

Or configure Claude Code with local installation:

```json
{
  "mcpServers": {
    "callcenter.js": {
      "command": "node",
      "args": ["dist/cli.js", "--mcp"],
      "cwd": "/path/to/voip-agent"
    }
  }
}
```

Available MCP tools:
- `simple_call` - Make calls with automatic instruction generation
- `advanced_call` - Make calls with granular parameter control

**Example usage in MCP Client:**

```
You: "Can you call Bocca di Bacco restaurant and book a table for 2 people tonight at 7:30pm? My name is John Doe."

MCP Client: I'll call Bocca di Bacco restaurant to book a table for 2 people tonight at 7:30pm.

🔧 mcp__ai-voice-agent__simple_call(
  phone_number: "+1234567890",
  brief: "Call Bocca di Bacco restaurant and book a table for 2 people tonight at 7:30pm",
  caller_name: "John Doe"
)

✅ Call completed successfully!
📞 Duration: 2 minutes 15 seconds
📝 Reservation confirmed for 2 people at 7:30pm tonight
```

**More examples:**

```
You: "My internet is down. Can you call my ISP and get a status update? I'm Sarah Johnson, account #12345."

MCP Client: I'll call your internet service provider to check on the outage status.

🔧 mcp__ai-voice-agent__simple_call(
  phone_number: "+18005551234",
  brief: "Call ISP about internet outage, customer Sarah Johnson account #12345",
  caller_name: "Sarah Johnson"
)
```

```
You: "Call Dr. Smith's office to reschedule my 3pm appointment to next week. I'm Mike Chen."

MCP Client: I'll call Dr. Smith's office to reschedule your appointment.

🔧 mcp__ai-voice-agent__simple_call(
  phone_number: "+15551234567",
  brief: "Call Dr. Smith's office to reschedule Mike Chen's 3pm appointment to next week",
  caller_name: "Mike Chen"
)
```

The MCP Client automatically handles the entire conversation using the AI Voice Agent!

### 2. Command Line Interface

**Perfect for when you need to `curl -X POST` your way out of social obligations, or finally implement that O(n log n) `ai-human-sort` algorithm - because nothing says "efficient sorting" like crowdsourcing comparisons to random strangers via VoIP!** 😄

**💡 Use `--brief` instead of `--instructions` for better results!**

The `--brief` option uses OpenAI's o3-mini model to generate sophisticated instructions from your simple description, while `--instructions` sends your text directly to the Real-Time Voice API.
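
The difference between the two options can be sketched as a small decision function. This is a hypothetical illustration, not the project's code: `InstructionInput`, `InstructionPlan`, and `planInstructions` are made-up names, and letting direct instructions win when both are supplied is an assumption based on the "highest priority" comment in the API reference.

```typescript
// Hypothetical sketch: names and shapes are illustrative, not the project's API.
interface InstructionInput {
  instructions?: string; // text passed via --instructions
  brief?: string;        // text passed via --brief
}

type InstructionPlan =
  | { mode: 'direct'; text: string }     // sent unchanged to the realtime model
  | { mode: 'generated'; brief: string } // expanded by a reasoning model first
  | { mode: 'default' };                 // neither option was given

// Decide where the call instructions come from.
// Assumption: direct instructions take priority over a brief.
function planInstructions(input: InstructionInput): InstructionPlan {
  const instructions = input.instructions?.trim();
  if (instructions) {
    return { mode: 'direct', text: instructions };
  }
  const brief = input.brief?.trim();
  if (brief) {
    return { mode: 'generated', brief };
  }
  return { mode: 'default' };
}
```

In this sketch only the `generated` branch would involve an extra model call before dialing; the `direct` branch hands the text over as-is.
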
Since the Real-Time Voice API is optimized for speed (not sophistication), `--brief` typically produces much better call outcomes.

```bash
# ✅ RECOMMENDED: Use brief for natural language goals
npm start call "+1234567890" --brief "Call the restaurant and book a table for 2 at 7pm tonight" --user-name "John Doe"

# ✅ RECOMMENDED: Brief works for any call type
npm start call "+1234567890" --brief "Call to check appointment availability for John Doe"

# ⚠️ ONLY use instructions for very specific, custom behavior
npm start call "+1234567890" --instructions "You must follow this exact script: Say hello, ask for manager, then hang up"

# Other examples with brief
npm start call "+1234567890" --record "meeting.wav" --duration 300 --brief "Conference call to discuss project status"
npm start call "+1234567890" --log-level verbose --brief "Test call to verify connectivity"
```

#### CLI Options

```bash
npm start call <number> [options]

Options:
  -c, --config <path>       Configuration file path (default: config.json)
  -d, --duration <seconds>  Maximum call duration in seconds (default: 600)
  -v, --verbose             Verbose mode - show all debug information
  -q, --quiet               Quiet mode - show only transcripts, errors, and warnings
  --log-level <level>       Set log level (quiet|error|warn|info|debug|verbose) (default: info)
  --no-colors               Disable colored output
  --no-timestamp            Disable timestamps in logs
  --record [filename]       Enable stereo call recording (optional filename)
  --brief <text>            Call brief to generate instructions from (RECOMMENDED)
  --instructions <text>     Direct AI instructions (use only for specific custom behavior)
  --user-name <name>        Your name for the AI to use when calling
  --voice <name>            Voice to use (default: auto) - see Voice Selection section
  --help                    Display help information
```

### 3. Programmatic API

```typescript
import { makeCall, createAgent } from 'callcenter.js';

// Simple call with brief
const result = await makeCall({
  number: '+1234567890',
  brief: 'Call Bocca di Bacco and book a table for 2 at 19:30 for Torben',
  userName: 'Torben',
  config: 'config.json'
});

console.log(`Call duration: ${result.duration}s`);
console.log(`Transcript: ${result.transcript}`);

// Advanced usage with agent instance
const agent = await createAgent('config.json');

agent.on('callEnded', () => {
  console.log('Call finished!');
});

await agent.makeCall({
  targetNumber: '+1234567890',
  duration: 300
});
```

## 📚 API Reference

### `makeCall(options: CallOptions): Promise<CallResult>`

Make a phone call with the AI agent.

#### CallOptions

```typescript
interface CallOptions {
  number: string;               // Phone number to call
  duration?: number;            // Call duration in seconds
  config?: string | Config;     // Configuration file path or object
  instructions?: string;        // Direct AI instructions (highest priority)
  brief?: string;               // Call brief to generate instructions from
  userName?: string;            // Your name for the AI to use
  recording?: boolean | string; // Enable recording with optional filename
  logLevel?: 'quiet' | 'error' | 'warn' | 'info' | 'debug' | 'verbose';
  colors?: boolean;             // Enable colored output
  timestamps?: boolean;         // Enable timestamps in logs
}
```

#### CallResult

```typescript
interface CallResult {
  callId?: string;     // Call ID if successful
  duration: number;    // Call duration in seconds
  transcript?: string; // Full conversation transcript
  success: boolean;    // Whether call was successful
  error?: string;      // Error message if failed
}
```

### `createAgent(config, options?): Promise<VoiceAgent>`

Create a VoiceAgent instance for advanced use cases.

```typescript
const agent = await createAgent('config.json', {
  enableCallRecording: true,
  recordingFilename: 'call.wav'
});

// Event handlers
agent.on('callInitiated', ({ callId, target }) => {
  console.log(`Call ${callId} started to ${target}`);
});

agent.on('callEnded', () => {
  console.log('Call ended');
});

agent.on('error', (error) => {
  console.error('Call error:', error.message);
});
```

### Configuration Structure

```typescript
interface Config {
  sip: {
    username: string;
    password: string;
    serverIp: string;
    serverPort?: number;
    provider?: string;
    stunServers?: string[];
    turnServers?: TurnServer[];
  };
  ai: {
    openaiApiKey: string;
    voice?: 'auto' | 'alloy' | 'ash' | 'ballad' | 'cedar' | 'coral' | 'echo' | 'marin' | 'sage' | 'shimmer' | 'verse';
    instructions?: string;
    brief?: string;
    userName?: string;
  };
  logging?: {
    level?: string;
  };
}
```

### Environment Variables

All configuration options can be set via environment variables (useful for npx usage):

#### **Required Variables:**
```bash
SIP_USERNAME=your_extension
SIP_PASSWORD=your_password
SIP_SERVER_IP=192.168.1.1
OPENAI_API_KEY=sk-your-key-here
USER_NAME="Your Name"  # Required when using --brief
```

#### **Optional Variables:**
```bash
# SIP Configuration
SIP_SERVER_PORT=5060
SIP_LOCAL_PORT=5060
SIP_PROVIDER=fritz-box  # fritz-box, asterisk, cisco, 3cx, generic
STUN_SERVERS="stun:stun.l.google.com:19302,stun:stun2.l.google.com:19302"
SIP_TRANSPORTS="udp,tcp"

# OpenAI Configuration
OPENAI_VOICE=auto  # auto (recommended), marin, cedar, alloy, echo, shimmer, coral, sage, ash, ballad, verse
OPENAI_INSTRUCTIONS="Your custom AI instructions"

# Advanced SIP Features
SESSION_TIMERS_ENABLED=true
SESSION_EXPIRES=1800
SESSION_MIN_SE=90
SESSION_REFRESHER=uac
```

**Priority order:** CLI flags > Config file > Environment variables

## ✅ Quick Success Check

Before making real calls, validate
your setup with these safe tests:

### 1. Configuration Validation
```bash
# Basic validation - checks syntax and required fields
npm run validate config.json

# Detailed validation with network connectivity tests
npm run validate:detailed

# Get specific fix suggestions for issues
npm run validate:fix
```

### 2. Test Call to Yourself (Fritz!Box users)
```bash
# Call your own extension to verify audio quality (safe test)
npm start call "**620" --brief "Test call to check audio quality" --user-name "Your Name" --duration 30

# Or use your mobile number for end-to-end test
npm start call "+49123456789" --brief "Quick test call" --user-name "Your Name" --duration 15
```

### 3. What to Expect
- ✅ **Working setup**: Clear audio, proper AI responses, clean call termination
- ⚠️ **Network issues**: "Connection failed" errors → check firewall/STUN settings
- ⚠️ **Auth problems**: "401 Unauthorized" → verify SIP credentials
- ⚠️ **Codec issues**: Poor audio quality → G.722 compilation may have failed

> **Pro tip**: Start with `--duration 30` for test calls to avoid long waits if something goes wrong.

## 📋 Configuration Validation

The built-in validation system provides comprehensive analysis:

```bash
# Basic validation
npm run validate config.json

# Detailed validation with network connectivity tests
npm run validate:detailed

# Get specific fix suggestions for issues
npm run validate:fix

# Test example configurations for different providers
npm run validate:fritz-box  # AVM Fritz!Box
npm run validate:asterisk   # Asterisk PBX
npm run validate:cisco      # Cisco CUCM
npm run validate:3cx        # 3CX Phone System
npm run validate:generic    # Generic SIP provider
```

The validator will check:
- ✅ **Configuration syntax and required fields**
- ✅ **Provider-specific requirements**
- ✅ **Network connectivity to SIP server**
- ✅ **STUN server reachability**
- ✅ **Codec availability (G.722/G.711)**
- ✅ **Provider compatibility score**

## 🌐 SIP Provider Compatibility

### ✅ **Actually Tested**

- **AVM Fritz!Box** - German router brand with built-in VoIP/SIP phone system ✅ **WORKS** (only one actually tested)

### 🤷 **Vibe-coded Configs** (Educated Guesses)

- **Asterisk PBX** - Open source PBX (FreePBX, Elastix, etc.) 🤷 **UNTESTED**
- **Cisco CUCM** - Enterprise Unified Communications 🤷 **UNTESTED**
- **3CX Phone System** - Popular business PBX 🤷 **UNTESTED**
- **Generic SIP Providers** - Standards-compliant SIP trunks 🤷 **UNTESTED**

### 🔧 **Provider-Specific Features**

The provider profiles are based on research and documentation, not actual testing:

| Provider | Transport | NAT Traversal | Session Timers | PRACK | Keepalive |
|----------|-----------|---------------|----------------|-------|-----------|
| **Fritz!Box** | UDP | Not needed | Optional | Disabled | Re-register |
| **Asterisk** | UDP/TCP | STUN | Supported | Optional | OPTIONS ping |
| **Cisco CUCM** | TCP preferred | STUN required | Required | Required | OPTIONS ping |
| **3CX** | TCP/UDP | STUN | Supported | Optional | Re-register |

### Configuration Decision Tree

```mermaid
flowchart TD
    A[Choose Your SIP Provider] --> B{Fritz!Box Router?}
    B -->|Yes| C[✅ Use fritz-box profile<br/>UDP transport<br/>No STUN needed]
    B -->|No| D{Enterprise System?}

    D -->|Cisco CUCM| E[⚠️ Use cisco profile<br/>TCP transport<br/>STUN required<br/>Session timers + PRACK]
    D -->|3CX| F[⚠️ Use 3cx profile<br/>TCP/UDP transport<br/>STUN recommended]
    D -->|Asterisk/FreePBX| G[⚠️ Use asterisk profile<br/>UDP/TCP transport<br/>STUN for NAT]
    D -->|Other| H[⚠️ Use generic profile<br/>Start with UDP<br/>Add STUN if needed]

    C --> I[Configure Basic Settings]
    E --> J[Configure Enterprise Settings]
    F --> J
    G --> J
    H --> J

    I --> K[Set SIP credentials<br/>serverIp = router IP<br/>typically 192.168.1.1]
    J --> L[Set SIP credentials<br/>serverIp = server IP<br/>Add STUN servers]

    K --> M{Network Location?}
    L --> M

    M -->|Local Network| N[✅ Basic setup complete<br/>Should work reliably]
    M -->|Cloud/Remote| O[❓ May need additional<br/>STUN/TURN configuration]

    style C fill:#c8e6c9
    style E fill:#ffecb3
    style F fill:#ffecb3
    style G fill:#ffecb3
    style H fill:#ffecb3
    style N fill:#c8e6c9
    style O fill:#ffe0b2
```

### 📝 **Configuration Examples**

The project includes ready-to-use configurations for all major providers:

- `config.example.json` - **AVM Fritz!Box** (home/SMB default)
- `config.asterisk.example.json` - **Asterisk PBX** with advanced features
- `config.cisco.example.json` - **Cisco CUCM** enterprise setup
- `config.3cx.example.json` - **3CX Phone System** configuration
- `config.generic.example.json` - **Generic SIP provider** template

## 🎵 Audio Quality & Codecs

### Codec Priority & Negotiation

1. **G.722** (Preferred) - 16kHz wideband, superior voice quality
2. **G.711 μ-law** (Fallback) - 8kHz narrowband, universal compatibility
3. **G.711 A-law** (Fallback) - 8kHz narrowband, European standard

### G.722 Implementation

- **Prebuilt WebAssembly codec** bundled with the package so every install gets wideband audio out of the box
- **Native C++ addon** still available for optimal performance when you opt-in with `npm run build:native`
- **Based on reference implementations** from CMU and Sippy Software
- **Automatic fallback** to G.711 if codec loading fails for any reason
- **Real-time encoding/decoding** with low latency

### Optional Call Recording

- **Stereo WAV format** with caller on left channel, AI on right channel
- **Optional filename** specification
- **Synchronized audio streams** for perfect alignment
- **High-quality PCM recording** at native sample rates

### Testing Audio Quality

```bash
# Test codec availability
npm run test:codecs

# Rebuild all codec artifacts (native + WASM + TS) if you changed the C sources
npm run build:all

# Disable G.722 entirely if you only want the G.711 fallback
npm run build:no-g722
```

## 🤖 AI Call Brief Processing

### Why This Matters: Real-Time Voice API Needs Better Instructions

OpenAI's Real-Time Voice API is **optimized for speed, not sophistication**. It's great at natural conversation but struggles with complex, goal-oriented tasks without very specific instructions. Here's the problem:

**❌ What doesn't work well:**
```bash
# Vague brief - Real-Time Voice API will be confused and unfocused
npm start call "+1234567890" --brief "Call the restaurant and book a table"
```

**❌ What's tedious and error-prone:**
```bash
# Writing detailed instructions manually every time
npm start call "+1234567890" --instructions "You are calling on behalf of John Doe to make a restaurant reservation for 2 people at Bocca di Bacco for tonight at 7pm. You should start by greeting them professionally, then clearly state your purpose. Ask about availability for 7pm, and if not available, ask for alternative times between 6-8pm. Confirm the booking details including date, time, party size, and get a confirmation number if possible. If you reach voicemail, leave a professional message with callback information..."
```

**✅ What works brilliantly:**
```bash
# Simple brief - o3-mini generates sophisticated instructions
npm start call "+1234567890" --brief "Call Bocca di Bacco and book a table for 2 at 7pm tonight" --user-name "John Doe"
```

### How It Works

The system uses OpenAI's **o3-mini reasoning model** (their latest small reasoning model - smart but fast) to automatically generate detailed, sophisticated instructions from your simple brief. The o3-mini model:

1. **Analyzes your brief** and understands the goal
2. **Creates conversation states** and flow logic
3. **Generates specific instructions** for each phase of the call
4. **Handles edge cases** like voicemail, objections, and alternatives
5. **Adapts language and tone** based on context
6. **Provides fallback strategies** when things don't go as planned

### Call Flow Sequence

```mermaid
sequenceDiagram
    participant U as User/Claude
    participant V as VoiceAgent
    participant B as Brief Processor<br/>(o3-mini)
    participant S as SIP Client
    participant A as Audio Bridge
    participant O as OpenAI Realtime<br/>(gpt-realtime)
    participant P as Phone/VoIP

    U->>V: makeCall({brief, number, userName})
    V->>B: Process brief with o3-mini
    B->>B: Generate detailed instructions<br/>& conversation states
    B->>V: Sophisticated call instructions

    V->>S: Connect to SIP server
    S->>P: INVITE (start call)
    P->>S: 200 OK (call answered)
    S->>V: Call established

    V->>A: Initialize audio bridge
    V->>O: Connect to OpenAI Realtime
    O->>V: WebSocket connected
    V->>O: Send generated instructions

    loop During Call
        P->>A: RTP audio packets
        A->>A: Decode G.722/G.711 → PCM
        A->>O: Stream PCM audio
        O->>O: Process speech → text
        O->>O: Generate AI response
        O->>A: Stream AI audio (PCM)
        A->>A: Encode PCM → G.722/G.711
        A->>P: RTP audio packets

        Note over V: Monitor call progress<br/>& transcript logging
    end

    alt Call completed naturally
        O->>V: Call completion signal
        V->>S: Send BYE
    else Duration limit reached
        V->>V: Safety timeout triggered
        V->>S: Send BYE
    end

    S->>P: BYE (end call)
    P->>S: 200 OK
    V->>U: CallResult{transcript, duration, success}
```

### Before/After Example

**Your simple input:**
```
"Call Bocca di Bacco and book a table for 2 at 7pm tonight"
```

**What o3-mini generates (excerpt):**
```
## Personality and Tone
Identity: I am an assistant calling on behalf of John Doe to make a restaurant reservation.
Task: I am responsible for booking a table for 2 people at Bocca di Bacco today at 7:00 PM.
Tone: Professional, warm, and respectful.

## Instructions
1. Open the conversation immediately: "Hello, this is an assistant calling on behalf of John Doe."
2. Read back critical data: Repeat times and details for confirmation.
3. Handle objections: Respond politely and offer alternatives between 6-8 PM.
...

## Conversation States
[
  {
    "id": "1_greeting",
    "description": "Greeting and introduction of call purpose",
    "instructions": ["Introduce yourself as an assistant", "Immediately mention the reservation request"],
    "examples": ["Hello, this is an assistant calling on behalf of John Doe. I'm calling to book a table for 2 people today at 7:00 PM."]
  }
]
```

### Automatic Adaptations

The o3-mini brief processor automatically:
- **Detects language** from your brief and generates instructions in that language
- **Creates conversation flow** with logical states and transitions
- **Handles cultural context** (German restaurants vs. American vs. Japanese)
- **Generates appropriate examples** with real phrases (no placeholders)
- **Provides voicemail scripts** for when nobody answers
- **Plans for objections** and alternative solutions

### When to Use Each Approach

- **Use `--brief`** for 95% of calls - it's easier and produces better results
- **Use `--instructions`** only when you need very specific, custom behavior
- **Brief processing** is perfect for: reservations, appointments, business calls, customer service
- **Direct instructions** are better for: highly specialized scenarios, testing, or when you've already perfected your prompt

## 🎤 Voice Selection

The AI agent supports 10 different voices from OpenAI's Realtime API, each with unique characteristics.
By default, the system uses **auto mode** where o3-mini intelligently selects the optimal voice based on your call's context.

### Available Voices

| Voice | Gender | Description | Best For |
|-------|--------|-------------|----------|
| **marin** | Female | Clear, professional feminine voice | All-purpose: business calls, customer support, negotiations |
| **cedar** | Male | Natural masculine voice with warm undertones | All-purpose: professional calls, consultations, service interactions |
| **alloy** | Neutral | Professional voice with good adaptability | Technical discussions, business contexts, general inquiries |
| **echo** | Male | Conversational masculine voice | Casual to formal interactions, versatile tone |
| **shimmer** | Female | Warm, expressive feminine voice | Empathetic conversations, sales, professional contexts |
| **coral** | Female | Warm and friendly feminine voice | Customer interactions, consultations, support calls |
| **sage** | Neutral | Calm and thoughtful voice | Medical consultations, advisory roles, serious discussions |
| **ash** | Neutral | Clear and precise voice | Technical explanations, instructions, educational content |
| **ballad** | Female | Melodic and smooth feminine voice | Presentations, storytelling, engaging conversations |
| **verse** | Neutral | Versatile and expressive voice | Dynamic conversations, adaptable to any context |

### Auto Voice Selection (Recommended)

The **auto mode** (default) uses o3-mini to analyze your call context and select the most appropriate voice:

```bash
# Auto mode - AI selects the best voice
npm start call "+1234567890" --brief "Call doctor's office to schedule appointment" --user-name "John"
# Might select: sage (calm, professional for healthcare)

# Auto mode adapts to context
npm start call "+1234567890" --brief "Call pizza place to order delivery" --user-name "Sarah"
# Might select: coral or echo (friendly, casual for food service)
```

### Manual Voice Selection

You can override auto selection when you have specific requirements:

```bash
# Use a specific voice
npm start call "+1234567890" --voice marin --brief "Call to book reservation" --user-name "Alex"

# Professional contexts
npm start call "+1234567890" --voice cedar --brief "Call bank about account" --user-name "Pat"

# Friendly service calls
npm start call "+1234567890" --voice coral --brief "Call flower shop for delivery" --user-name "Sam"
```

### Configuration Options

Set default voice in your config file or environment:

```json
// config.json
{
  "ai": {
    "voice": "auto", // or specific voice like "marin", "cedar", etc.
    // ...
  }
}
```

```bash
# Environment variable
export OPENAI_VOICE=auto # or marin, cedar, alloy, etc.
```

### Voice Selection Guidelines

The auto mode considers these factors:

- **Formality Level**: High (cedar, marin, sage) → Medium (alloy, verse) → Low (echo, coral, shimmer)
- **Industry Context**: Healthcare (sage, shimmer), Finance (cedar, sage), Retail (coral, echo), Tech (alloy, ash)
- **Goal Type**: Authority needed (cedar, sage), Friendliness (coral, shimmer), Efficiency (marin, alloy)
- **Language**: Voices adapt to detected language from your call brief

### MCP Integration

The MCP tools strongly recommend auto mode but support manual override:

```typescript
// Simple call - auto voice selection
mcp__callcenter_js__simple_call({
  phone_number: "+1234567890",
  brief: "Call restaurant for reservation",
  caller_name: "John",
  voice: "auto" // Optional, defaults to auto
})

// Advanced call - manual voice selection
mcp__callcenter_js__advanced_call({
  phone_number: "+1234567890",
  goal: "Schedule medical appointment",
  user_name: "Jane",
  voice: "sage" // Override for specific voice
})
```

## 🔄 Advanced Features

### Smart Connection Management

- **Automatic Reconnection**: Exponential backoff with provider-specific error handling
- **Transport Fallback**: UDP → TCP → TLS based on what works
- **Provider-Aware Error Recovery**: Different strategies for Fritz Box vs. Asterisk vs. Cisco
- **Network Change Handling**: Adapts to network connectivity changes

### Enhanced SIP Protocol Support

- **STUN/TURN Integration**: NAT traversal for cloud and enterprise deployments
- **Session Timers (RFC 4028)**: Connection stability for long calls
- **PRACK Support (RFC 3262)**: Reliable provisional responses for enterprise systems
- **Multiple Transports**: UDP, TCP, TLS with intelligent fallback

### Configuration Intelligence

- **Provider Auto-Detection**: Identifies provider from SIP domain/IP
- **Requirements Validation**: Ensures all provider-specific needs are met
- **Network Testing**: Real connectivity tests to SIP servers and STUN servers
- **Optimization Suggestions**: Actionable recommendations for better performance

## 🛠️ Development & Testing

### Build Commands

```bash
# Default build (WASM refresh + TypeScript, skips if artifacts already exist)
npm run build

# Build components separately (useful for maintainers)
npm run build:wasm    # Regenerate the G.722 WebAssembly codec
npm run build:native  # Rebuild the native addon (requires toolchain)
npm run build:all     # Run native + WASM + TypeScript in one go
npm run build:ts      # TypeScript compilation only

# Development with hot reload
npm run dev

# Clean all build artifacts
npm run clean
```

### Configuration Testing

```bash
# Validate any config file
npm run validate path/to/config.json

# Test with different providers
npm run validate -- --provider asterisk config.json

# Get detailed network diagnostics
npm run validate -- --detailed --network config.json

# Show fix suggestions for issues
npm run validate -- --fix-suggestions config.json
```

### Project Structure

```
src/
├── voice-agent.ts              # Main orchestration with ConnectionManager
├── connection-manager.ts       # Smart connection handling & reconnection
├── sip-client.ts               # Enhanced SIP protocol with provider support
├── audio-bridge.ts             # RTP streaming and codec management
├── openai-client.ts            # OpenAI Real-Time Voice API integration
├── call-brief-processor.ts     # o3-mini model call brief processing
├── mcp-server.ts               # MCP (Model Context Protocol) server
├── validation.ts               # Configuration validation engine
├── config.ts                   # Enhanced config loading with provider profiles
├── logger.ts                   # Comprehensive logging with transcript capture
├── index.ts                    # Main programmatic API exports
├── providers/
│   └── profiles.ts             # Provider-specific configuration database
├── testing/
│   └── network-tester.ts       # Real network connectivity testing
├── codecs/                     # Codec abstraction layer
│   ├── g722.ts                 # G.722 wideband implementation
│   └── g711.ts                 # G.711 fallback codecs
└── cli.ts                      # Command-line interface

scripts/
└── validate-config.js          # Comprehensive validation CLI tool

config.*.example.json           # Provider-specific example configurations
```

## 📊 Validation & Diagnostics

The built-in validation system provides comprehensive analysis:

### Configuration Report Example

```
🔍 CallCenter.js Configuration Validator

📋 Provider: AVM Fritz!Box (auto-detected)
🎯 Provider Compatibility Score: 100%

✅ Configuration is valid and ready for use!

🌐 Network Connectivity:
   ✅ SIP Server: Reachable (12ms latency)
   ✅ G.722 codec: Available for high-quality audio

💡 Optimization Suggestions:
   💡 G.722 wideband codec available (already enabled)
   💡 Excellent latency - local network performance optimal

🚀 Next steps: npm start call "<number>"
```

### Network Diagnostics

- **Real SIP Server Testing**: Actual
UDP/TCP connectivity tests
- **STUN Server Validation**: Tests NAT traversal capability
- **Latency Measurement**: Network performance assessment
- **Provider-Specific Recommendations**: Tailored advice based on detected issues

## 🔧 Troubleshooting

### Configuration Issues

1. **Run validation first**:
   ```bash
   npm run validate:detailed
   ```

2. **Check provider compatibility**:
   ```bash
   npm run validate -- --provider fritz-box config.json
   ```

3. **Get specific fix suggestions**:
   ```bash
   npm run validate:fix
   ```

### Network Connectivity

- **Fritz Box**: Usually works with UDP on local network
- **Cloud/Enterprise**: May need STUN servers for NAT traversal
- **Firewall Issues**: Ensure SIP port (5060) and RTP ports are open

### Audio Quality

1. **Verify G.722 is available**:
   ```bash
   npm run test:codecs
   ```

2. **Check codec negotiation in logs**:
   ```
   ✅ Selected codec: PT 9 (G722/8000)
   ```

3. **Network issues**: High latency/packet loss affects audio quality

### Build Problems

1. **Native compilation fails** (only happens if you explicitly ran `npm run build:native` or `npm run build:all`): stick with the bundled WASM codec unless you specifically need native performance. To drop back to G.711 entirely, run:
   ```bash
   npm run build:no-g722
   ```

2. **Provider-specific issues**: Check validation recommendations for your provider

### MCP Integration Issues

1. **Server won't start**:
   ```bash
   # Check for port conflicts or config issues
   npm start --mcp
   ```

2. **Claude Code not connecting**:
   - Verify MCP server configuration in Claude Code settings
   - Check that the working directory path is correct
   - Ensure the server is running and accessible

## 📈 What I Built

This is a personal project that includes:

- **🌐 Fritz!Box Support**: Actually tested and works
- **🤷 Other SIP Configs**: Vibe-coded based on documentation reading
- **🔄 Connection Handling**: Seems to work, has retry logic
- **✅ Config Validation**: Catches obvious mistakes
- **📊 Network Testing**: Basic connectivity checks
- **🎯 Provider Profiles**: Research-based guesses about different systems
- **🔌 MCP Server**: Works with Claude Code (tested)
- **📚 TypeScript API**: Clean interfaces for programmatic use
- **📝 Call Brief Processing**: Uses o3-mini to generate instructions (works well)
- **🎵 Optional Call Recording**: Stereo WAV files with left/right channels
- **📋 Transcript Capture**: Real-time conversation logs

## ⚠️ Important Disclaimer

**This project is vibe-coded!** 🚀

This means:
- ✅ **Works on Fritz!Box** - that's what I actually tested
- 🤷 **Other providers** - I tried to make it more useful but can't promise anything
- 🤷 **Advanced features** - seemed like good ideas based on research, but who knows
- ⚠️ **YMMV** - your setup is probably different than mine
- ⚠️ **No warranties** - use at your own risk

### What This Means for You

- **Fritz Box users**: Should work great!
✅
- **Other providers**: The configuration profiles are educated guesses based on research - they might work, they might not
- **Enterprise users**: I tried to add the features that seemed important, but I have no idea if they actually work correctly
- **Issues & PRs**: I'll accept pull requests, but I can't promise to fix issues I can't reproduce or test

### If You Want to Contribute

- ✅ **Test it on your setup** and let others know what works
- ✅ **Share working configs** if you get something else working
- ✅ **Fix stuff that's broken** and submit PRs
- ✅ **Tell me if my assumptions were wrong** about how providers work

The validation tools might help debug issues, but honestly, the real test is whether you can make actual calls.

## 📜 License

MIT License - see [LICENSE](LICENSE) for details.

### Third-Party Components

- **G.722 Codec**: Public domain and BSD licensed implementations
- **SIP Protocol**: Based on sipjs-udp (MIT licensed)
- **Dependencies**: Various open source licenses (see package.json)

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Add/update validation for new providers
4. Test with `npm run validate:detailed`
5. Submit a pull request

## 📞 Support

- **Configuration Issues**: Use `npm run validate:detailed` for diagnostics
- **Provider Support**: Check compatibility matrix above
- **Build Problems**: See troubleshooting section
- **Feature Requests**: You can open GitHub issues, but they're unlikely to get attention anytime soon. Pull requests are much preferred!

---

**Ready to get started?** Copy an example config, run `npm run validate:detailed`, and start making AI-powered voice calls! 🚀