An MCP Server, CLI tool, and API that makes phone calls on your behalf using VoIP.
Just tell Claude what you want to accomplish, and it will call and handle the conversation for you. This is essentially an MCP Server that bridges between OpenAI's Real-Time Voice API and your VoIP connection to call people on your behalf.
⚠️ Vibe-coded side project! Please do not use this in any kind of professional context. This is a side project coded in a weekend. There are no guard rails. Your MCP client can call any number with this, even if you don't ask it to. In fact, it has done so during testing - it called a random number during the night "for testing" and played back scary low-pitched noises - then claimed it called MY number. So YMMV, no warranties. See disclaimer below.
You: "Can you call Tony's Pizza and order a large pepperoni pizza for delivery to 123 Main St? My name is John and my number is 555-0123."
Claude automatically calls the restaurant:
⏺ mcp__callcenter_js__simple_call(phone_number: "+15551234567",
brief: "Call Tony's Pizza and order a large pepperoni pizza for delivery to 123 Main St. Customer name is John, phone number 555-0123",
caller_name: "John")
⎿ # Simple Call Result
**Status:** ✅ Success
**Duration:** 3 minutes 24 seconds
**Call ID:** abc123xyz
## Call Transcript
[14:23:15] 🎤 HUMAN: Tony's Pizza, how can I help you?
[14:23:15] 🤖 ASSISTANT: Hi! I'm calling on behalf of John to place a delivery order.
[14:23:20] 🎤 HUMAN: Sure! What would you like?
[14:23:20] 🤖 ASSISTANT: I'd like to order one large pepperoni pizza for delivery please.
[14:23:25] 🎤 HUMAN: Large pepperoni, got it. What's the delivery address?
[14:23:25] 🤖 ASSISTANT: The address is 123 Main Street.
[14:23:30] 🎤 HUMAN: And your phone number?
[14:23:30] 🤖 ASSISTANT: The phone number is 555-0123.
[14:23:35] 🎤 HUMAN: Perfect! That'll be $18.99. We'll have it delivered in about 30 minutes.
[14:23:40] 🤖 ASSISTANT: That sounds great! Thank you so much.
[14:23:42] 🎤 HUMAN: You're welcome! Have a great day.
Pizza ordered successfully! 🍕
VoIP (Voice over IP) is how you make phone calls over the internet instead of traditional phone lines. SIP (Session Initiation Protocol) is the language these systems speak to connect calls. Think of it as HTTP but for phone calls.
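Since SIP is line-oriented text much like HTTP, a request is easy to show directly. The sketch below builds a minimal INVITE following the RFC 3261 header layout; the helper itself is hypothetical and not part of this project's API.

```typescript
// Illustrative sketch: a minimal SIP INVITE request built as plain text.
// Header names and CRLF framing follow RFC 3261; this helper is NOT the
// project's actual SIP client, just a demonstration of the wire format.
interface SipTarget {
  user: string;      // e.g. "+15551234567"
  domain: string;    // e.g. "192.168.1.1" (your Fritz!Box / SIP server)
  fromUser: string;  // your SIP extension
}

function buildInvite(t: SipTarget, callId: string): string {
  const requestUri = `sip:${t.user}@${t.domain}`;
  return [
    `INVITE ${requestUri} SIP/2.0`,
    `Via: SIP/2.0/UDP ${t.domain};branch=z9hG4bK${callId}`,
    `From: <sip:${t.fromUser}@${t.domain}>;tag=${callId}`,
    `To: <${requestUri}>`,
    `Call-ID: ${callId}`,
    `CSeq: 1 INVITE`,
    `Content-Type: application/sdp`,
    `Content-Length: 0`,
    ``, // a blank line terminates the header section, exactly as in HTTP
    ``,
  ].join("\r\n");
}
```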
Fritz!Box is a popular German router/modem that happens to have a built-in phone system (PBX). If you have one, you already have everything you need to make VoIP calls - this tool just connects to it. Outside Germany, you might know similar devices from other brands, or use dedicated VoIP services like Asterisk, 3CX, or cloud providers.
MCP (Model Context Protocol) is Anthropic's standard for connecting AI assistants like Claude to external tools and services. It's what lets MCP clients actually do things instead of just talking about them.
Built as a bridge between OpenAI's Real-Time Voice API and VoIP networks, with support for multiple codecs (G.722, G.711) and expanded SIP protocol coverage for broad VoIP compatibility. Compatible with the latest gpt-realtime model, released August 28, 2025.
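To make the codec layer concrete, here is the textbook G.711 mu-law transform (the 8 kHz fallback codec). This is the standard ITU-T algorithm, not a copy of the project's `src/codecs/g711.ts`, which may be table-driven.

```typescript
// Textbook G.711 mu-law companding (ITU-T G.711). Mu-law maps 14-bit-ish
// linear PCM onto 8 bits logarithmically, trading precision for range.
const BIAS = 0x84; // standard mu-law bias (132)
const CLIP = 32635;

function pcmToMulaw(sample: number): number {
  const sign = sample < 0 ? 0x80 : 0;
  let s = Math.min(Math.abs(sample), CLIP) + BIAS;
  // Find the segment (exponent): position of the highest set bit above bit 7.
  let exponent = 7;
  for (let mask = 0x4000; (s & mask) === 0 && exponent > 0; mask >>= 1) {
    exponent--;
  }
  const mantissa = (s >> (exponent + 3)) & 0x0f;
  return ~(sign | (exponent << 4) | mantissa) & 0xff; // mu-law bytes are inverted
}

function mulawToPcm(byte: number): number {
  const b = ~byte & 0xff;
  const sign = b & 0x80;
  const exponent = (b >> 4) & 0x07;
  const mantissa = b & 0x0f;
  const sample = (((mantissa << 3) + BIAS) << exponent) - BIAS;
  return sign ? -sample : sample;
}
```

The round trip is lossy by design: quantization error grows with amplitude, which is fine for speech.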
graph TB
subgraph "User Interface"
A[Claude Code/MCP Client]
B[CLI Tool]
C[TypeScript API]
end
subgraph "CallCenter.js Core"
D[MCP Server]
E[VoiceAgent]
F[Call Brief Processor
o3-mini model]
end
subgraph "Communication Layer"
G[SIP Client
Provider Support]
H[Audio Bridge
RTP Streaming]
end
subgraph "Audio Processing"
I[G.722 Codec
16kHz Wideband]
J[G.711 Codec
8kHz Fallback]
end
subgraph "External Services"
K[OpenAI Real-Time API
gpt-realtime model]
L[VoIP Network
Fritz!Box/Asterisk/etc]
end
A --> D
B --> E
C --> E
D --> E
E --> F
E --> G
E --> H
F --> K
G --> L
H --> I
H --> J
H --> K
style F fill:#e1f5fe
style K fill:#fff3e0
style L fill:#f3e5f5
⚠️ Vibe-coded project! Developed and tested on Fritz!Box (a German router with built-in VoIP) only. Other provider configs are research-based but untested. YMMV, no warranties. See disclaimer below.
Perfect for when your coding agent needs to call library authors to complain about their documentation! 😄
# Add to Claude Code with one command:
claude mcp add --env SIP_USERNAME=your_actual_extension \
--env SIP_PASSWORD="your_actual_password" \
--env SIP_SERVER_IP=192.168.1.1 \
--env OPENAI_API_KEY="sk-your_actual_openai_key" \
--env USER_NAME="Your Actual Name" \
  callcenter.js -- npx github:gerkensm/callcenter.js-mcp --mcp
Then just ask your MCP Client to make calls:
"Can you call the pizza place and order a large pepperoni? My number is 555-0123."
Your MCP Client will automatically handle the entire conversation using the AI Voice Agent! 🤖📞
- npx users get wideband audio without installing build tools (the native addon is still available for maximum performance)
- gpt-realtime model (released August 28, 2025) for actual calls, with the o3-mini model for instruction generation

Fastest way to try it out:
# Set environment variables (or create .env file)
export SIP_USERNAME="your_extension"
export SIP_PASSWORD="your_password"
export SIP_SERVER_IP="192.168.1.1"
export OPENAI_API_KEY="sk-your-key-here"
# Run directly from GitHub (no installation needed!)
npx github:gerkensm/callcenter.js-mcp call "+1234567890" --brief "Call restaurant for reservation" --user-name "Your Name"
Or using a .env file:
# Create .env file
cat > .env << 'EOF'
SIP_USERNAME="your_extension"
SIP_PASSWORD="your_password"
SIP_SERVER_IP="192.168.1.1"
OPENAI_API_KEY="sk-your-key-here"
EOF

Usage: npx github:gerkensm/callcenter.js-mcp call <number> [options]
Options:
-c, --config Configuration file path (default: config.json)
-d, --duration Maximum call duration in seconds (default: 600)
-v, --verbose Verbose mode - show all debug information
-q, --quiet Quiet mode - show only transcripts, errors, and warnings
--log-level Set log level (quiet|error|warn|info|debug|verbose) (default: info)
--no-colors Disable colored output
--no-timestamp Disable timestamps in logs
--record [filename] Enable stereo call recording (optional filename)
--brief Call brief to generate instructions from (RECOMMENDED)
--instructions Direct AI instructions (use only for specific custom behavior)
--user-name Your name for the AI to use when calling
--voice Voice to use (default: auto) - see Voice Selection section
--help Display help information
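The log-level options above form an ordered severity scale. The sketch below shows one plausible way such a filter could work; the ranking and the special-casing of `quiet` are assumptions based on the option descriptions, not the project's actual logger.

```typescript
// Hypothetical log-level filter matching the documented CLI levels.
// "quiet" is special-cased because --quiet still shows errors and warnings.
const LEVELS = ["quiet", "error", "warn", "info", "debug", "verbose"] as const;
type Level = (typeof LEVELS)[number];

function shouldLog(messageLevel: Level, configuredLevel: Level): boolean {
  if (configuredLevel === "quiet") {
    // Per the --quiet description: only errors and warnings still surface
    // (transcripts are handled separately).
    return messageLevel === "error" || messageLevel === "warn";
  }
  // Otherwise a message passes if it is at or above the configured severity.
  return LEVELS.indexOf(messageLevel) <= LEVELS.indexOf(configuredLevel);
}
```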
import { makeCall, createAgent } from 'callcenter.js';
// Simple call with brief
const result = await makeCall({
number: '+1234567890',
brief: 'Call Bocca di Bacco and book a table for 2 at 19:30 for Torben',
userName: 'Torben',
config: 'config.json'
});
console.log(`Call duration: ${result.duration}s`);
console.log(`Transcript: ${result.transcript}`);
// Advanced usage with agent instance
const agent = await createAgent('config.json');
agent.on('callEnded', () => {
console.log('Call finished!');
});
await agent.makeCall({
targetNumber: '+1234567890',
duration: 300
});
makeCall(options: CallOptions): Promise<CallResult>

Make a phone call with the AI agent.
interface CallOptions {
number: string; // Phone number to call
duration?: number; // Call duration in seconds
config?: string | Config; // Configuration file path or object
instructions?: string; // Direct AI instructions (highest priority)
brief?: string; // Call brief to generate instructions from
userName?: string; // Your name for the AI to use
recording?: boolean | string; // Enable recording with optional filename
logLevel?: 'quiet' | 'error' | 'warn' | 'info' | 'debug' | 'verbose';
colors?: boolean; // Enable colored output
timestamps?: boolean; // Enable timestamps in logs
}
interface CallResult {
callId?: string; // Call ID if successful
duration: number; // Call duration in seconds
transcript?: string; // Full conversation transcript
success: boolean; // Whether call was successful
error?: string; // Error message if failed
}
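As a small worked example, here is how a `CallResult` could be rendered into the markdown summary shown in the MCP example near the top of this README. The formatting helper is illustrative; the real server's output may differ.

```typescript
// Sketch: render a CallResult (interface copied from the API docs above)
// into the "# Simple Call Result" summary format shown earlier.
interface CallResult {
  callId?: string;
  duration: number; // seconds
  transcript?: string;
  success: boolean;
  error?: string;
}

function formatResult(r: CallResult): string {
  const minutes = Math.floor(r.duration / 60);
  const seconds = r.duration % 60;
  const lines = [
    "# Simple Call Result",
    `**Status:** ${r.success ? "✅ Success" : `❌ Failed: ${r.error ?? "unknown"}`}`,
    `**Duration:** ${minutes} minutes ${seconds} seconds`,
  ];
  if (r.callId) lines.push(`**Call ID:** ${r.callId}`);
  if (r.transcript) lines.push("## Call Transcript", r.transcript);
  return lines.join("\n");
}
```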
createAgent(config, options?): Promise<VoiceAgent>

Create a VoiceAgent instance for advanced use cases.
const agent = await createAgent('config.json', {
enableCallRecording: true,
recordingFilename: 'call.wav'
});
// Event handlers
agent.on('callInitiated', ({ callId, target }) => {
console.log(`Call ${callId} started to ${target}`);
});
agent.on('callEnded', () => {
console.log('Call ended');
});
agent.on('error', (error) => {
console.error('Call error:', error.message);
});
interface Config {
sip: {
username: string;
password: string;
serverIp: string;
serverPort?: number;
provider?: string;
stunServers?: string[];
turnServers?: TurnServer[];
};
ai: {
openaiApiKey: string;
voice?: 'auto' | 'alloy' | 'ash' | 'ballad' | 'cedar' | 'coral' | 'echo' | 'marin' | 'sage' | 'shimmer' | 'verse';
instructions?: string;
brief?: string;
userName?: string;
};
logging?: {
level?: string;
};
}
All configuration options can be set via environment variables (useful for npx usage):
SIP_USERNAME=your_extension
SIP_PASSWORD=your_password
SIP_SERVER_IP=192.168.1.1
OPENAI_API_KEY=sk-your-key-here
USER_NAME="Your Name" # Required when using --brief
# SIP Configuration
SIP_SERVER_PORT=5060
SIP_LOCAL_PORT=5060
SIP_PROVIDER=fritz-box # fritz-box, asterisk, cisco, 3cx, generic
STUN_SERVERS="stun:stun.l.google.com:19302,stun:stun2.l.google.com:19302"
SIP_TRANSPORTS="udp,tcp"
# OpenAI Configuration
OPENAI_VOICE=auto # auto (recommended), marin, cedar, alloy, echo, shimmer, coral, sage, ash, ballad, verse
OPENAI_INSTRUCTIONS="Your custom AI instructions"
# Advanced SIP Features
SESSION_TIMERS_ENABLED=true
SESSION_EXPIRES=1800
SESSION_MIN_SE=90
SESSION_REFRESHER=uac
Priority order: CLI flags > Config file > Environment variables
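That precedence rule can be expressed directly with object spreads. The field names below mirror the `Config` interface's SIP section; the merge helper itself is an illustration, not the project's actual loader.

```typescript
// Sketch of the documented precedence: CLI flags > config file > env vars.
// Spreading lowest-precedence sources first means later spreads win,
// and undefined (absent) keys never clobber a lower layer.
interface SipSettings {
  username?: string;
  serverIp?: string;
  serverPort?: number;
}

function resolveSip(
  cli: SipSettings,
  file: SipSettings,
  env: SipSettings,
): SipSettings {
  return { ...env, ...file, ...cli };
}
```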
Before making real calls, validate your setup with these safe tests:
# Basic validation - checks syntax and required fields
npm run validate config.json
# Detailed validation with network connectivity tests
npm run validate:detailed
# Get specific fix suggestions for issues
npm run validate:fix
# Call your own extension to verify audio quality (safe test)
npm start call "**620" --brief "Test call to check audio quality" --user-name "Your Name" --duration 30
# Or use your mobile number for end-to-end test
npm start call "+49123456789" --brief "Quick test call" --user-name "Your Name" --duration 15
Pro tip: Start with --duration 30 for test calls to avoid long waits if something goes wrong.
The built-in validation system provides comprehensive analysis:
# Basic validation
npm run validate config.json
# Detailed validation with network connectivity tests
npm run validate:detailed
# Get specific fix suggestions for issues
npm run validate:fix
# Test example configurations for different providers
npm run validate:fritz-box # AVM Fritz!Box
npm run validate:asterisk # Asterisk PBX
npm run validate:cisco # Cisco CUCM
npm run validate:3cx # 3CX Phone System
npm run validate:generic # Generic SIP provider
The validator will check provider compatibility, required fields, network connectivity, and codec availability.
The provider profiles are based on research and documentation, not actual testing:
| Provider | Transport | NAT Traversal | Session Timers | PRACK | Keepalive |
|---|---|---|---|---|---|
| Fritz Box | UDP | Not needed | Optional | Disabled | Re-register |
| Asterisk | UDP/TCP | STUN | Supported | Optional | OPTIONS ping |
| Cisco CUCM | TCP preferred | STUN required | Required | Required | OPTIONS ping |
| 3CX | TCP/UDP | STUN | Supported | Optional | Re-register |
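The table above can also be read as data. The record below transcribes those research-based values; the shape of the real `providers/profiles.ts` module may differ, and the fallback choice here is an assumption.

```typescript
// The provider table expressed as data. Values are transcribed from the
// (research-based, untested) profiles documented above.
interface ProviderProfile {
  transport: string;
  natTraversal: string;
  sessionTimers: string;
  prack: string;
  keepalive: string;
}

const PROFILES: Record<string, ProviderProfile> = {
  "fritz-box": { transport: "UDP", natTraversal: "Not needed", sessionTimers: "Optional", prack: "Disabled", keepalive: "Re-register" },
  "asterisk":  { transport: "UDP/TCP", natTraversal: "STUN", sessionTimers: "Supported", prack: "Optional", keepalive: "OPTIONS ping" },
  "cisco":     { transport: "TCP preferred", natTraversal: "STUN required", sessionTimers: "Required", prack: "Required", keepalive: "OPTIONS ping" },
  "3cx":       { transport: "TCP/UDP", natTraversal: "STUN", sessionTimers: "Supported", prack: "Optional", keepalive: "Re-register" },
};

function profileFor(provider: string): ProviderProfile {
  // Falling back to the Asterisk profile for unknown providers is an
  // assumption; the real generic profile may differ.
  return PROFILES[provider] ?? PROFILES["asterisk"];
}
```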
flowchart TD
A[Choose Your SIP Provider] --> B{Fritz!Box Router?}
B -->|Yes| C[✅ Use fritz-box profile
UDP transport
No STUN needed]
B -->|No| D{Enterprise System?}
D -->|Cisco CUCM| E[⚠️ Use cisco profile
TCP transport
STUN required
Session timers + PRACK]
D -->|3CX| F[⚠️ Use 3cx profile
TCP/UDP transport
STUN recommended]
D -->|Asterisk/FreePBX| G[⚠️ Use asterisk profile
UDP/TCP transport
STUN for NAT]
D -->|Other| H[⚠️ Use generic profile
Start with UDP
Add STUN if needed]
C --> I[Configure Basic Settings]
E --> J[Configure Enterprise Settings]
F --> J
G --> J
H --> J
I --> K[Set SIP credentials
serverIp = router IP
typically 192.168.1.1]
J --> L[Set SIP credentials
serverIp = server IP
Add STUN servers]
K --> M{Network Location?}
L --> M
M -->|Local Network| N[✅ Basic setup complete
Should work reliably]
M -->|Cloud/Remote| O[❓ May need additional
STUN/TURN configuration]
style C fill:#c8e6c9
style E fill:#ffecb3
style F fill:#ffecb3
style G fill:#ffecb3
style H fill:#ffecb3
style N fill:#c8e6c9
style O fill:#ffe0b2
The project includes ready-to-use configurations for all major providers:
- config.example.json - AVM Fritz!Box (home/SMB default)
- config.asterisk.example.json - Asterisk PBX with advanced features
- config.cisco.example.json - Cisco CUCM enterprise setup
- config.3cx.example.json - 3CX Phone System configuration
- config.generic.example.json - Generic SIP provider template

npm run build:native

# Test codec availability
npm run test:codecs
# Rebuild all codec artifacts (native + WASM + TS) if you changed the C sources
npm run build:all
# Disable G.722 entirely if you only want the G.711 fallback
npm run build:no-g722
OpenAI's Real-Time Voice API is optimized for speed, not sophistication. It's great at natural conversation but struggles with complex, goal-oriented tasks without very specific instructions. Here's the problem:
❌ What doesn't work well:
# Vague brief - Real-Time Voice API will be confused and unfocused
npm start call "+1234567890" --brief "Call the restaurant and book a table"
❌ What's tedious and error-prone:
# Writing detailed instructions manually every time
npm start call "+1234567890" --instructions "You are calling on behalf of John Doe to make a restaurant reservation for 2 people at Bocca di Bacco for tonight at 7pm. You should start by greeting them professionally, then clearly state your purpose. Ask about availability for 7pm, and if not available, ask for alternative times between 6-8pm. Confirm the booking details including date, time, party size, and get a confirmation number if possible. If you reach voicemail, leave a professional message with callback information..."
✅ What works brilliantly:
# Simple brief - o3 model generates sophisticated instructions
npm start call "+1234567890" --brief "Call Bocca di Bacco and book a table for 2 at 7pm tonight" --user-name "John Doe"
The system uses OpenAI's o3-mini reasoning model (their latest small reasoning model, smart but fast) to automatically generate detailed, sophisticated instructions from your simple brief.
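The brief-processing step boils down to assembling a prompt and handing it to the reasoning model. The wording below is hypothetical; the actual prompt inside `call-brief-processor.ts` is not shown in this README.

```typescript
// Hypothetical sketch of the prompt the brief processor might assemble
// before sending it to o3-mini. The real prompt is not documented here.
function briefPrompt(brief: string, userName: string): string {
  return [
    "You are generating instructions for a realtime voice agent making a phone call.",
    `The agent calls on behalf of: ${userName}.`,
    `Goal of the call: ${brief}`,
    "Produce: personality and tone, step-by-step instructions, and conversation states.",
  ].join("\n");
}
```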
sequenceDiagram
participant U as User/Claude
participant V as VoiceAgent
participant B as Brief Processor
(o3-mini)
participant S as SIP Client
participant A as Audio Bridge
participant O as OpenAI Realtime
(gpt-realtime)
participant P as Phone/VoIP
U->>V: makeCall({brief, number, userName})
V->>B: Process brief with o3-mini
B->>B: Generate detailed instructions
& conversation states
B->>V: Sophisticated call instructions
V->>S: Connect to SIP server
S->>P: INVITE (start call)
P->>S: 200 OK (call answered)
S->>V: Call established
V->>A: Initialize audio bridge
V->>O: Connect to OpenAI Realtime
O->>V: WebSocket connected
V->>O: Send generated instructions
loop During Call
P->>A: RTP audio packets
A->>A: Decode G.722/G.711 → PCM
A->>O: Stream PCM audio
O->>O: Process speech → text
O->>O: Generate AI response
O->>A: Stream AI audio (PCM)
A->>A: Encode PCM → G.722/G.711
A->>P: RTP audio packets
Note over V: Monitor call progress
& transcript logging
end
alt Call completed naturally
O->>V: Call completion signal
V->>S: Send BYE
else Duration limit reached
V->>V: Safety timeout triggered
V->>S: Send BYE
end
S->>P: BYE (end call)
P->>S: 200 OK
V->>U: CallResult{transcript, duration, success}
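The "RTP audio packets" steps in the diagram carry a fixed 12-byte header. Parsing it is straightforward; the sketch below follows the RFC 3550 layout, while the real `audio-bridge.ts` may handle more (CSRC lists, header extensions).

```typescript
// Parse the fixed 12-byte RTP header (RFC 3550). The payload type field
// tells the bridge which decoder to use: 9 = G.722, 0 = PCMU, 8 = PCMA.
interface RtpHeader {
  version: number;
  payloadType: number;
  sequence: number;
  timestamp: number;
  ssrc: number;
}

function parseRtpHeader(packet: Uint8Array): RtpHeader {
  if (packet.length < 12) throw new Error("RTP packet too short");
  const view = new DataView(packet.buffer, packet.byteOffset, packet.byteLength);
  return {
    version: packet[0] >> 6,          // top two bits; always 2 for RTP
    payloadType: packet[1] & 0x7f,    // low 7 bits of the second byte
    sequence: view.getUint16(2),      // big-endian, increments per packet
    timestamp: view.getUint32(4),     // sample clock, not wall clock
    ssrc: view.getUint32(8),          // stream identifier
  };
}
```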
Your simple input:
"Call Bocca di Bacco and book a table for 2 at 7pm tonight"
What o3-mini generates (excerpt):
## Personality and Tone
Identity: I am an assistant calling on behalf of John Doe to make a restaurant reservation.
Task: I am responsible for booking a table for 2 people at Bocca di Bacco today at 7:00 PM.
Tone: Professional, warm, and respectful.
## Instructions
1. Open the conversation immediately: "Hello, this is an assistant calling on behalf of John Doe."
2. Read back critical data: Repeat times and details for confirmation.
3. Handle objections: Respond politely and offer alternatives between 6-8 PM.
...
## Conversation States
[
{
"id": "1_greeting",
"description": "Greeting and introduction of call purpose",
"instructions": ["Introduce yourself as an assistant", "Immediately mention the reservation request"],
"examples": ["Hello, this is an assistant calling on behalf of John Doe. I'm calling to book a table for 2 people today at 7:00 PM."]
}
]
The o3-mini brief processor handles all of this automatically.

Use --brief for 95% of calls - it's easier and produces better results. Use --instructions only when you need very specific, custom behavior.

The AI agent supports 10 different voices from OpenAI's Realtime API, each with unique characteristics. By default, the system uses auto mode, where o3-mini intelligently selects the optimal voice based on your call's context.
| Voice | Gender | Description | Best For |
|---|---|---|---|
| marin | Female | Clear, professional feminine voice | All-purpose: business calls, customer support, negotiations |
| cedar | Male | Natural masculine voice with warm undertones | All-purpose: professional calls, consultations, service interactions |
| alloy | Neutral | Professional voice with good adaptability | Technical discussions, business contexts, general inquiries |
| echo | Male | Conversational masculine voice | Casual to formal interactions, versatile tone |
| shimmer | Female | Warm, expressive feminine voice | Empathetic conversations, sales, professional contexts |
| coral | Female | Warm and friendly feminine voice | Customer interactions, consultations, support calls |
| sage | Neutral | Calm and thoughtful voice | Medical consultations, advisory roles, serious discussions |
| ash | Neutral | Clear and precise voice | Technical explanations, instructions, educational content |
| ballad | Female | Melodic and smooth feminine voice | Presentations, storytelling, engaging conversations |
| verse | Neutral | Versatile and expressive voice | Dynamic conversations, adaptable to any context |
The auto mode (default) uses o3-mini to analyze your call context and select the most appropriate voice:
# Auto mode - AI selects the best voice
npm start call "+1234567890" --brief "Call doctor's office to schedule appointment" --user-name "John"
# Might select: sage (calm, professional for healthcare)
# Auto mode adapts to context
npm start call "+1234567890" --brief "Call pizza place to order delivery" --user-name "Sarah"
# Might select: coral or echo (friendly, casual for food service)
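As a toy stand-in for that behavior, here is a keyword heuristic in the spirit of auto mode. The real system asks o3-mini to choose; this sketch only illustrates the idea that the brief's context drives the selection. The voice names come from the table above; the keyword-to-voice mapping is an assumption.

```typescript
// Toy voice picker: NOT the real auto mode (which uses o3-mini), just a
// keyword heuristic showing how call context could map to a voice.
function pickVoice(brief: string): string {
  const b = brief.toLowerCase();
  if (/doctor|medical|appointment|clinic/.test(b)) return "sage";  // calm, advisory
  if (/pizza|order|delivery|restaurant/.test(b)) return "coral";   // friendly service
  if (/bank|account|contract|legal/.test(b)) return "cedar";       // professional
  return "marin"; // all-purpose default
}
```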
You can override auto selection when you have specific requirements:
# Use a specific voice
npm start call "+1234567890" --voice marin --brief "Call to book reservation" --user-name "Alex"
# Professional contexts
npm start call "+1234567890" --voice cedar --brief "Call bank about account" --user-name "Pat"
# Friendly service calls
npm start call "+1234567890" --voice coral --brief "Call flower shop for delivery" --user-name "Sam"
Set default voice in your config file or environment:
// config.json
{
"ai": {
"voice": "auto", // or specific voice like "marin", "cedar", etc.
// ...
}
}
# Environment variable
export OPENAI_VOICE=auto # or marin, cedar, alloy, etc.
The auto mode considers these factors:
The MCP tools strongly recommend auto mode but support manual override:
// Simple call - auto voice selection
mcp__callcenter_js__simple_call({
phone_number: "+1234567890",
brief: "Call restaurant for reservation",
caller_name: "John",
voice: "auto" // Optional, defaults to auto
})
// Advanced call - manual voice selection
mcp__callcenter_js__advanced_call({
phone_number: "+1234567890",
goal: "Schedule medical appointment",
user_name: "Jane",
voice: "sage" // Override for specific voice
})
# Default build (WASM refresh + TypeScript, skips if artifacts already exist)
npm run build
# Build components separately (useful for maintainers)
npm run build:wasm # Regenerate the G.722 WebAssembly codec
npm run build:native # Rebuild the native addon (requires toolchain)
npm run build:all # Run native + WASM + TypeScript in one go
npm run build:ts # TypeScript compilation only
# Development with hot reload
npm run dev
# Clean all build artifacts
npm run clean
# Validate any config file
npm run validate path/to/config.json
# Test with different providers
npm run validate -- --provider asterisk config.json
# Get detailed network diagnostics
npm run validate -- --detailed --network config.json
# Show fix suggestions for issues
npm run validate -- --fix-suggestions config.json
src/
├── voice-agent.ts # Main orchestration with ConnectionManager
├── connection-manager.ts # Smart connection handling & reconnection
├── sip-client.ts # Enhanced SIP protocol with provider support
├── audio-bridge.ts # RTP streaming and codec management
├── openai-client.ts # OpenAI Real-Time Voice API integration
├── call-brief-processor.ts # o3-mini model call brief processing
├── mcp-server.ts # MCP (Model Context Protocol) server
├── validation.ts # Configuration validation engine
├── config.ts # Enhanced config loading with provider profiles
├── logger.ts # Comprehensive logging with transcript capture
├── index.ts # Main programmatic API exports
├── providers/
│ └── profiles.ts # Provider-specific configuration database
├── testing/
│ └── network-tester.ts # Real network connectivity testing
├── codecs/ # Codec abstraction layer
│ ├── g722.ts # G.722 wideband implementation
│ └── g711.ts # G.711 fallback codecs
└── cli.ts # Command-line interface
scripts/
└── validate-config.js # Comprehensive validation CLI tool
config.*.example.json # Provider-specific example configurations
Sample output from the validation system:
🔍 CallCenter.js Configuration Validator
📋 Provider: AVM Fritz!Box (auto-detected)
🎯 Provider Compatibility Score: 100%
✅ Configuration is valid and ready for use!
🌐 Network Connectivity:
✅ SIP Server: Reachable (12ms latency)
✅ G.722 codec: Available for high-quality audio
💡 Optimization Suggestions:
💡 G.722 wideband codec available (already enabled)
💡 Excellent latency - local network performance optimal
🚀 Next steps: npm start call "<number>"
Run validation first:
npm run validate:detailed
Check provider compatibility:
npm run validate -- --provider fritz-box config.json
Get specific fix suggestions:
npm run validate:fix
Verify G.722 is available:
npm run test:codecs
Check codec negotiation in logs:
✅ Selected codec: PT 9 (G722/8000)
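The preference behind that log line is simple: take G.722 (payload type 9) for wideband when the peer offers it, otherwise fall back to G.711. The real selection happens during the SDP offer/answer exchange; this sketch just shows the ordering.

```typescript
// Codec preference sketch: G.722 (PT 9) first, then G.711 PCMU (0) / PCMA (8).
// The real negotiation happens in the SDP offer/answer exchange.
const PREFERENCE = [9, 0, 8];

function selectCodec(offeredPayloadTypes: number[]): number {
  for (const pt of PREFERENCE) {
    if (offeredPayloadTypes.includes(pt)) return pt;
  }
  throw new Error("No supported codec offered");
}
```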
Network issues: High latency/packet loss affects audio quality
Native compilation fails (only happens if you explicitly ran npm run build:native or npm run build:all): stick with the bundled WASM codec unless you specifically need native performance. To drop back to G.711 entirely, run:
npm run build:no-g722
Provider-specific issues: Check validation recommendations for your provider
Server won't start:
# Check for port conflicts or config issues
npm start --mcp
Claude Code not connecting:
This is a personal project. It is vibe-coded! 🚀 The validation tools might help debug issues, but honestly, the real test is whether you can make actual calls.
MIT License - see LICENSE for details.
Ready to get started? Copy an example config, run npm run validate:detailed, and start making AI-powered voice calls! 🚀
Install via CLI

Install CallCenter.js MCP + CLI with a single command:

npx mdskills install gerkensm/callcenter-js-mcp

This downloads the skill files into your project and your AI agent picks them up automatically.
CallCenter.js MCP + CLI works with Claude Code, Claude Desktop, Cursor, VS Code Copilot, Windsurf, Continue.dev, Gemini CLI, Amp, Roo Code, and Goose. Skills use the open SKILL.md format, which is compatible with any AI coding agent that reads markdown instructions.