⚠️ This is an unofficial community project, not affiliated with Unbabel.
Translation quality evaluation MCP Server powered by xCOMET (eXplainable COMET).
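xCOMET MCP Server provides AI agents with the ability to evaluate machine translation quality. It integrates with the xCOMET model from Unbabel to provide:
- Quality Scoring: Scores between 0 and 1 indicating translation quality
- Error Detection: Identifies error spans with severity levels (minor/major/critical)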
```mermaid
graph LR
    A[AI Agent] --> B[Node.js MCP Server]
    B --> C[Python FastAPI Server]
    C --> D["xCOMET Model<br/>Persistent in Memory"]
    D --> C
    C --> B
    B --> A
    style D fill:#9f9
```
xCOMET requires Python with the following packages:
```bash
pip install "unbabel-comet>=2.2.0" fastapi uvicorn
```
The first run will download the xCOMET model (~14GB for XL, ~42GB for XXL):
```bash
# Test model availability
python -c "from comet import download_model; download_model('Unbabel/XCOMET-XL')"
```
```bash
# Clone the repository
git clone https://github.com/shuji-bonji/xcomet-mcp-server.git
cd xcomet-mcp-server
# Install dependencies
npm install
# Build
npm run build
```
Add to your Claude Desktop configuration (claude_desktop_config.json):
```json
{
  "mcpServers": {
    "xcomet": {
      "command": "npx",
      "args": ["-y", "xcomet-mcp-server"]
    }
  }
}
```
For Claude Code:
```bash
claude mcp add xcomet -- npx -y xcomet-mcp-server
```
If you prefer a local installation:
```bash
npm install -g xcomet-mcp-server
```
Then configure:
```json
{
  "mcpServers": {
    "xcomet": {
      "command": "xcomet-mcp-server"
    }
  }
}
```
To use the HTTP transport instead of stdio:
```bash
TRANSPORT=http PORT=3000 npm start
```
Then connect to `http://localhost:3000/mcp`.
`xcomet_evaluate`: Evaluate translation quality for a single source-translation pair.
Parameters:

| Name | Type | Required | Description |
|---|---|---|---|
| `source` | string | ✅ | Original source text |
| `translation` | string | ✅ | Translated text to evaluate |
| `reference` | string | ❌ | Reference translation |
| `source_lang` | string | ❌ | Source language code (ISO 639-1) |
| `target_lang` | string | ❌ | Target language code (ISO 639-1) |
| `response_format` | "json" \| "markdown" | ❌ | Output format (default: "json") |
| `use_gpu` | boolean | ❌ | Use GPU for inference (default: false) |

Example:
```json
{
  "source": "The quick brown fox jumps over the lazy dog.",
  "translation": "素早い茶色のキツネが怠惰な犬を飛び越える。",
  "source_lang": "en",
  "target_lang": "ja",
  "use_gpu": true
}
```
Response:
```json
{
  "score": 0.847,
  "errors": [],
  "summary": "Good quality (score: 0.847) with 0 error(s) detected."
}
```
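Over MCP, a client invokes this tool with a JSON-RPC 2.0 `tools/call` request whose `params` carry the tool name and its arguments. The Python sketch below builds such a message from the parameters in the table; `build_evaluate_call` is a hypothetical helper, and the MCP `initialize` handshake that must precede any tool call is omitted.

```python
import json
from typing import Any, Optional


def build_evaluate_call(
    source: str,
    translation: str,
    request_id: int = 1,
    reference: Optional[str] = None,
    source_lang: Optional[str] = None,
    target_lang: Optional[str] = None,
    use_gpu: bool = False,
) -> dict[str, Any]:
    """Build a JSON-RPC 2.0 `tools/call` message for the xcomet_evaluate tool."""
    if not source or not translation:
        raise ValueError("source and translation are required")
    arguments: dict[str, Any] = {
        "source": source,
        "translation": translation,
        "use_gpu": use_gpu,
    }
    # Optional parameters are included only when the caller supplies them.
    optional = {"reference": reference, "source_lang": source_lang, "target_lang": target_lang}
    arguments.update({k: v for k, v in optional.items() if v is not None})
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "xcomet_evaluate", "arguments": arguments},
    }


if __name__ == "__main__":
    msg = build_evaluate_call(
        "The quick brown fox jumps over the lazy dog.",
        "素早い茶色のキツネが怠惰な犬を飛び越える。",
        source_lang="en",
        target_lang="ja",
        use_gpu=True,
    )
    print(json.dumps(msg, ensure_ascii=False, indent=2))
```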
`xcomet_detect_errors`: Focus on detecting and categorizing translation errors.
Parameters:

| Name | Type | Required | Description |
|---|---|---|---|
| `source` | string | ✅ | Original source text |
| `translation` | string | ✅ | Translated text to analyze |
| `reference` | string | ❌ | Reference translation |
| `min_severity` | "minor" \| "major" \| "critical" | ❌ | Minimum severity (default: "minor") |
| `response_format` | "json" \| "markdown" | ❌ | Output format |
| `use_gpu` | boolean | ❌ | Use GPU for inference (default: false) |

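`min_severity` acts as a threshold over the ordered levels minor < major < critical. The sketch below reproduces that filtering on the client side, for example to re-filter a saved result, assuming the bound is inclusive. The error-span fields (`text`, `severity`) and the name `filter_by_severity` are assumptions for illustration, not the server's documented response schema.

```python
from typing import Iterable

# Severity levels in ascending order, as listed for min_severity.
SEVERITY_RANK = {"minor": 0, "major": 1, "critical": 2}


def filter_by_severity(errors: Iterable[dict], min_severity: str = "minor") -> list[dict]:
    """Keep error spans whose severity is at or above min_severity (assumed span shape)."""
    if min_severity not in SEVERITY_RANK:
        raise ValueError(f"unknown severity: {min_severity!r}")
    threshold = SEVERITY_RANK[min_severity]
    return [e for e in errors if SEVERITY_RANK[e["severity"]] >= threshold]


errors = [
    {"text": "怠惰な", "severity": "minor"},
    {"text": "飛び越える", "severity": "major"},
]
print(filter_by_severity(errors, "major"))
```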
`xcomet_batch_evaluate`: Evaluate multiple translation pairs in a single request.
Performance Note: With the persistent server architecture (v0.3.0+), the model stays loaded in memory. Batch evaluation processes all pairs efficiently without reloading the model.
Parameters:

| Name | Type | Required | Description |
|---|---|---|---|
| `pairs` | array | ✅ | Array of `{source, translation, reference?}` (max 500) |
| `source_lang` | string | ❌ | Source language code |
| `target_lang` | string | ❌ | Target language code |
| `response_format` | "json" \| "markdown" | ❌ | Output format |
| `use_gpu` | boolean | ❌ | Use GPU for inference (default: false) |
| `batch_size` | number | ❌ | Batch size 1-64 (default: 8). Larger is faster but uses more memory |

Example:
```json
{
  "pairs": [
    {"source": "Hello", "translation": "こんにちは"},
    {"source": "Goodbye", "translation": "さようなら"}
  ],
  "use_gpu": true,
  "batch_size": 16
}
```
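The documented limits (at most 500 pairs per call, `batch_size` between 1 and 64, default 8) can be checked before sending. Below is a sketch of a hypothetical client-side helper, `batch_requests`, that validates those limits and splits a longer list into several argument objects, one per `xcomet_batch_evaluate` call. It is not part of the server.

```python
from typing import Any

MAX_PAIRS = 500  # documented per-request limit for `pairs`
MIN_BATCH, MAX_BATCH = 1, 64  # documented range for `batch_size`
DEFAULT_BATCH = 8


def batch_requests(
    pairs: list[dict[str, str]],
    batch_size: int = DEFAULT_BATCH,
    use_gpu: bool = False,
) -> list[dict[str, Any]]:
    """Split pairs into xcomet_batch_evaluate arguments of at most MAX_PAIRS pairs each."""
    if not MIN_BATCH <= batch_size <= MAX_BATCH:
        raise ValueError(f"batch_size must be in [{MIN_BATCH}, {MAX_BATCH}]")
    for i, pair in enumerate(pairs):
        if "source" not in pair or "translation" not in pair:
            raise ValueError(f"pair {i} needs 'source' and 'translation'")
    return [
        {"pairs": pairs[i:i + MAX_PAIRS], "batch_size": batch_size, "use_gpu": use_gpu}
        for i in range(0, len(pairs), MAX_PAIRS)
    ]


pairs = [{"source": f"s{i}", "translation": f"t{i}"} for i in range(1200)]
print([len(r["pairs"]) for r in batch_requests(pairs, batch_size=16)])  # [500, 500, 200]
```

Each resulting object still goes through a single model load on the server side, so splitting only adds per-request overhead, not model reloads.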
xCOMET MCP Server is designed to work alongside other MCP servers for complete translation workflows:
```mermaid
sequenceDiagram
    participant Agent as AI Agent
    participant DeepL as DeepL MCP Server
    participant xCOMET as xCOMET MCP Server
    Agent->>DeepL: Translate text
    DeepL-->>Agent: Translation result
    Agent->>xCOMET: Evaluate quality
    xCOMET-->>Agent: Score + Errors
    Agent->>Agent: Decide: Accept or retry?
```
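The accept-or-retry decision in the diagram can be sketched as a small loop. Here `translate` and `evaluate` are hypothetical callables standing in for the DeepL and `xcomet_evaluate` tool calls, and the default threshold of 0.8 mirrors the example prompt further down. Retrying only helps if each attempt can produce a different translation (for example, by changing the prompt or formality setting).

```python
from typing import Callable


def translate_with_qa(
    text: str,
    translate: Callable[[str], str],
    evaluate: Callable[[str, str], float],
    threshold: float = 0.8,
    max_attempts: int = 3,
) -> tuple[str, float, bool]:
    """Translate, score with xCOMET, and retry while the score is below threshold.

    Returns (best translation, its score, whether it met the threshold).
    """
    best, best_score = "", float("-inf")
    for _ in range(max_attempts):
        candidate = translate(text)
        score = evaluate(text, candidate)
        if score > best_score:  # remember the best attempt so far
            best, best_score = candidate, score
        if score >= threshold:  # accept
            return candidate, score, True
    return best, best_score, False  # give up, but return the best attempt


# Stub callables standing in for the two MCP tool calls.
drafts = iter(["draft A", "draft B"])
scores = {"draft A": 0.72, "draft B": 0.86}
print(translate_with_qa("Hello", lambda t: next(drafts), lambda s, t: scores[t]))
# ('draft B', 0.86, True)
```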
Configure both servers in Claude Desktop:
```json
{
  "mcpServers": {
    "deepl": {
      "command": "npx",
      "args": ["-y", "@anthropic/deepl-mcp-server"],
      "env": {
        "DEEPL_API_KEY": "your-api-key"
      }
    },
    "xcomet": {
      "command": "npx",
      "args": ["-y", "xcomet-mcp-server"]
    }
  }
}
```
Then ask Claude:
"Translate this text to Japanese using DeepL, then evaluate the translation quality with xCOMET. If the score is below 0.8, suggest improvements."
Environment variables:

| Variable | Default | Description |
|---|---|---|
| `TRANSPORT` | `stdio` | Transport mode: `stdio` or `http` |
| `PORT` | `3000` | HTTP server port (when `TRANSPORT=http`) |
| `XCOMET_MODEL` | `Unbabel/XCOMET-XL` | xCOMET model to use |
| `XCOMET_PYTHON_PATH` | (auto-detect) | Python executable path (see below) |
| `XCOMET_PRELOAD` | `false` | Pre-load the model at startup (v0.3.1+) |
| `XCOMET_DEBUG` | `false` | Enable verbose debug logging (v0.3.1+) |

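The server itself is TypeScript; as a language-neutral illustration, here is a Python sketch of how the defaults in this table resolve. `load_config` and `XcometConfig` are hypothetical names, and treating only the string `true` (case-insensitive) as enabled is an assumption based on the documented `true`/`false` values.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _flag(value: Optional[str]) -> bool:
    # Assumption: only "true" (any case) enables a boolean flag.
    return (value or "").strip().lower() == "true"


@dataclass(frozen=True)
class XcometConfig:
    transport: str
    port: int
    model: str
    python_path: Optional[str]  # None means "auto-detect"
    preload: bool
    debug: bool


def load_config(env: Mapping[str, str] = os.environ) -> XcometConfig:
    """Resolve settings from environment variables, falling back to documented defaults."""
    transport = env.get("TRANSPORT", "stdio")
    if transport not in ("stdio", "http"):
        raise ValueError(f"TRANSPORT must be 'stdio' or 'http', got {transport!r}")
    return XcometConfig(
        transport=transport,
        port=int(env.get("PORT", "3000")),
        model=env.get("XCOMET_MODEL", "Unbabel/XCOMET-XL"),
        python_path=env.get("XCOMET_PYTHON_PATH") or None,
        preload=_flag(env.get("XCOMET_PRELOAD")),
        debug=_flag(env.get("XCOMET_DEBUG")),
    )


print(load_config({"TRANSPORT": "http", "XCOMET_PRELOAD": "true"}))
```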
Choose the model based on your quality/performance needs:

| Model | Parameters | Download size | RAM | Reference translation | Quality | Use case |
|---|---|---|---|---|---|---|
| `Unbabel/XCOMET-XL` | 3.5B | ~14GB | ~8-10GB | Optional | ⭐⭐⭐⭐ | Recommended for most use cases |
| `Unbabel/XCOMET-XXL` | 10.7B | ~42GB | ~20GB | Optional | ⭐⭐⭐⭐⭐ | Highest quality, requires more resources |
| `Unbabel/wmt22-comet-da` | 580M | ~2GB | ~3GB | Required | ⭐⭐⭐ | Lightweight, faster loading |

Important: `wmt22-comet-da` requires a `reference` translation for evaluation. XCOMET models support referenceless evaluation.
Tip: If you experience memory issues or slow model loading, try `Unbabel/wmt22-comet-da` for faster performance with slightly lower accuracy (but remember to provide reference translations).
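The trade-offs in the model table can be expressed as a simple selection rule. The sketch below is a hypothetical helper, `pick_model`, that returns the highest-quality model fitting in free RAM, skipping `wmt22-comet-da` when no reference translation is available. It uses the upper end of each RAM estimate from the table and ignores download size and GPU memory, which a real decision should also weigh.

```python
from typing import Optional

# (model id, approx. RAM in GB from the table, needs a reference translation),
# ordered from highest to lowest quality.
MODELS = [
    ("Unbabel/XCOMET-XXL", 20, False),
    ("Unbabel/XCOMET-XL", 10, False),
    ("Unbabel/wmt22-comet-da", 3, True),
]


def pick_model(free_ram_gb: float, have_reference: bool) -> Optional[str]:
    """Return the best model that fits in RAM and suits the available inputs, else None."""
    for name, ram_gb, needs_ref in MODELS:
        if ram_gb <= free_ram_gb and (have_reference or not needs_ref):
            return name
    return None


print(pick_model(12, have_reference=False))  # Unbabel/XCOMET-XL
```

The returned name is what you would set as `XCOMET_MODEL`.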
To use a different model, set the XCOMET_MODEL environment variable:
```json
{
  "mcpServers": {
    "xcomet": {
      "command": "npx",
      "args": ["-y", "xcomet-mcp-server"],
      "env": {
        "XCOMET_MODEL": "Unbabel/XCOMET-XXL"
      }
    }
  }
}
```
The server automatically detects a Python environment with unbabel-comet installed:
1. `XCOMET_PYTHON_PATH` environment variable (if set)
2. pyenv installations (`~/.pyenv/versions/*/bin/python3`), checking for the `comet` module
3. Common install locations (`/opt/homebrew/bin/python3`, `/usr/local/bin/python3`)
4. The `python3` command (fallback)

This ensures the server works correctly even when the MCP host (e.g., Claude Desktop) uses a different Python than your terminal.
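A Python sketch of that search order, for readers who want to reproduce it when debugging which interpreter gets picked. `find_python`, `candidate_pythons`, and `has_comet` are hypothetical names; the real server is TypeScript, and details such as whether the common install locations are also probed for `comet` are assumptions here.

```python
import glob
import os
import subprocess
from typing import Callable


def has_comet(python: str) -> bool:
    """Return True if `python` runs and can import the comet module."""
    try:
        result = subprocess.run(
            [python, "-c", "import comet"], capture_output=True, timeout=60
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def candidate_pythons() -> list[str]:
    """pyenv interpreters first, then common install locations."""
    pyenv = sorted(glob.glob(os.path.expanduser("~/.pyenv/versions/*/bin/python3")))
    return pyenv + ["/opt/homebrew/bin/python3", "/usr/local/bin/python3"]


def find_python(check: Callable[[str], bool] = has_comet) -> str:
    """Pick an interpreter following the search order described above."""
    explicit = os.environ.get("XCOMET_PYTHON_PATH")
    if explicit:
        return explicit  # an explicit setting wins; it is not probed here
    for python in candidate_pythons():
        if os.path.exists(python) and check(python):
            return python
    return "python3"  # last resort: whatever python3 resolves to on PATH


if __name__ == "__main__":
    print(find_python())
```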
Example: Explicit Python path configuration
```json
{
  "mcpServers": {
    "xcomet": {
      "command": "npx",
      "args": ["-y", "xcomet-mcp-server"],
      "env": {
        "XCOMET_PYTHON_PATH": "/Users/you/.pyenv/versions/3.11.0/bin/python3"
      }
    }
  }
}
```
The server uses a persistent Python FastAPI server that keeps the xCOMET model loaded in memory:
| Request | Time | Notes |
|---|---|---|
| First request | ~25-90s | Model loading (varies by model size) |
| Subsequent requests | ~500ms | Model already loaded |
This provides a 177x speedup for consecutive evaluations compared to reloading the model each time.
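As a rough consistency check, the 177x figure corresponds to a cold evaluation near the slow end of the first-request range, and a single model load is amortized quickly across consecutive calls. Taking $T_{\text{load}} \approx 90$ s and $t_{\text{warm}} \approx 0.5$ s from the table (an approximation that folds the first evaluation into the load time):

```math
\text{speedup} = \frac{t_{\text{reload}}}{t_{\text{warm}}} \approx \frac{88.5\ \text{s}}{0.5\ \text{s}} = 177,
\qquad
\bar{t}(N) = \frac{T_{\text{load}} + N\,t_{\text{warm}}}{N},
\quad
\bar{t}(100) = \frac{90 + 100 \times 0.5}{100}\ \text{s} = 1.4\ \text{s}
```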
Enable XCOMET_PRELOAD=true to pre-load the model at server startup:
```json
{
  "mcpServers": {
    "xcomet": {
      "command": "npx",
      "args": ["-y", "xcomet-mcp-server"],
      "env": {
        "XCOMET_PRELOAD": "true"
      }
    }
  }
}
```
With preload enabled, all requests are fast (~500ms), including the first one.
```mermaid
graph LR
    A[MCP Request] --> B[Node.js Server]
    B --> C[Python FastAPI Server]
    C --> D["xCOMET Model<br/>in Memory"]
    D --> C
    C --> B
    B --> A
    style D fill:#9f9
```
The `xcomet_batch_evaluate` tool processes all pairs with a single model load:

| Pairs | Estimated Time |
|---|---|
| 10 | ~30-40 sec |
| 50 | ~1-1.5 min |
| 100 | ~2 min |

| Mode | 100 Pairs (Estimated) |
|---|---|
| CPU (batch_size=8) | ~2 min |
| GPU (batch_size=16) | ~20-30 sec |

Note: GPU requires CUDA-compatible hardware and PyTorch with CUDA support. If a GPU is not available, set `use_gpu: false` (the default).
1. Let the persistent server do its job
With v0.3.0+, the model stays in memory. Multiple `xcomet_evaluate` calls are now efficient:
```text
# ✅ Fast: the first call loads the model, subsequent calls reuse it
xcomet_evaluate(pair1)  # ~90s (model loads)
xcomet_evaluate(pair2)  # ~500ms (model cached)
xcomet_evaluate(pair3)  # ~500ms (model cached)
```
2. For many pairs, use batch evaluation
```text
# ✅ Even faster: batch all pairs in one call
xcomet_batch_evaluate(allPairs)  # Optimal throughput
```
3. Memory considerations
The server automatically recovers from failures:

Score interpretation:

| Score Range | Quality | Recommendation |
|---|---|---|
| 0.9 - 1.0 | Excellent | Ready for use |
| 0.7 - 0.9 | Good | Minor review recommended |
| 0.5 - 0.7 | Fair | Post-editing needed |
| 0.0 - 0.5 | Poor | Re-translation recommended |
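The ranges in the table share their endpoints, so an automated gate has to pick a side. The hypothetical helper below maps a score to the table's band and recommendation, assigning boundary scores (0.5, 0.7, 0.9) to the higher band; that tie-breaking choice is an assumption.

```python
def quality_band(score: float) -> tuple[str, str]:
    """Map an xCOMET score in [0, 1] to (quality, recommendation) from the table."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    # Boundary scores are assigned to the higher band.
    if score >= 0.9:
        return "Excellent", "Ready for use"
    if score >= 0.7:
        return "Good", "Minor review recommended"
    if score >= 0.5:
        return "Fair", "Post-editing needed"
    return "Poor", "Re-translation recommended"


print(quality_band(0.847))  # ('Good', 'Minor review recommended')
```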
Cause: Python environment without unbabel-comet installed.
Solution:
```bash
# Check which Python is being used
python3 -c "import sys; print(sys.executable)"
# Install all required packages
pip install "unbabel-comet>=2.2.0" fastapi uvicorn
# Or specify Python path explicitly
export XCOMET_PYTHON_PATH=/path/to/python3
```
Cause: Large model files (~14GB for XL) require stable internet connection.
Solution:
```bash
# Pre-download the model manually
python -c "from comet import download_model; download_model('Unbabel/XCOMET-XL')"
```
Cause: PyTorch not installed with CUDA support.
Solution:
```bash
# Check CUDA availability
python -c "import torch; print(torch.cuda.is_available())"
# If False, reinstall PyTorch with CUDA
pip install torch --index-url https://download.pytorch.org/whl/cu118
```
Cause: Mac MPS (Metal Performance Shaders) has compatibility issues with some operations.
Solution: The server automatically uses `num_workers=1` for Mac MPS compatibility. For best performance on Mac, use CPU mode (`use_gpu: false`).
Cause: XCOMET-XL requires ~8-10GB RAM.
Solutions:
- Use `XCOMET_MODEL=Unbabel/wmt22-comet-da` for lower memory usage (~3GB)
- Check available memory:
```bash
free -h            # Linux
vm_stat | head -5  # macOS
```
Cause: High memory usage from the xCOMET model (~8-10GB for XL).
Solution: Use `XCOMET_MODEL=Unbabel/wmt22-comet-da` (remember that this model requires reference translations).

If you encounter issues:
Development:
```bash
# Install dependencies
npm install
# Build TypeScript
npm run build
# Watch mode
npm run dev
# Test with MCP Inspector
npm run inspect
```
See CHANGELOG.md for version history and updates.
MIT License - see LICENSE for details.