# xCOMET MCP Server

[npm](https://www.npmjs.com/package/xcomet-mcp-server) · [CI](https://github.com/shuji-bonji/xcomet-mcp-server/actions/workflows/ci.yml) · [MCP](https://modelcontextprotocol.io) · [MIT License](https://opensource.org/licenses/MIT)

**[日本語版 README はこちら](README.ja.md)**

> ⚠️ This is an unofficial community project, not affiliated with Unbabel.

Translation quality evaluation MCP Server powered by [xCOMET](https://github.com/Unbabel/COMET) (eXplainable COMET).

## 🎯 Overview

xCOMET MCP Server provides AI agents with the ability to evaluate machine translation quality. It integrates with the xCOMET model from Unbabel to provide:

- **Quality Scoring**: Scores from 0 to 1 indicating translation quality
- **Error Detection**: Identifies error spans with severity levels (minor/major/critical)
- **Batch Processing**: Evaluates multiple translation pairs efficiently (optimized single model load)
- **GPU Support**: Optional GPU acceleration for faster inference

```mermaid
graph LR
    A[AI Agent] --> B[Node.js MCP Server]
    B --> C[Python FastAPI Server]
    C --> D[xCOMET Model<br/>Persistent in Memory]
    D --> C
    C --> B
    B --> A

    style D fill:#9f9
```

## 🔧 Prerequisites

### Python Environment

xCOMET requires Python with the following packages:

```bash
pip install "unbabel-comet>=2.2.0" fastapi uvicorn
```

### Model Download

The first run will download the xCOMET model (~14GB for XL, ~42GB for XXL):

```bash
# Test model availability
python -c "from comet import download_model; download_model('Unbabel/XCOMET-XL')"
```

### Node.js

- Node.js >= 18.0.0
- npm or yarn

## 📦 Installation

```bash
# Clone the repository
git clone https://github.com/shuji-bonji/xcomet-mcp-server.git
cd xcomet-mcp-server

# Install dependencies
npm install

# Build
npm run build
```

## 🚀 Usage

### With Claude Desktop (npx)

Add to your Claude Desktop configuration (`claude_desktop_config.json`):

```json
{
  "mcpServers": {
    "xcomet": {
      "command": "npx",
      "args": ["-y", "xcomet-mcp-server"]
    }
  }
}
```

### With Claude Code

```bash
claude mcp add xcomet -- npx -y xcomet-mcp-server
```

### Local Installation

If you prefer a local installation:

```bash
npm install -g xcomet-mcp-server
```

Then configure:

```json
{
  "mcpServers": {
    "xcomet": {
      "command": "xcomet-mcp-server"
    }
  }
}
```

### HTTP Mode (Remote Access)

```bash
TRANSPORT=http PORT=3000 npm start
```

Then connect to `http://localhost:3000/mcp`.

## 🛠️ Available Tools

### `xcomet_evaluate`

Evaluates translation quality for a single source-translation pair.

**Parameters:**

| Name | Type | Required | Description |
|------|------|----------|-------------|
| `source` | string | ✅ | Original source text |
| `translation` | string | ✅ | Translated text to evaluate |
| `reference` | string | ❌ | Reference translation |
| `source_lang` | string | ❌ | Source language code (ISO 639-1) |
| `target_lang` | string | ❌ | Target language code (ISO 639-1) |
| `response_format` | "json" \| "markdown" | ❌ | Output format (default: "json") |
| `use_gpu` | boolean | ❌ | Use GPU for inference (default: false) |

**Example:**

```json
{
  "source": "The quick brown fox jumps over the lazy dog.",
  "translation": "素早い茶色のキツネが怠惰な犬を飛び越える。",
  "source_lang": "en",
  "target_lang": "ja",
  "use_gpu": true
}
```

**Response:**

```json
{
  "score": 0.847,
  "errors": [],
  "summary": "Good quality (score: 0.847) with 0 error(s) detected."
}
```

### `xcomet_detect_errors`

Focuses on detecting and categorizing translation errors.

**Parameters:**

| Name | Type | Required | Description |
|------|------|----------|-------------|
| `source` | string | ✅ | Original source text |
| `translation` | string | ✅ | Translated text to analyze |
| `reference` | string | ❌ | Reference translation |
| `min_severity` | "minor" \| "major" \| "critical" | ❌ | Minimum severity to report (default: "minor") |
| `response_format` | "json" \| "markdown" | ❌ | Output format |
| `use_gpu` | boolean | ❌ | Use GPU for inference (default: false) |

### `xcomet_batch_evaluate`

Evaluates multiple translation pairs in a single request.

> **Performance Note**: With the persistent server architecture (v0.3.0+), the model stays loaded in memory. Batch evaluation processes all pairs efficiently without reloading the model.

**Parameters:**

| Name | Type | Required | Description |
|------|------|----------|-------------|
| `pairs` | array | ✅ | Array of `{source, translation, reference?}` objects (max 500) |
| `source_lang` | string | ❌ | Source language code |
| `target_lang` | string | ❌ | Target language code |
| `response_format` | "json" \| "markdown" | ❌ | Output format |
| `use_gpu` | boolean | ❌ | Use GPU for inference (default: false) |
| `batch_size` | number | ❌ | Batch size 1-64 (default: 8). Larger is faster but uses more memory |

**Example:**

```json
{
  "pairs": [
    {"source": "Hello", "translation": "こんにちは"},
    {"source": "Goodbye", "translation": "さようなら"}
  ],
  "use_gpu": true,
  "batch_size": 16
}
```

## 🔗 Integration with Other MCP Servers

xCOMET MCP Server is designed to work alongside other MCP servers for complete translation workflows:

```mermaid
sequenceDiagram
    participant Agent as AI Agent
    participant DeepL as DeepL MCP Server
    participant xCOMET as xCOMET MCP Server

    Agent->>DeepL: Translate text
    DeepL-->>Agent: Translation result
    Agent->>xCOMET: Evaluate quality
    xCOMET-->>Agent: Score + Errors
    Agent->>Agent: Decide: Accept or retry?
```

### Recommended Workflow

1. **Translate** using DeepL MCP Server (official)
2. **Evaluate** using xCOMET MCP Server
3. **Iterate** if quality is below threshold

### Example: DeepL + xCOMET Integration

Configure both servers in Claude Desktop:

```json
{
  "mcpServers": {
    "deepl": {
      "command": "npx",
      "args": ["-y", "@anthropic/deepl-mcp-server"],
      "env": {
        "DEEPL_API_KEY": "your-api-key"
      }
    },
    "xcomet": {
      "command": "npx",
      "args": ["-y", "xcomet-mcp-server"]
    }
  }
}
```

Then ask Claude:

> "Translate this text to Japanese using DeepL, then evaluate the translation quality with xCOMET. If the score is below 0.8, suggest improvements."

## ⚙️ Configuration

### Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `TRANSPORT` | `stdio` | Transport mode: `stdio` or `http` |
| `PORT` | `3000` | HTTP server port (when TRANSPORT=http) |
| `XCOMET_MODEL` | `Unbabel/XCOMET-XL` | xCOMET model to use |
| `XCOMET_PYTHON_PATH` | (auto-detect) | Python executable path (see below) |
| `XCOMET_PRELOAD` | `false` | Pre-load model at startup (v0.3.1+) |
| `XCOMET_DEBUG` | `false` | Enable verbose debug logging (v0.3.1+) |

### Model Selection

Choose a model based on your quality/performance needs:

| Model | Parameters | Size | Memory | Reference | Quality | Use Case |
|-------|------------|------|--------|-----------|---------|----------|
| `Unbabel/XCOMET-XL` | 3.5B | ~14GB | ~8-10GB | Optional | ⭐⭐⭐⭐ | Recommended for most use cases |
| `Unbabel/XCOMET-XXL` | 10.7B | ~42GB | ~20GB | Optional | ⭐⭐⭐⭐⭐ | Highest quality, requires more resources |
| `Unbabel/wmt22-comet-da` | 580M | ~2GB | ~3GB | **Required** | ⭐⭐⭐ | Lightweight, faster loading |

> **Important**: `wmt22-comet-da` requires a `reference` translation for evaluation. XCOMET models support referenceless evaluation.

> **Tip**: If you experience memory issues or slow model loading, try `Unbabel/wmt22-comet-da` for faster performance with slightly lower accuracy (but remember to provide reference translations).

**To use a different model**, set the `XCOMET_MODEL` environment variable:

```json
{
  "mcpServers": {
    "xcomet": {
      "command": "npx",
      "args": ["-y", "xcomet-mcp-server"],
      "env": {
        "XCOMET_MODEL": "Unbabel/XCOMET-XXL"
      }
    }
  }
}
```

### Python Path Auto-Detection

The server automatically detects a Python environment with `unbabel-comet` installed, in this order:

1. **`XCOMET_PYTHON_PATH`** environment variable (if set)
2. **pyenv** versions (`~/.pyenv/versions/*/bin/python3`), checking for the `comet` module
3. **Homebrew** Python (`/opt/homebrew/bin/python3`, `/usr/local/bin/python3`)
4. **Fallback**: the `python3` command

This ensures the server works correctly even when the MCP host (e.g., Claude Desktop) uses a different Python than your terminal.

**Example: Explicit Python path configuration**

```json
{
  "mcpServers": {
    "xcomet": {
      "command": "npx",
      "args": ["-y", "xcomet-mcp-server"],
      "env": {
        "XCOMET_PYTHON_PATH": "/Users/you/.pyenv/versions/3.11.0/bin/python3"
      }
    }
  }
}
```

## ⚡ Performance

### Persistent Server Architecture (v0.3.0+)

The server uses a **persistent Python FastAPI server** that keeps the xCOMET model loaded in memory:

| Request | Time | Notes |
|---------|------|-------|
| First request | ~25-90s | Model loading (varies by model size) |
| Subsequent requests | **~500ms** | Model already loaded |

This provides a **177x speedup** for consecutive evaluations compared to reloading the model each time.

### Eager Loading (v0.3.1+)

Enable `XCOMET_PRELOAD=true` to pre-load the model at server startup:

```json
{
  "mcpServers": {
    "xcomet": {
      "command": "npx",
      "args": ["-y", "xcomet-mcp-server"],
      "env": {
        "XCOMET_PRELOAD": "true"
      }
    }
  }
}
```

With preload enabled, **all requests are fast** (~500ms), including the first one.

```mermaid
graph LR
    A[MCP Request] --> B[Node.js Server]
    B --> C[Python FastAPI Server]
    C --> D[xCOMET Model<br/>in Memory]
    D --> C
    C --> B
    B --> A

    style D fill:#9f9
```

### Batch Processing Optimization

The `xcomet_batch_evaluate` tool processes all pairs with a single model load:

| Pairs | Estimated Time |
|-------|----------------|
| 10 | ~30-40 sec |
| 50 | ~1-1.5 min |
| 100 | ~2 min |

### GPU vs CPU Performance

| Mode | 100 Pairs (Estimated) |
|------|----------------------|
| CPU (batch_size=8) | ~2 min |
| GPU (batch_size=16) | ~20-30 sec |

> **Note**: GPU mode requires CUDA-compatible hardware and PyTorch with CUDA support. If no GPU is available, set `use_gpu: false` (the default).

### Best Practices

**1. Let the persistent server do its job**

With v0.3.0+, the model stays in memory, so multiple `xcomet_evaluate` calls are efficient:

```
✅ Fast: first call loads the model, subsequent calls reuse it
   xcomet_evaluate(pair1)   # ~90s   (model loads)
   xcomet_evaluate(pair2)   # ~500ms (model cached)
   xcomet_evaluate(pair3)   # ~500ms (model cached)
```

**2. For many pairs, use batch evaluation**

```
✅ Even faster: batch all pairs in one call
   xcomet_batch_evaluate(allPairs)  # optimal throughput
```

**3. Memory considerations**

- XCOMET-XL requires ~8-10GB RAM
- For large batches (up to 500 pairs), ensure sufficient memory
- If memory is limited, split into smaller batches (100-200 pairs)

### Auto-Restart (v0.3.1+)

The server automatically recovers from failures:

- Monitors health every 30 seconds
- Restarts after 3 consecutive health-check failures
- Makes up to 3 restart attempts before giving up

## 📊 Quality Score Interpretation

| Score Range | Quality | Recommendation |
|-------------|---------|----------------|
| 0.9 - 1.0 | Excellent | Ready for use |
| 0.7 - 0.9 | Good | Minor review recommended |
| 0.5 - 0.7 | Fair | Post-editing needed |
| 0.0 - 0.5 | Poor | Re-translation recommended |

## 🔍 Troubleshooting

### Common Issues

#### "No module named 'comet'"

**Cause**: The Python environment in use does not have `unbabel-comet` installed.

**Solution**:
```bash
# Check which Python is being used
python3 -c "import sys; print(sys.executable)"

# Install all required packages
pip install "unbabel-comet>=2.2.0" fastapi uvicorn

# Or specify the Python path explicitly
export XCOMET_PYTHON_PATH=/path/to/python3
```

#### Model download fails or times out

**Cause**: Large model files (~14GB for XL) require a stable internet connection.

**Solution**:
```bash
# Pre-download the model manually
python -c "from comet import download_model; download_model('Unbabel/XCOMET-XL')"
```

#### GPU not detected

**Cause**: PyTorch is not installed with CUDA support.

**Solution**:
```bash
# Check CUDA availability
python -c "import torch; print(torch.cuda.is_available())"

# If False, reinstall PyTorch with CUDA support
pip install torch --index-url https://download.pytorch.org/whl/cu118
```

#### Slow performance on Mac (MPS)

**Cause**: Mac MPS (Metal Performance Shaders) has compatibility issues with some operations.

**Solution**: The server automatically uses `num_workers=1` for Mac MPS compatibility. For best performance on Mac, use CPU mode (`use_gpu: false`).

#### High memory usage or crashes

**Cause**: XCOMET-XL requires ~8-10GB RAM.

**Solutions**:

1. **Use the persistent server** (v0.3.0+): The model loads once and stays in memory, avoiding repeated memory spikes
2. **Use a lighter model**: Set `XCOMET_MODEL=Unbabel/wmt22-comet-da` for lower memory usage (~3GB)
3. **Reduce batch size**: For large batches, process in smaller chunks (100-200 pairs)
4. **Close other applications**: Free up RAM before running large evaluations

```bash
# Check available memory
free -h            # Linux
vm_stat | head -5  # macOS
```

#### VS Code or IDE crashes during evaluation

**Cause**: High memory usage from the xCOMET model (~8-10GB for XL).

**Solution**:
- With v0.3.0+, the model loads once and stays in memory (no repeated loading)
- If memory is still an issue, use a lighter model: `XCOMET_MODEL=Unbabel/wmt22-comet-da`
- Close other memory-intensive applications before evaluation

### Getting Help

If you encounter issues:

1. Check the [GitHub Issues](https://github.com/shuji-bonji/xcomet-mcp-server/issues)
2. Enable debug logging (`XCOMET_DEBUG=true`) and check Claude Desktop's Developer Mode logs
3. Open a new issue with:
   - Your OS and Python version
   - The error message
   - Your configuration (without sensitive data)

## 🧪 Development

```bash
# Install dependencies
npm install

# Build TypeScript
npm run build

# Watch mode
npm run dev

# Test with MCP Inspector
npm run inspect
```

## 📋 Changelog

See [CHANGELOG.md](CHANGELOG.md) for version history and updates.

## 📝 License

MIT License - see [LICENSE](LICENSE) for details.

## 🙏 Acknowledgments

- [Unbabel](https://unbabel.com/) for the xCOMET model
- [Anthropic](https://anthropic.com/) for the MCP protocol
- [Model Context Protocol](https://modelcontextprotocol.io/) community

## 📚 References

- [xCOMET Paper](https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00683/124263/xcomet-Transparent-Machine-Translation-Evaluation)
- [COMET Framework](https://github.com/Unbabel/COMET)
- [MCP Specification](https://spec.modelcontextprotocol.io/)
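The thresholds in the Quality Score Interpretation table can also be applied programmatically when post-processing `xcomet_evaluate` results, for example to implement the "retry if the score is below 0.8" workflow described earlier. A minimal sketch — the helper names and messages here are illustrative, not part of the server's API:

```python
def interpret_score(score: float) -> tuple[str, str]:
    """Map an xCOMET score in [0, 1] to the (quality, recommendation)
    tiers from the Quality Score Interpretation table."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    if score >= 0.9:
        return ("Excellent", "Ready for use")
    if score >= 0.7:
        return ("Good", "Minor review recommended")
    if score >= 0.5:
        return ("Fair", "Post-editing needed")
    return ("Poor", "Re-translation recommended")


def should_retry(score: float, threshold: float = 0.8) -> bool:
    """Decide whether to request a new translation, mirroring the
    DeepL + xCOMET iteration loop with a 0.8 acceptance threshold."""
    return score < threshold
```

For instance, the sample response above (`"score": 0.847`) maps to the "Good" tier, and with a 0.8 threshold no retry would be requested.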
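Similarly, the memory guidance for `xcomet_batch_evaluate` (split very large workloads into chunks of 100-200 pairs, with the tool capped at 500 per call) can be handled client-side before issuing requests. A sketch under the assumption that each pair is a `{source, translation}` dict as shown in the batch example; the helper name is illustrative:

```python
from typing import Dict, Iterator, List


def chunk_pairs(pairs: List[Dict], chunk_size: int = 200) -> Iterator[List[Dict]]:
    """Yield slices of at most chunk_size pairs, each suitable for a
    separate xcomet_batch_evaluate call (the tool accepts max 500)."""
    if not 1 <= chunk_size <= 500:
        raise ValueError("chunk_size must be between 1 and 500")
    for i in range(0, len(pairs), chunk_size):
        yield pairs[i : i + chunk_size]
```

Each yielded chunk stays within the tool's limits while keeping peak memory bounded.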