# AI Vision MCP Server

A powerful Model Context Protocol (MCP) server that provides AI-powered image and video analysis using Google Gemini and Vertex AI models.

## Features

- **Dual Provider Support**: Choose between Google Gemini API and Vertex AI
- **Multimodal Analysis**: Support for both image and video content analysis
- **Flexible File Handling**: Upload via multiple methods (URLs, local files, base64)
- **Storage Integration**: Built-in Google Cloud Storage support
- **Comprehensive Validation**: Zod-based data validation throughout
- **Error Handling**: Robust error handling with retry logic and circuit breakers
- **TypeScript**: Full TypeScript support with strict type checking

## Quick Start

### Pre-requisites

You can use either the [`google` provider](https://aistudio.google.com/welcome) or the [`vertex_ai` provider](https://cloud.google.com/vertex-ai/generative-ai/docs/start/quickstart). For simplicity, the `google` provider is recommended.

Below are the environment variables you need to set based on your selected provider.
(Note: It’s recommended to set the timeout configuration of your MCP client to more than 5 minutes.)

(i) **Using Google AI Studio Provider**

```bash
export IMAGE_PROVIDER="google" # or vertex_ai
export VIDEO_PROVIDER="google" # or vertex_ai
export GEMINI_API_KEY="your-gemini-api-key"
```

Get your Google AI Studio API key [here](https://aistudio.google.com/app/api-keys).

(ii) **Using Vertex AI Provider**

```bash
export IMAGE_PROVIDER="vertex_ai"
export VIDEO_PROVIDER="vertex_ai"
export VERTEX_CLIENT_EMAIL="your-service-account@project.iam.gserviceaccount.com"
export VERTEX_PRIVATE_KEY="-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n"
export VERTEX_PROJECT_ID="your-gcp-project-id"
export GCS_BUCKET_NAME="your-gcs-bucket"
```

Refer to [the guideline here](docs/provider/vertex-ai-setup-guide.md) on how to set this up.

### Installation

Below are installation guides for this MCP server on different MCP clients, such as Claude Desktop, Claude Code, Cursor, and Cline.

<details>
<summary>Claude Desktop</summary>

Add to your Claude Desktop configuration:

(i) Using Google AI Studio Provider
```json
{
  "mcpServers": {
    "ai-vision-mcp": {
      "command": "npx",
      "args": ["ai-vision-mcp"],
      "env": {
        "IMAGE_PROVIDER": "google",
        "VIDEO_PROVIDER": "google",
        "GEMINI_API_KEY": "your-gemini-api-key"
      }
    }
  }
}
```

(ii) Using Vertex AI Provider
```json
{
  "mcpServers": {
    "ai-vision-mcp": {
      "command": "npx",
      "args": ["ai-vision-mcp"],
      "env": {
        "IMAGE_PROVIDER": "vertex_ai",
        "VIDEO_PROVIDER": "vertex_ai",
        "VERTEX_CLIENT_EMAIL": "your-service-account@project.iam.gserviceaccount.com",
        "VERTEX_PRIVATE_KEY": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
        "VERTEX_PROJECT_ID": "your-gcp-project-id",
        "GCS_BUCKET_NAME": "ai-vision-mcp-{VERTEX_PROJECT_ID}"
      }
    }
  }
}
```

</details>

<details>
<summary>Claude Code</summary>

(i) Using Google AI Studio Provider
```bash
claude mcp add ai-vision-mcp \
  -e IMAGE_PROVIDER=google \
  -e VIDEO_PROVIDER=google \
  -e GEMINI_API_KEY=your-gemini-api-key \
  -- npx ai-vision-mcp
```

(ii) Using Vertex AI Provider
```bash
claude mcp add ai-vision-mcp \
  -e IMAGE_PROVIDER=vertex_ai \
  -e VIDEO_PROVIDER=vertex_ai \
  -e VERTEX_CLIENT_EMAIL=your-service-account@project.iam.gserviceaccount.com \
  -e VERTEX_PRIVATE_KEY="-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n" \
  -e VERTEX_PROJECT_ID=your-gcp-project-id \
  -e GCS_BUCKET_NAME=ai-vision-mcp-{VERTEX_PROJECT_ID} \
  -- npx ai-vision-mcp
```

Note: Increase the MCP startup timeout to 1 minute and the MCP tool execution timeout to about 5 minutes by updating `~\.claude\settings.json` as follows:

```json
{
  "env": {
    "MCP_TIMEOUT": "60000",
    "MCP_TOOL_TIMEOUT": "300000"
  }
}
```

</details>

<details>
<summary>Cursor</summary>

Go to: Settings -> Cursor Settings -> MCP -> Add new global MCP server

Pasting the following configuration into your `~/.cursor/mcp.json` file is the recommended approach. You may also install in a specific project by creating `.cursor/mcp.json` in your project folder.
See [Cursor MCP docs](https://docs.cursor.com/context/model-context-protocol) for more info.

(i) Using Google AI Studio Provider
```json
{
  "mcpServers": {
    "ai-vision-mcp": {
      "command": "npx",
      "args": ["ai-vision-mcp"],
      "env": {
        "IMAGE_PROVIDER": "google",
        "VIDEO_PROVIDER": "google",
        "GEMINI_API_KEY": "your-gemini-api-key"
      }
    }
  }
}
```

(ii) Using Vertex AI Provider
```json
{
  "mcpServers": {
    "ai-vision-mcp": {
      "command": "npx",
      "args": ["ai-vision-mcp"],
      "env": {
        "IMAGE_PROVIDER": "vertex_ai",
        "VIDEO_PROVIDER": "vertex_ai",
        "VERTEX_CLIENT_EMAIL": "your-service-account@project.iam.gserviceaccount.com",
        "VERTEX_PRIVATE_KEY": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
        "VERTEX_PROJECT_ID": "your-gcp-project-id",
        "GCS_BUCKET_NAME": "ai-vision-mcp-{VERTEX_PROJECT_ID}"
      }
    }
  }
}
```
</details>

<details>
<summary>Cline</summary>

Cline uses a JSON configuration file to manage MCP servers. To integrate this MCP server:

1. Open Cline and click on the MCP Servers icon in the top navigation bar.
2. Select the Installed tab, then click Advanced MCP Settings.
3. In the `cline_mcp_settings.json` file, add the following configuration:

(i) Using Google AI Studio Provider
```json
{
  "mcpServers": {
    "ai-vision-mcp": {
      "timeout": 300,
      "type": "stdio",
      "command": "npx",
      "args": ["ai-vision-mcp"],
      "env": {
        "IMAGE_PROVIDER": "google",
        "VIDEO_PROVIDER": "google",
        "GEMINI_API_KEY": "your-gemini-api-key"
      }
    }
  }
}
```

(ii) Using Vertex AI Provider
```json
{
  "mcpServers": {
    "ai-vision-mcp": {
      "timeout": 300,
      "type": "stdio",
      "command": "npx",
      "args": ["ai-vision-mcp"],
      "env": {
        "IMAGE_PROVIDER": "vertex_ai",
        "VIDEO_PROVIDER": "vertex_ai",
        "VERTEX_CLIENT_EMAIL": "your-service-account@project.iam.gserviceaccount.com",
        "VERTEX_PRIVATE_KEY": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
        "VERTEX_PROJECT_ID": "your-gcp-project-id",
        "GCS_BUCKET_NAME": "ai-vision-mcp-{VERTEX_PROJECT_ID}"
      }
    }
  }
}
```
</details>

<details>
<summary>Other MCP clients</summary>

The server uses stdio transport and follows the standard MCP protocol. It can be integrated with any MCP-compatible client by running:

```bash
npx ai-vision-mcp
```
</details>

## MCP Tools

The server provides four main MCP tools:

### 1) `analyze_image`

Analyzes an image using AI and returns a detailed description.

**Parameters:**
- `imageSource` (string): URL, base64 data, or file path to the image
- `prompt` (string): Question or instruction for the AI
- `options` (object, optional): Analysis options including temperature and max tokens

**Examples:**

1. **Analyze image from URL:**
```json
{
  "imageSource": "https://plus.unsplash.com/premium_photo-1710965560034-778eedc929ff",
  "prompt": "What is this image about? Describe what you see in detail."
}
```

2. **Analyze a local image file:**
```json
{
  "imageSource": "C:\\Users\\username\\Downloads\\image.jpg",
  "prompt": "What is this image about? Describe what you see in detail."
}
```

### 2) `compare_images`

Compares multiple images using AI and returns a detailed comparison analysis.

**Parameters:**
- `imageSources` (array): Array of image sources (URLs, base64 data, or file paths); minimum 2, maximum 4 images
- `prompt` (string): Question or instruction for comparing the images
- `options` (object, optional): Analysis options including temperature and max tokens

**Examples:**

1. **Compare images from URLs:**
```json
{
  "imageSources": [
    "https://example.com/image1.jpg",
    "https://example.com/image2.jpg"
  ],
  "prompt": "Compare these two images and tell me the differences"
}
```

2. **Compare mixed sources:**
```json
{
  "imageSources": [
    "https://example.com/image1.jpg",
    "C:\\Users\\username\\Downloads\\image2.jpg",
    "data:image/jpeg;base64,/9j/4AAQSkZJRgAB..."
  ],
  "prompt": "Which image has the best lighting quality?"
}
```

### 3) `detect_objects_in_image`

Detects objects in an image using AI vision models and generates an annotated image with bounding boxes. Returns the detected objects with their coordinates, and saves the annotated image either to a specified file or to a temporary directory.

**Parameters:**
- `imageSource` (string): URL, base64 data, or file path to the image
- `prompt` (string): Custom detection prompt describing what to detect or recognize in the image
- `outputFilePath` (string, optional): Explicit output path for the annotated image

**Configuration:**
This function uses optimized default parameters for object detection and does not accept a runtime `options` parameter.
To customize the AI parameters (temperature, topP, topK, maxTokens), use environment variables:

```bash
# Recommended environment variable settings for object detection (these are now the defaults)
TEMPERATURE_FOR_DETECT_OBJECTS_IN_IMAGE=0.0   # Deterministic responses
TOP_P_FOR_DETECT_OBJECTS_IN_IMAGE=0.95        # Nucleus sampling
TOP_K_FOR_DETECT_OBJECTS_IN_IMAGE=30          # Vocabulary selection
MAX_TOKENS_FOR_DETECT_OBJECTS_IN_IMAGE=8192   # High token limit for JSON
```

**File Handling Logic:**
1. **Explicit `outputFilePath` provided** → saves to the exact path specified
2. **No explicit `outputFilePath`** → automatically saves to a temporary directory

**Response Types:**
- Returns a `file` object when an explicit `outputFilePath` is provided
- Returns a `tempFile` object when no explicit `outputFilePath` is provided and the annotated image is auto-saved to a temporary folder
- Always includes a `detections` array with detected objects and coordinates
- Includes a `summary` with percentage-based coordinates for browser automation

**Examples:**

1. **Basic object detection:**
```json
{
  "imageSource": "https://example.com/image.jpg",
  "prompt": "Detect all objects in this image"
}
```

2. **Save annotated image to a specific path:**
```json
{
  "imageSource": "C:\\Users\\username\\Downloads\\image.jpg",
  "outputFilePath": "C:\\Users\\username\\Documents\\annotated_image.png"
}
```

3. **Custom detection prompt:**
```json
{
  "imageSource": "data:image/jpeg;base64,/9j/4AAQSkZJRgAB...",
  "prompt": "Detect and label all electronic devices in this image"
}
```

### 4) `analyze_video`

Analyzes a video using AI and returns a detailed description.

**Parameters:**
- `videoSource` (string): YouTube URL, GCS URI, or local file path to the video
- `prompt` (string): Question or instruction for the AI
- `options` (object, optional): Analysis options including temperature and max tokens

**Supported video sources:**
- YouTube URLs (e.g., `https://www.youtube.com/watch?v=...`)
- Local file paths (e.g., `C:\Users\username\Downloads\video.mp4`)

**Examples:**

1. **Analyze video from a YouTube URL:**
```json
{
  "videoSource": "https://www.youtube.com/watch?v=9hE5-98ZeCg",
  "prompt": "What is this video about? Describe what you see in detail."
}
```

2. **Analyze a local video file:**
```json
{
  "videoSource": "C:\\Users\\username\\Downloads\\video.mp4",
  "prompt": "What is this video about? Describe what you see in detail."
}
```

**Note:** Only YouTube URLs are supported for public video URLs.
Other public video URLs are not currently supported.

## Environment Configuration

For basic setup, you only need to configure the provider selection and required credentials:

### Google AI Studio Provider (Recommended)
```bash
export IMAGE_PROVIDER="google"
export VIDEO_PROVIDER="google"
export GEMINI_API_KEY="your-gemini-api-key"
```

### Vertex AI Provider (Production)
```bash
export IMAGE_PROVIDER="vertex_ai"
export VIDEO_PROVIDER="vertex_ai"
export VERTEX_CLIENT_EMAIL="your-service-account@project.iam.gserviceaccount.com"
export VERTEX_PRIVATE_KEY="-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n"
export VERTEX_PROJECT_ID="your-gcp-project-id"
export GCS_BUCKET_NAME="your-gcs-bucket"
```

### 📖 **Detailed Configuration Guide**

For comprehensive environment variable documentation, including:
- Complete configuration reference (60+ environment variables)
- Function-specific optimization examples
- Advanced configuration patterns
- Troubleshooting guidance

👉 **[See Environment Variable Guide](docs/environment-variable-guide.md)**

### Configuration Priority Overview

The server uses a hierarchical configuration system where more specific settings override general ones:

1. **LLM-assigned values** (runtime parameters in tool calls)
2. **Function-specific variables** (`TEMPERATURE_FOR_ANALYZE_IMAGE`, etc.)
3. **Task-specific variables** (`TEMPERATURE_FOR_IMAGE`, etc.)
4. **Universal variables** (`TEMPERATURE`, etc.)
5. **System defaults**

<details>
<summary><strong>Quick Configuration Examples</strong></summary>

**Basic Optimization:**
```bash
# General settings
export TEMPERATURE=0.7
export MAX_TOKENS=1500

# Task-specific optimization
export TEMPERATURE_FOR_IMAGE=0.2   # More precise for images
export TEMPERATURE_FOR_VIDEO=0.5   # More creative for videos
```

**Function-specific Optimization:**
```bash
# Optimize individual functions
export TEMPERATURE_FOR_ANALYZE_IMAGE=0.1
export TEMPERATURE_FOR_COMPARE_IMAGES=0.3
export TEMPERATURE_FOR_DETECT_OBJECTS_IN_IMAGE=0.0   # Deterministic
export MAX_TOKENS_FOR_DETECT_OBJECTS_IN_IMAGE=8192   # High token limit
```

**Model Selection:**
```bash
# Choose models per function
export ANALYZE_IMAGE_MODEL="gemini-2.5-flash-lite"
export COMPARE_IMAGES_MODEL="gemini-2.5-flash"
export ANALYZE_VIDEO_MODEL="gemini-2.5-flash-pro"
```
</details>

## Troubleshooting (stdio / Codex / Claude Code)

### 1) "Transport closed" / tool call fails

If you see errors like:

- `tools/call failed: Transport closed`

Common causes:

**A) Image annotation dependency failed to load**

This server uses [`imagescript`](https://github.com/matmen/ImageScript) for image annotation and dimension extraction.

Verify it loads:

```bash
npm run doctor
# or
npm run check:imagescript
```

**B) stdout logs corrupt stdio MCP framing**

This server uses the MCP **stdio** transport (newline-delimited JSON-RPC over stdout).

- ✅ stdout must contain **only** MCP JSON-RPC messages
- ✅ write logs to **stderr** (e.g. `console.error`)
- ❌ do not use `console.log` in stdio MCP servers

If stdout is polluted, clients (Codex/Claude Code) may disconnect and report `Transport closed`.

## Development

### Prerequisites

- Node.js 18+
- npm or yarn

### Setup

```bash
# Clone the repository
git clone https://github.com/tan-yong-sheng/ai-vision-mcp.git
cd ai-vision-mcp

# Install dependencies
npm install

# Build the project
npm run build

# Start development server
npm run dev
```

### Scripts

- `npm run build` - Build the TypeScript project
- `npm run dev` - Start development server with watch mode
- `npm run lint` - Run ESLint
- `npm run format` - Format code with Prettier
- `npm start` - Start the built server

## Architecture

The project follows a modular architecture:

```
src/
├── providers/     # AI provider implementations
│   ├── gemini/    # Google Gemini provider
│   ├── vertexai/  # Vertex AI provider
│   └── factory/   # Provider factory
├── services/      # Core services
│   ├── ConfigService.ts
│   └── FileService.ts
├── storage/       # Storage implementations
├── file-upload/   # File upload strategies
├── types/         # TypeScript type definitions
├── utils/         # Utility functions
└── server.ts      # Main MCP server
```

## Error Handling

The server includes comprehensive error handling:

- **Validation Errors**: Input validation using Zod schemas
- **Network Errors**: Automatic retries with exponential backoff
- **Authentication Errors**: Clear error messages for API key issues
- **File Errors**: Handling for file size limits and format restrictions

## Contributing

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Acknowledgments

- Google for the Gemini and Vertex AI APIs
- The Model Context Protocol team for the MCP framework
- All contributors and users of this project
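As a closing illustration, the hierarchical configuration scheme described under "Configuration Priority Overview" amounts to a first-defined lookup: a runtime value wins, then the function-specific, task-specific, and universal environment variables are tried in order, falling back to a system default. The sketch below is a minimal illustration under that assumption — `resolveSetting` is a hypothetical helper, not the server's actual `ConfigService` code:

```typescript
// Hypothetical sketch of hierarchical setting resolution.
// More specific sources win: runtime value > function-specific env var
// > task-specific env var > universal env var > system default.
function resolveSetting(
  runtimeValue: number | undefined,
  env: Record<string, string | undefined>,
  functionVar: string, // e.g. "TEMPERATURE_FOR_ANALYZE_IMAGE"
  taskVar: string, // e.g. "TEMPERATURE_FOR_IMAGE"
  universalVar: string, // e.g. "TEMPERATURE"
  systemDefault: number
): number {
  // A value supplied in the tool call itself always wins.
  if (runtimeValue !== undefined) return runtimeValue;
  // Otherwise take the first env var that parses to a number.
  for (const name of [functionVar, taskVar, universalVar]) {
    const raw = env[name];
    if (raw !== undefined && raw !== "") {
      const parsed = Number(raw);
      if (!Number.isNaN(parsed)) return parsed;
    }
  }
  return systemDefault;
}

// Example: a universal TEMPERATURE is overridden by the function-specific variable.
const env = {
  TEMPERATURE: "0.7",
  TEMPERATURE_FOR_ANALYZE_IMAGE: "0.1",
};
const t = resolveSetting(
  undefined,
  env,
  "TEMPERATURE_FOR_ANALYZE_IMAGE",
  "TEMPERATURE_FOR_IMAGE",
  "TEMPERATURE",
  1.0
);
// t === 0.1
```

The key design point is that the chain is ordered from most to least specific, so exporting a broad variable like `TEMPERATURE` never clobbers a narrower one like `TEMPERATURE_FOR_ANALYZE_IMAGE`.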