# AI Vision MCP Server

A powerful Model Context Protocol (MCP) server that provides AI-powered image and video analysis using Google Gemini and Vertex AI models.

## Features

- **Dual Provider Support**: Choose between Google Gemini API and Vertex AI
- **Multimodal Analysis**: Support for both image and video content analysis
- **Flexible File Handling**: Upload via multiple methods (URLs, local files, base64)
- **Storage Integration**: Built-in Google Cloud Storage support
- **Comprehensive Validation**: Zod-based data validation throughout
- **Error Handling**: Robust error handling with retry logic and circuit breakers
- **TypeScript**: Full TypeScript support with strict type checking

## Quick Start

### Pre-requisites

You can use either the [`google` provider](https://aistudio.google.com/welcome) or the [`vertex_ai` provider](https://cloud.google.com/vertex-ai/generative-ai/docs/start/quickstart). For simplicity, the `google` provider is recommended.

Below are the environment variables you need to set based on your selected provider.
(Note: It’s recommended to set the timeout configuration of your MCP client to more than 5 minutes.)

(i) **Using Google AI Studio Provider**

```bash
export IMAGE_PROVIDER="google" # or vertex_ai
export VIDEO_PROVIDER="google" # or vertex_ai
export GEMINI_API_KEY="your-gemini-api-key"
```

Get your Google AI Studio API key [here](https://aistudio.google.com/app/api-keys).

(ii) **Using Vertex AI Provider**

```bash
export IMAGE_PROVIDER="vertex_ai"
export VIDEO_PROVIDER="vertex_ai"
export VERTEX_CLIENT_EMAIL="your-service-account@project.iam.gserviceaccount.com"
export VERTEX_PRIVATE_KEY="-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n"
export VERTEX_PROJECT_ID="your-gcp-project-id"
export GCS_BUCKET_NAME="your-gcs-bucket"
```

Refer to [the guideline here](docs/provider/vertex-ai-setup-guide.md) on how to set this up.

### Installation

Below are installation guides for this MCP server on different MCP clients, such as Claude Desktop, Claude Code, Cursor, and Cline.

<details>
<summary>Claude Desktop</summary>

Add to your Claude Desktop configuration:

(i) Using Google AI Studio Provider
```json
{
  "mcpServers": {
    "ai-vision-mcp": {
      "command": "npx",
      "args": ["ai-vision-mcp"],
      "env": {
        "IMAGE_PROVIDER": "google",
        "VIDEO_PROVIDER": "google",
        "GEMINI_API_KEY": "your-gemini-api-key"
      }
    }
  }
}
```

(ii) Using Vertex AI Provider
```json
{
  "mcpServers": {
    "ai-vision-mcp": {
      "command": "npx",
      "args": ["ai-vision-mcp"],
      "env": {
        "IMAGE_PROVIDER": "vertex_ai",
        "VIDEO_PROVIDER": "vertex_ai",
        "VERTEX_CLIENT_EMAIL": "your-service-account@project.iam.gserviceaccount.com",
        "VERTEX_PRIVATE_KEY": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
        "VERTEX_PROJECT_ID": "your-gcp-project-id",
        "GCS_BUCKET_NAME": "ai-vision-mcp-{VERTEX_PROJECT_ID}"
      }
    }
  }
}
```

</details>

<details>
<summary>Claude Code</summary>

(i) Using Google AI Studio Provider
```bash
claude mcp add ai-vision-mcp \
  -e IMAGE_PROVIDER=google \
  -e VIDEO_PROVIDER=google \
  -e GEMINI_API_KEY=your-gemini-api-key \
  -- npx ai-vision-mcp
```

(ii) Using Vertex AI Provider
```bash
claude mcp add ai-vision-mcp \
  -e IMAGE_PROVIDER=vertex_ai \
  -e VIDEO_PROVIDER=vertex_ai \
  -e VERTEX_CLIENT_EMAIL=your-service-account@project.iam.gserviceaccount.com \
  -e VERTEX_PRIVATE_KEY="-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n" \
  -e VERTEX_PROJECT_ID=your-gcp-project-id \
  -e GCS_BUCKET_NAME=ai-vision-mcp-{VERTEX_PROJECT_ID} \
  -- npx ai-vision-mcp
```

Note: Increase the MCP startup timeout to 1 minute and the MCP tool execution timeout to about 5 minutes by updating `~\.claude\settings.json` as follows:

```json
{
  "env": {
    "MCP_TIMEOUT": "60000",
    "MCP_TOOL_TIMEOUT": "300000"
  }
}
```

</details>

<details>
<summary>Cursor</summary>

Go to: Settings -> Cursor Settings -> MCP -> Add new global MCP server

Pasting the following configuration into your `~/.cursor/mcp.json` file is the recommended approach. You may also install in a specific project by creating `.cursor/mcp.json` in your project folder.
See [Cursor MCP docs](https://docs.cursor.com/context/model-context-protocol) for more info.

(i) Using Google AI Studio Provider
```json
{
  "mcpServers": {
    "ai-vision-mcp": {
      "command": "npx",
      "args": ["ai-vision-mcp"],
      "env": {
        "IMAGE_PROVIDER": "google",
        "VIDEO_PROVIDER": "google",
        "GEMINI_API_KEY": "your-gemini-api-key"
      }
    }
  }
}
```

(ii) Using Vertex AI Provider
```json
{
  "mcpServers": {
    "ai-vision-mcp": {
      "command": "npx",
      "args": ["ai-vision-mcp"],
      "env": {
        "IMAGE_PROVIDER": "vertex_ai",
        "VIDEO_PROVIDER": "vertex_ai",
        "VERTEX_CLIENT_EMAIL": "your-service-account@project.iam.gserviceaccount.com",
        "VERTEX_PRIVATE_KEY": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
        "VERTEX_PROJECT_ID": "your-gcp-project-id",
        "GCS_BUCKET_NAME": "ai-vision-mcp-{VERTEX_PROJECT_ID}"
      }
    }
  }
}
```
</details>

<details>
<summary>Cline</summary>

Cline uses a JSON configuration file to manage MCP servers. To integrate this MCP server:

1. Open Cline and click on the MCP Servers icon in the top navigation bar.
2. Select the Installed tab, then click Advanced MCP Settings.
3. In the `cline_mcp_settings.json` file, add the following configuration:

(i) Using Google AI Studio Provider
```json
{
  "mcpServers": {
    "ai-vision-mcp": {
      "timeout": 300,
      "type": "stdio",
      "command": "npx",
      "args": ["ai-vision-mcp"],
      "env": {
        "IMAGE_PROVIDER": "google",
        "VIDEO_PROVIDER": "google",
        "GEMINI_API_KEY": "your-gemini-api-key"
      }
    }
  }
}
```

(ii) Using Vertex AI Provider
```json
{
  "mcpServers": {
    "ai-vision-mcp": {
      "timeout": 300,
      "type": "stdio",
      "command": "npx",
      "args": ["ai-vision-mcp"],
      "env": {
        "IMAGE_PROVIDER": "vertex_ai",
        "VIDEO_PROVIDER": "vertex_ai",
        "VERTEX_CLIENT_EMAIL": "your-service-account@project.iam.gserviceaccount.com",
        "VERTEX_PRIVATE_KEY": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
        "VERTEX_PROJECT_ID": "your-gcp-project-id",
        "GCS_BUCKET_NAME": "ai-vision-mcp-{VERTEX_PROJECT_ID}"
      }
    }
  }
}
```
</details>

<details>
<summary>Other MCP clients</summary>

The server uses stdio transport and follows the standard MCP protocol. It can be integrated with any MCP-compatible client by running:

```bash
npx ai-vision-mcp
```
</details>

## MCP Tools

The server provides four main MCP tools:

### 1) `analyze_image`

Analyzes an image using AI and returns a detailed description.

**Parameters:**
- `imageSource` (string): URL, base64 data, or file path to the image
- `prompt` (string): Question or instruction for the AI
- `options` (object, optional): Analysis options including temperature and max tokens

**Examples:**

1. **Analyze image from URL:**
```json
{
  "imageSource": "https://plus.unsplash.com/premium_photo-1710965560034-778eedc929ff",
  "prompt": "What is this image about? Describe what you see in detail."
}
```

2. **Analyze a local image file:**
```json
{
  "imageSource": "C:\\Users\\username\\Downloads\\image.jpg",
  "prompt": "What is this image about? Describe what you see in detail."
}
```

### 2) `compare_images`

Compares multiple images using AI and returns a detailed comparison analysis.

**Parameters:**
- `imageSources` (array): Array of image sources (URLs, base64 data, or file paths); minimum 2, maximum 4 images
- `prompt` (string): Question or instruction for comparing the images
- `options` (object, optional): Analysis options including temperature and max tokens

**Examples:**

1. **Compare images from URLs:**
```json
{
  "imageSources": [
    "https://example.com/image1.jpg",
    "https://example.com/image2.jpg"
  ],
  "prompt": "Compare these two images and tell me the differences"
}
```

2. **Compare mixed sources:**
```json
{
  "imageSources": [
    "https://example.com/image1.jpg",
    "C:\\Users\\username\\Downloads\\image2.jpg",
    "data:image/jpeg;base64,/9j/4AAQSkZJRgAB..."
  ],
  "prompt": "Which image has the best lighting quality?"
}
```

### 3) `detect_objects_in_image`

Detects objects in an image using AI vision models and generates an annotated image with bounding boxes. Returns the detected objects with their coordinates, and saves the annotated image either to a specified file or to a temporary directory.

**Parameters:**
- `imageSource` (string): URL, base64 data, or file path to the image
- `prompt` (string): Custom detection prompt describing what to detect or recognize in the image
- `outputFilePath` (string, optional): Explicit output path for the annotated image

**Configuration:**
This function uses optimized default parameters for object detection and does not accept a runtime `options` parameter.
To customize the AI parameters (temperature, topP, topK, maxTokens), use environment variables:

```bash
# Recommended environment variable settings for object detection (these are now the defaults)
TEMPERATURE_FOR_DETECT_OBJECTS_IN_IMAGE=0.0   # Deterministic responses
TOP_P_FOR_DETECT_OBJECTS_IN_IMAGE=0.95        # Nucleus sampling
TOP_K_FOR_DETECT_OBJECTS_IN_IMAGE=30          # Vocabulary selection
MAX_TOKENS_FOR_DETECT_OBJECTS_IN_IMAGE=8192   # High token limit for JSON
```

**File Handling Logic:**
1. **Explicit `outputFilePath` provided** → saves to the exact path specified
2. **No explicit `outputFilePath`** → automatically saves to a temporary directory

**Response Types:**
- Returns a `file` object when an explicit `outputFilePath` is provided
- Returns a `tempFile` object when no explicit `outputFilePath` is provided and the annotated image is auto-saved to a temporary folder
- Always includes a `detections` array with detected objects and coordinates
- Includes a `summary` with percentage-based coordinates for browser automation

**Examples:**

1. **Basic object detection:**
```json
{
  "imageSource": "https://example.com/image.jpg",
  "prompt": "Detect all objects in this image"
}
```

2. **Save annotated image to a specific path:**
```json
{
  "imageSource": "C:\\Users\\username\\Downloads\\image.jpg",
  "outputFilePath": "C:\\Users\\username\\Documents\\annotated_image.png"
}
```

3. **Custom detection prompt:**
```json
{
  "imageSource": "data:image/jpeg;base64,/9j/4AAQSkZJRgAB...",
  "prompt": "Detect and label all electronic devices in this image"
}
```

### 4) `analyze_video`

Analyzes a video using AI and returns a detailed description.

**Parameters:**
- `videoSource` (string): YouTube URL, GCS URI, or local file path to the video
- `prompt` (string): Question or instruction for the AI
- `options` (object, optional): Analysis options including temperature and max tokens

**Supported video sources:**
- YouTube URLs (e.g., `https://www.youtube.com/watch?v=...`)
- Local file paths (e.g., `C:\Users\username\Downloads\video.mp4`)

**Examples:**

1. **Analyze video from a YouTube URL:**
```json
{
  "videoSource": "https://www.youtube.com/watch?v=9hE5-98ZeCg",
  "prompt": "What is this video about? Describe what you see in detail."
}
```

2. **Analyze a local video file:**
```json
{
  "videoSource": "C:\\Users\\username\\Downloads\\video.mp4",
  "prompt": "What is this video about? Describe what you see in detail."
}
```

**Note:** Only YouTube URLs are supported for public video URLs.
Other public video URLs are not currently supported.

## Environment Configuration

For basic setup, you only need to configure the provider selection and required credentials:

### Google AI Studio Provider (Recommended)
```bash
export IMAGE_PROVIDER="google"
export VIDEO_PROVIDER="google"
export GEMINI_API_KEY="your-gemini-api-key"
```

### Vertex AI Provider (Production)
```bash
export IMAGE_PROVIDER="vertex_ai"
export VIDEO_PROVIDER="vertex_ai"
export VERTEX_CLIENT_EMAIL="your-service-account@project.iam.gserviceaccount.com"
export VERTEX_PRIVATE_KEY="-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n"
export VERTEX_PROJECT_ID="your-gcp-project-id"
export GCS_BUCKET_NAME="your-gcs-bucket"
```

### 📖 **Detailed Configuration Guide**

For comprehensive environment variable documentation, including:
- Complete configuration reference (60+ environment variables)
- Function-specific optimization examples
- Advanced configuration patterns
- Troubleshooting guidance

👉 **[See Environment Variable Guide](docs/environment-variable-guide.md)**

### Configuration Priority Overview

The server uses a hierarchical configuration system where more specific settings override general ones:

1. **LLM-assigned values** (runtime parameters in tool calls)
2. **Function-specific variables** (`TEMPERATURE_FOR_ANALYZE_IMAGE`, etc.)
3. **Task-specific variables** (`TEMPERATURE_FOR_IMAGE`, etc.)
4. **Universal variables** (`TEMPERATURE`, etc.)
5. **System defaults**

<details>
<summary><strong>Quick Configuration Examples</strong></summary>

**Basic Optimization:**
```bash
# General settings
export TEMPERATURE=0.7
export MAX_TOKENS=1500

# Task-specific optimization
export TEMPERATURE_FOR_IMAGE=0.2   # More precise for images
export TEMPERATURE_FOR_VIDEO=0.5   # More creative for videos
```

**Function-specific Optimization:**
```bash
# Optimize individual functions
export TEMPERATURE_FOR_ANALYZE_IMAGE=0.1
export TEMPERATURE_FOR_COMPARE_IMAGES=0.3
export TEMPERATURE_FOR_DETECT_OBJECTS_IN_IMAGE=0.0   # Deterministic
export MAX_TOKENS_FOR_DETECT_OBJECTS_IN_IMAGE=8192   # High token limit
```

**Model Selection:**
```bash
# Choose models per function
export ANALYZE_IMAGE_MODEL="gemini-2.5-flash-lite"
export COMPARE_IMAGES_MODEL="gemini-2.5-flash"
export ANALYZE_VIDEO_MODEL="gemini-2.5-flash-pro"
```
</details>

## Troubleshooting (stdio / Codex / Claude Code)

### 1) "Transport closed" / tool call fails

If you see errors like:

- `tools/call failed: Transport closed`

Common causes:

**A) Image annotation dependency failed to load**

This server uses [`imagescript`](https://github.com/matmen/ImageScript) for image annotation and dimension extraction.

Verify it loads:

```bash
npm run doctor
# or
npm run check:imagescript
```

**B) stdout logs corrupt stdio MCP framing**

This server uses the MCP **stdio** transport (newline-delimited JSON-RPC over stdout).

- ✅ stdout must contain **only** MCP JSON-RPC messages
- ✅ write logs to **stderr** (e.g. `console.error`)
- ❌ do not use `console.log` in stdio MCP servers

If stdout is polluted, clients (Codex/Claude Code) may disconnect and report `Transport closed`.

## Development

### Prerequisites

- Node.js 18+
- npm or yarn

### Setup

```bash
# Clone the repository
git clone https://github.com/tan-yong-sheng/ai-vision-mcp.git
cd ai-vision-mcp

# Install dependencies
npm install

# Build the project
npm run build

# Start development server
npm run dev
```

### Scripts

- `npm run build` - Build the TypeScript project
- `npm run dev` - Start development server with watch mode
- `npm run lint` - Run ESLint
- `npm run format` - Format code with Prettier
- `npm start` - Start the built server

## Architecture

The project follows a modular architecture:

```
src/
├── providers/     # AI provider implementations
│   ├── gemini/    # Google Gemini provider
│   ├── vertexai/  # Vertex AI provider
│   └── factory/   # Provider factory
├── services/      # Core services
│   ├── ConfigService.ts
│   └── FileService.ts
├── storage/       # Storage implementations
├── file-upload/   # File upload strategies
├── types/         # TypeScript type definitions
├── utils/         # Utility functions
└── server.ts      # Main MCP server
```

## Error Handling

The server includes comprehensive error handling:

- **Validation Errors**: Input validation using Zod schemas
- **Network Errors**: Automatic retries with exponential backoff
- **Authentication Errors**: Clear error messages for API key issues
- **File Errors**: Handling for file size limits and format restrictions

## Contributing

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Acknowledgments

- Google for the Gemini and Vertex AI APIs
- The Model Context Protocol team for the MCP framework
- All contributors and users of this project
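As a closing illustration, the hierarchical configuration scheme described under "Configuration Priority Overview" amounts to a first-defined lookup: a runtime value wins, then the function-specific, task-specific, and universal environment variables are tried in order, falling back to a system default. The sketch below is a minimal illustration under that assumption — `resolveSetting` is a hypothetical helper, not the server's actual `ConfigService` code:

```typescript
// Hypothetical sketch of hierarchical setting resolution.
// More specific sources win: runtime value > function-specific env var
// > task-specific env var > universal env var > system default.
function resolveSetting(
  runtimeValue: number | undefined,
  env: Record<string, string | undefined>,
  functionVar: string, // e.g. "TEMPERATURE_FOR_ANALYZE_IMAGE"
  taskVar: string, // e.g. "TEMPERATURE_FOR_IMAGE"
  universalVar: string, // e.g. "TEMPERATURE"
  systemDefault: number
): number {
  // A value supplied in the tool call itself always wins.
  if (runtimeValue !== undefined) return runtimeValue;
  // Otherwise take the first env var that parses to a number.
  for (const name of [functionVar, taskVar, universalVar]) {
    const raw = env[name];
    if (raw !== undefined && raw !== "") {
      const parsed = Number(raw);
      if (!Number.isNaN(parsed)) return parsed;
    }
  }
  return systemDefault;
}

// Example: a universal TEMPERATURE is overridden by the function-specific variable.
const env = {
  TEMPERATURE: "0.7",
  TEMPERATURE_FOR_ANALYZE_IMAGE: "0.1",
};
const t = resolveSetting(
  undefined,
  env,
  "TEMPERATURE_FOR_ANALYZE_IMAGE",
  "TEMPERATURE_FOR_IMAGE",
  "TEMPERATURE",
  1.0
);
// t === 0.1
```

The key design point is that the chain is ordered from most to least specific, so exporting a broad variable like `TEMPERATURE` never clobbers a narrower one like `TEMPERATURE_FOR_ANALYZE_IMAGE`.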