# media-gen-mcp

<p align="center">
  <a href="https://www.npmjs.com/package/media-gen-mcp"><img src="https://img.shields.io/npm/v/media-gen-mcp?label=media-gen-mcp&color=brightgreen" alt="media-gen-mcp"></a>
  <a href="https://www.npmjs.com/package/@modelcontextprotocol/sdk"><img src="https://img.shields.io/npm/v/@modelcontextprotocol/sdk?label=MCP%20SDK&color=blue" alt="MCP SDK"></a>
  <a href="https://www.npmjs.com/package/openai"><img src="https://img.shields.io/npm/v/openai?label=OpenAI%20SDK&color=blueviolet" alt="OpenAI SDK"></a>
  <a href="https://github.com/punkpeye/mcp-proxy"><img src="https://img.shields.io/github/stars/punkpeye/mcp-proxy?label=mcp-proxy&style=social" alt="mcp-proxy"></a>
  <a href="https://github.com/yjacquin/fast-mcp"><img src="https://img.shields.io/github/stars/yjacquin/fast-mcp?label=fast-mcp&style=social" alt="fast-mcp"></a>
  <a href="https://github.com/strato-space/media-gen-mcp/blob/main/LICENSE"><img src="https://img.shields.io/github/license/strato-space/media-gen-mcp?color=brightgreen" alt="License"></a>
  <a href="https://github.com/strato-space/media-gen-mcp/stargazers"><img src="https://img.shields.io/github/stars/strato-space/media-gen-mcp?style=social" alt="GitHub stars"></a>
  <a href="https://github.com/strato-space/media-gen-mcp/actions"><img src="https://img.shields.io/github/actions/workflow/status/strato-space/media-gen-mcp/main.yml?label=build&logo=github" alt="Build Status"></a>
</p>

---

**Media Gen MCP** is a **strict TypeScript** Model Context Protocol (MCP) server for OpenAI Images (`gpt-image-1.5`, `gpt-image-1`), OpenAI Videos (Sora), and Google GenAI Videos (Veo): generate/edit images, create/remix video jobs, and fetch media from URLs or disk with smart `resource_link` vs inline `image` outputs and optional `sharp` processing. Production-focused (full strict typecheck, ESLint + Vitest CI).
Works with fast-agent, Claude Desktop, ChatGPT, Cursor, VS Code, Windsurf, and any MCP-compatible client.

**Design principle:** spec-first, type-safe image tooling – strict OpenAI Images API + MCP compliance with fully static TypeScript types and flexible result placements/response formats for different clients.

- **Generate images** from text prompts using OpenAI's `gpt-image-1.5` model (with `gpt-image-1` compatibility and DALL·E support planned in future versions).
- **Edit images** (inpainting, outpainting, compositing) from 1 up to 16 images at once, with advanced prompt control.
- **Generate videos** via OpenAI Videos (`sora-2`, `sora-2-pro`) with job create/remix/list/retrieve/delete and asset downloads.
- **Generate videos** via Google GenAI (Veo) with operation polling and file-first downloads.
- **Fetch & compress images** from HTTP(S) URLs or local file paths with smart size/quality optimization.
- **Fetch documents** from HTTP(S) URLs or local file paths and return `resource_link`/`resource` outputs.
- **Debug MCP output shapes** with a `test-images` tool that mirrors production result placement (`content`, `structuredContent`, `toplevel`).
- **Integrates with**: [fast-agent](https://github.com/strato-space/fast-agent), [Windsurf](https://windsurf.com), [Claude Desktop](https://www.anthropic.com/claude/desktop), [Cursor](https://cursor.com), [VS Code](https://code.visualstudio.com/), and any MCP-compatible client.

---

## ✨ Features

- **Strict MCP spec support**
  Tool outputs are first-class [`CallToolResult`](https://github.com/modelcontextprotocol/spec/blob/main/schema/2025-11-25/schema.json) objects from the latest MCP schema, including:
  `content` items (`text`, `image`, `resource_link`, `resource`), optional `structuredContent`, optional top-level `files`, and the `isError` flag for failures.

- **Full gpt-image-1.5 and sora-2/sora-2-pro parameter coverage (generate & edit)**
  - [`openai-images-generate`](#openai-images-generate) mirrors the OpenAI Images [`create`](https://platform.openai.com/docs/api-reference/images/create) API for `gpt-image-1.5` (and `gpt-image-1`): background, moderation, size, quality, output_format, output_compression, `n`, `user`, etc.
  - [`openai-images-edit`](#openai-images-edit) mirrors the OpenAI Images [`createEdit`](https://platform.openai.com/docs/api-reference/images/createEdit) API for `gpt-image-1.5` (and `gpt-image-1`): image, mask, `n`, quality, size, `user`.

- **OpenAI Videos (Sora) job tooling (create / remix / list / retrieve / delete / content)**
  - [`openai-videos-create`](#openai-videos-create) mirrors [`videos/create`](https://platform.openai.com/docs/api-reference/videos/create) and can optionally wait for completion.
  - [`openai-videos-remix`](#openai-videos-remix) mirrors [`videos/remix`](https://platform.openai.com/docs/api-reference/videos/remix).
  - [`openai-videos-list`](#openai-videos-list) mirrors [`videos/list`](https://platform.openai.com/docs/api-reference/videos/list).
  - [`openai-videos-retrieve`](#openai-videos-retrieve) mirrors [`videos/retrieve`](https://platform.openai.com/docs/api-reference/videos/retrieve).
  - [`openai-videos-delete`](#openai-videos-delete) mirrors [`videos/delete`](https://platform.openai.com/docs/api-reference/videos/delete).
  - [`openai-videos-retrieve-content`](#openai-videos-retrieve-content) mirrors [`videos/content`](https://platform.openai.com/docs/api-reference/videos/content) and downloads `video` / `thumbnail` / `spritesheet` assets to disk, returning MCP `resource_link` (default) or embedded `resource` blocks (via `tool_result`).

- **Google GenAI (Veo) operations + downloads (generate / retrieve operation / retrieve content)**
  - [`google-videos-generate`](#google-videos-generate) starts a long-running operation (`ai.models.generateVideos`) and can optionally wait for completion and download `.mp4` outputs.
    [Veo model reference](https://docs.cloud.google.com/vertex-ai/generative-ai/docs/model-reference/veo-video-generation)
  - [`google-videos-retrieve-operation`](#google-videos-retrieve-operation) polls an existing operation.
  - [`google-videos-retrieve-content`](#google-videos-retrieve-content) downloads an `.mp4` from a completed operation, returning MCP `resource_link` (default) or embedded `resource` blocks (via `tool_result`).

- **Fetch and process images from URLs or files**
  The [`fetch-images`](#fetch-images) tool loads images from HTTP(S) URLs or local file paths with optional, user-controlled compression (disabled by default). Supports parallel processing of up to 20 images.

- **Fetch videos from URLs or files**
  The [`fetch-videos`](#fetch-videos) tool lists local videos or downloads remote video URLs to disk and returns MCP `resource_link` (default) or embedded `resource` blocks (via `tool_result`).

- **Fetch documents from URLs or files**
  The [`fetch-document`](#fetch-document) tool downloads remote files or reuses local paths and returns MCP `resource_link` (default) or embedded `resource` blocks (via `tool_result`).

- **Mix and edit up to 16 images**
  [`openai-images-edit`](#openai-images-edit) accepts `image` as a single string or an array of 1–16 file paths/base64 strings, matching the OpenAI spec for GPT Image models (`gpt-image-1.5`, `gpt-image-1`) image edits.

- **Smart image compression**
  Built-in compression using [sharp](https://sharp.pixelplumbing.com/) — iteratively reduces quality and dimensions to fit MCP payload limits while maintaining visual quality.

- **Resource-aware file output with `resource_link`**
  - Automatic switch from inline base64 to `file` when the total response size exceeds a safe threshold.
  - Outputs are written to disk using `output_<time_t>_media-gen__<tool>_<id>.<ext>` filenames (images/documents use a generated UUID; videos use the OpenAI `video_id`) and exposed to MCP clients via `content[]` depending on `tool_result` (`resource_link`/`image` for images, `resource_link`/`resource` for video/document downloads).

- **Built-in test-images tool for MCP client debugging**
  [`test-images`](#test-images) reads sample images from a configured directory and returns them using the same result-building logic as production tools. Use the `tool_result` and `response_format` parameters to test how different MCP clients handle `content[]` and `structuredContent`.

- **Structured MCP error handling**
  All tool errors (validation, OpenAI API failures, I/O) are returned as MCP errors with `isError: true` and `content: [{ type: "text", text: <error message> }]`, making failures easy to parse and surface in MCP clients.

---

## 🚀 Installation

```sh
git clone https://github.com/strato-space/media-gen-mcp.git
cd media-gen-mcp

npm install
npm run build
```

Build modes:

- `npm run build` – strict TypeScript build with **all strict flags enabled**, including `skipLibCheck: false`. Incremental builds via `.tsbuildinfo` (~2–3 s on a warm cache).
- `npm run esbuild` – fast bundling via esbuild (no type checking, useful for rapid iteration).

### Development mode (no build required)

For development, or when TypeScript compilation fails due to memory constraints:

```sh
npm run dev   # Uses tsx to run TypeScript directly
```

### Quality checks

```sh
npm run lint        # ESLint with typescript-eslint
npm run typecheck   # Strict tsc --noEmit
npm run test        # Unit tests (vitest)
npm run test:watch  # Watch mode for TDD
npm run ci          # lint + typecheck + test
```

### Unit tests

The project uses [vitest](https://vitest.dev/) for unit testing.
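As an illustration of the kind of helpers these unit tests cover, here is a hedged sketch of two predicates from the `helpers` module. The function bodies are assumptions for illustration only; the repository's real implementations may differ.

```typescript
// Hypothetical sketches of two helper predicates named in the covered
// modules below; the actual implementations in this repo may differ.

/** True when the value is an http:// or https:// URL. */
function isHttpUrl(value: string): boolean {
  try {
    const { protocol } = new URL(value);
    return protocol === "http:" || protocol === "https:";
  } catch {
    return false; // not parseable as a URL (e.g. a filesystem path)
  }
}

/** True for data:image/...;base64,... URLs (raw base64 detection elided). */
function isBase64Image(value: string): boolean {
  return value.startsWith("data:image/") && value.includes(";base64,");
}
```

Predicates like these are what the `helpers` test suite exercises with boundary cases (paths vs URLs, data URLs vs raw strings).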
Tests are located in `test/`.

**Covered modules:**

| Module | Tests | Description |
|--------|-------|-------------|
| `compression` | 12 | Image format detection, buffer processing, file I/O |
| `helpers` | 31 | URL/path validation, output resolution, result placement, resource links |
| `env` | 19 | Configuration parsing, env validation, defaults |
| `logger` | 10 | Structured logging + truncation safety |
| `pricing` | 5 | Sora pricing estimate helpers |
| `schemas` | 69 | Zod schema validation for all tools, type inference |
| `fetch-images` (integration) | 3 | End-to-end MCP tool call behavior |
| `fetch-videos` (integration) | 3 | End-to-end MCP tool call behavior |

**Test categories:**

- **compression** — `isCompressionAvailable`, `detectImageFormat`, `processBufferWithCompression`, `readAndProcessImage`
- **helpers** — `isHttpUrl`, `isAbsolutePath`, `isBase64Image`, `ensureDirectoryWritable`, `resolveOutputPath`, `getResultPlacement`, `buildResourceLinks`
- **env** — config loading and validation for `MEDIA_GEN_*` / `MEDIA_GEN_MCP_*` settings
- **logger** — truncation and error formatting behavior
- **schemas** — validation for `openai-images-*`, `openai-videos-*`, `fetch-images`, `fetch-videos`, and `test-images` inputs, plus boundary testing (prompt length, image count limits, path validation)

```sh
npm run test
# ✓ test/compression.test.ts (12 tests)
# ✓ test/helpers.test.ts (31 tests)
# ✓ test/env.test.ts (19 tests)
# ✓ test/logger.test.ts (10 tests)
# ✓ test/pricing.test.ts (5 tests)
# ✓ test/schemas.test.ts (69 tests)
# ✓ test/fetch-images.integration.test.ts (3 tests)
# ✓ test/fetch-videos.integration.test.ts (3 tests)
# Tests: 152 passed
```

### Run directly via npx (no local clone)

You can also run the server straight from a remote repo using `npx`:

```sh
npx -y github:strato-space/media-gen-mcp --env-file /path/to/media-gen.env
```

The `--env-file` argument tells the server which env file to load (e.g. when you keep secrets outside the cloned directory). The file should contain `OPENAI_API_KEY`, optional Azure variables, and any `MEDIA_GEN_MCP_*` settings.

### `secrets.yaml` (optional)

You can keep API keys (and optional Google Vertex AI settings) in a `secrets.yaml` file (compatible with the fast-agent secrets template):

```yaml
openai:
  api_key: <your-api-key-here>
anthropic:
  api_key: <your-api-key-here>
google:
  api_key: <your-api-key-here>
  vertex_ai:
    enabled: true
    project_id: your-gcp-project-id
    location: europe-west4
```

`media-gen-mcp` loads `secrets.yaml` from the current working directory (or from `--secrets-file /path/to/secrets.yaml`) and applies it to env vars; values in `secrets.yaml` override env, and `<your-api-key-here>` placeholders are ignored.

---

## ⚡ Quick start (fast-agent & Windsurf)

### fast-agent

In fast-agent, MCP servers are configured in `fastagent.config.yaml` under the `mcp.servers` section (see the [fast-agent docs](https://github.com/strato-space/fast-agent)).

To add `media-gen-mcp` from GitHub via `npx` as an MCP server:

```yaml
# fastagent.config.yaml

mcp:
  servers:
    # your existing servers (e.g. fetch, filesystem, huggingface, ...)
    media-gen-mcp:
      command: "npx"
      args: ["-y", "github:strato-space/media-gen-mcp", "--env-file", "/path/to/media-gen.env"]
```

Put `OPENAI_API_KEY` and other settings into `media-gen.env` (see `.env.sample` in this repo).

### Windsurf

Add an MCP server that runs `media-gen-mcp` from GitHub via `npx` using the JSON format below (similar to Claude Desktop / VS Code):

```json
{
  "mcpServers": {
    "media-gen-mcp": {
      "command": "npx",
      "args": ["-y", "github:strato-space/media-gen-mcp", "--env-file", "/path/to/media-gen.env"]
    }
  }
}
```

---

## 🔑 Configuration

Add to your MCP client config (fast-agent, Windsurf, Claude Desktop, Cursor, VS Code):

```json
{
  "mcpServers": {
    "media-gen-mcp": {
      "command": "npx",
      "args": ["-y", "github:strato-space/media-gen-mcp"],
      "env": { "OPENAI_API_KEY": "sk-..." }
    }
  }
}
```

Azure deployments are also supported:

```jsonc
{
  "mcpServers": {
    "media-gen-mcp": {
      "command": "npx",
      "args": ["-y", "github:strato-space/media-gen-mcp"],
      "env": {
        // "AZURE_OPENAI_API_KEY": "sk-...",
        // "AZURE_OPENAI_ENDPOINT": "my.endpoint.com",
        "OPENAI_API_VERSION": "2024-12-01-preview"
      }
    }
  }
}
```

Environment variables:

- Set `OPENAI_API_KEY` (and optionally `AZURE_OPENAI_API_KEY`, `AZURE_OPENAI_ENDPOINT`, `OPENAI_API_VERSION`) in the environment of the process that runs `node dist/index.js` (shell, systemd unit, Docker env, etc.).
- The server will **optionally** load a local `.env` file from its working directory if present (it does not override already-set environment variables).
- You can also pass `--env-file /path/to/env` when starting the server (including via `npx`); this file is loaded via `dotenv` before tools run, again without overriding already-set variables.

### Logging and base64 truncation

To avoid flooding logs with huge image payloads, the built-in logger applies a log-only sanitizer to structured `data` passed to `log.debug/info/warn/error`:

- Truncates configured string fields (e.g. `b64_json`, `base64`, string `data`, `image_url`) to a short preview controlled by `LOG_TRUNCATE_DATA_MAX` (default: 64 characters). The list of keys defaults to `LOG_SANITIZE_KEYS` inside `src/lib/logger.ts` and can be overridden via `MEDIA_GEN_MCP_LOG_SANITIZE_KEYS` (a comma-separated list of field names).
- Sanitization is applied **only** to log serialization; tool results returned to MCP clients are never modified.

Control via environment:

- `MEDIA_GEN_MCP_LOG_SANITIZE_IMAGES` (default: `true`)
  - `1`, `true`, `yes`, `on` – enable truncation (default behaviour).
  - `0`, `false`, `no`, `off` – disable truncation and log full payloads.

The field list and limits are configured in `src/lib/logger.ts` via `LOG_SANITIZE_KEYS` and `LOG_TRUNCATE_DATA_MAX`.

### Security and local file access

- **Allowed directories**: All tools are restricted to paths matching `MEDIA_GEN_DIRS`. If unset, this defaults to `/tmp/media-gen-mcp` (or `%TEMP%/media-gen-mcp` on Windows).
- **Test samples**: `MEDIA_GEN_MCP_TEST_SAMPLE_DIR` adds a directory to the allowlist and enables the `test-images` tool.
- **Local reads**: `fetch-images` and `fetch-document` accept file paths (absolute or relative). Relative paths are resolved against the first `MEDIA_GEN_DIRS` entry and must still match an allowed pattern.
- **Remote reads**: HTTP(S) fetches are filtered by `MEDIA_GEN_URLS` patterns. An empty list allows all URLs.
- **Writes**: `openai-images-generate`, `openai-images-edit`, `fetch-images`, `fetch-videos`, and `fetch-document` write under the first entry of `MEDIA_GEN_DIRS`.
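To make the allowlist behavior concrete, here is a minimal sketch of how `MEDIA_GEN_DIRS`/`MEDIA_GEN_URLS` patterns could be matched (using the documented semantics: `*` matches a single segment with no `/`, `**` matches any number of segments, and a pattern acts as a prefix). The function names and the regex translation are assumptions, not the server's actual code.

```typescript
// Hypothetical allowlist matcher for MEDIA_GEN_DIRS / MEDIA_GEN_URLS
// patterns: "*" = one path segment (no "/"), "**" = any segments.
// Patterns match as prefixes, so allowed directories cover their contents.

function patternToRegExp(pattern: string): RegExp {
  const escapeLiteral = (s: string) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const source = pattern
    .split("**")
    .map((part) => part.split("*").map(escapeLiteral).join("[^/]*"))
    .join(".*");
  return new RegExp("^" + source); // anchored prefix match
}

function isPathAllowed(path: string, allowlist: string[]): boolean {
  return allowlist.some((pattern) => patternToRegExp(pattern).test(path));
}
```

For example, with `MEDIA_GEN_DIRS=/home/*/media-gen/output/,/data/**/images/`, a path like `/home/user1/media-gen/output/img.png` would be allowed, while `/etc/passwd` would not.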
`test-images` is read-only and does not create new files.

#### Glob patterns

Both `MEDIA_GEN_DIRS` and `MEDIA_GEN_URLS` support glob wildcards:

| Pattern | Matches | Example |
|---------|---------|---------|
| `*` | Any single segment (no `/`) | `/home/*/media/` matches `/home/user1/media/` |
| `**` | Any number of segments | `/data/**/images/` matches `/data/a/b/images/` |

URL examples:

```shell
MEDIA_GEN_URLS=https://*.cdn.example.com/,https://storage.example.com/**/assets/
```

Path examples:

```shell
MEDIA_GEN_DIRS=/home/*/media-gen/output/,/data/**/images/
```

⚠️ **Warning**: Trailing wildcards without a delimiter (e.g., `/home/user/*` or `https://cdn.com/**`) expose entire subtrees and trigger a console warning at startup.

#### Recommended mitigations

1. Run under a dedicated OS user with access only to the allowed directories.
2. Keep allowlists minimal. Avoid `*` in home directories or system paths.
3. Use explicit `MEDIA_GEN_URLS` prefixes for remote fetches.
4. Monitor allowed directories via OS ACLs or backups.

### Tool Result Parameters: `tool_result` and `response_format`

Image tools (`openai-images-*`, `fetch-images`, `test-images`) support two parameters that control the shape of the MCP tool result:

| Parameter | Values | Default | Description |
|-----------|--------|---------|-------------|
| `tool_result` | `resource_link`, `image` | `resource_link` | Controls `content[]` shape |
| `response_format` | `url`, `path`, `b64_json` | `url` | Controls `structuredContent` shape (OpenAI ImagesResponse format) |

Video/document download tools (`openai-videos-create` / `openai-videos-remix` when downloading, `openai-videos-retrieve-content`, `google-videos-generate` when downloading, `google-videos-retrieve-content`, `fetch-videos`, `fetch-document`) support:

| Parameter | Values | Default | Description |
|-----------|--------|---------|-------------|
| `tool_result` | `resource_link`, `resource` | `resource_link` | Controls `content[]` shape |

Google video tools (`google-videos-*`) also support:

| Parameter | Values | Default | Description |
|-----------|--------|---------|-------------|
| `response_format` | `url`, `b64_json` | `url` | Controls `structuredContent.response.generatedVideos[].video` shape (`uri` vs `videoBytes`) |

#### `tool_result` — controls `content[]`

- **Images** (`openai-images-*`, `fetch-images`, `test-images`)
  - **`resource_link`** (default): Emits `ResourceLink` items with `file://` or `https://` URIs
  - **`image`**: Emits base64 `ImageContent` blocks
- **Videos** (tools that download video data)
  - **`resource_link`** (default): Emits `ResourceLink` items with `file://` or `https://` URIs
  - **`resource`**: Emits `EmbeddedResource` blocks with base64 `resource.blob`
- **Documents** (`fetch-document`)
  - **`resource_link`** (default): Emits `ResourceLink` items with `file://` or `https://` URIs
  - **`resource`**: Emits `EmbeddedResource` blocks with base64 `resource.blob`

#### `response_format` — controls `structuredContent`

For OpenAI images, `structuredContent` always contains an OpenAI ImagesResponse-style object:

```jsonc
{
  "created": 1234567890,
  "data": [
    { "url": "https://..." } // or { "path": "/abs/path.png" } / { "b64_json": "..." } depending on response_format
  ]
}
```

- **`url`** (default): `data[].url` contains file URLs
- **`path`**: `data[].path` contains local filesystem paths
- **`b64_json`**: `data[].b64_json` contains base64-encoded image data

For Google videos, `response_format` controls whether `structuredContent.response.generatedVideos[].video` prefers:

- **`url`** (default): `video.uri` (and strips `video.videoBytes`)
- **`b64_json`**: `video.videoBytes` (and strips `video.uri`)

#### Backward Compatibility (MCP 5.2.6)

Per MCP spec 5.2.6, a `TextContent` block with serialized JSON (always using URLs in `data[]`) is also included in `content[]` for backward compatibility with clients that don't support `structuredContent`.

Example tool result structure:

```jsonc
{
  "content": [
    // ResourceLink or ImageContent based on tool_result
    { "type": "resource_link", "uri": "https://...", "name": "image.png", "mimeType": "image/png" },
    // Serialized JSON for backward compatibility (MCP 5.2.6)
    { "type": "text", "text": "{ \"created\": 1234567890, \"data\": [{ \"url\": \"https://...\" }] }" }
  ],
  "structuredContent": {
    "created": 1234567890,
    "data": [{ "url": "https://..." }]
  }
}
```

**ChatGPT MCP client behavior (chatgpt.com, as of 2025-12-01):**

- ChatGPT currently ignores `content[]` image data in favor of `structuredContent`.
- For ChatGPT, use `response_format: "url"` and configure the first `MEDIA_GEN_MCP_URL_PREFIXES` entry as a public HTTPS prefix (for example `MEDIA_GEN_MCP_URL_PREFIXES=https://media-gen.example.com/media`).

For Anthropic clients (Claude Desktop, etc.), the default configuration works well.

### Network access via mcp-proxy (SSE)

For networked SSE access you can front `media-gen-mcp` with [`mcp-proxy`](https://github.com/modelcontextprotocol/servers/tree/main/src/proxy) or an equivalent. This setup has been tested with the TypeScript SSE proxy implementation [`punkpeye/mcp-proxy`](https://github.com/punkpeye/mcp-proxy).

For example, a one-line command looks like:

```sh
mcp-proxy --host=0.0.0.0 --port=99 --server=sse --sseEndpoint=/ --shell 'npx -y github:strato-space/media-gen-mcp --env-file /path/to/media-gen.env'
```

In production you would typically wire this up via a systemd template unit that loads `PORT`/`SHELL_CMD` from an `EnvironmentFile=` (see `server/mcp/mcp@.service`-style setups).

---

## 🛠 Tool signatures

### openai-images-generate

Arguments (input schema):

- `prompt` (string, required)
  - Text prompt describing the desired image.
  - Max length: 32,000 characters.
- `background` ("transparent" | "opaque" | "auto", optional)
  - Background handling mode.
  - If `background` is `"transparent"`, then `output_format` must be `"png"` or `"webp"`.
- `model` ("gpt-image-1.5" | "gpt-image-1", optional, default: "gpt-image-1.5")
- `moderation` ("auto" | "low", optional)
  - Content moderation behavior, passed through to the Images API.
- `n` (integer, optional)
  - Number of images to generate.
  - Min: 1, Max: 10.
- `output_compression` (integer, optional)
  - Compression level (0–100).
  - Only applied when `output_format` is `"jpeg"` or `"webp"`.
- `output_format` ("png" | "jpeg" | "webp", optional)
  - Output image format.
  - If omitted, the server treats output as PNG semantics.
- `quality` ("auto" | "high" | "medium" | "low", default: "high")
- `size` ("1024x1024" | "1536x1024" | "1024x1536" | "auto", default: "1024x1536")
- `user` (string, optional)
  - User identifier forwarded to OpenAI for monitoring.
- `response_format` ("url" | "path" | "b64_json", default: "url")
  - Response format (aligned with the OpenAI Images API):
    - `"url"`: file/URL-based output (resource_link items, `image_url` fields, `data[].url` in `api` placement).
    - `"path"`: local filesystem paths in `data[].path` (for local skill workflows).
    - `"b64_json"`: inline base64 image data (image content, `data[].b64_json` in `api` placement).
- `tool_result` ("resource_link" | "image", default: "resource_link")
  - Controls `content[]` shape:
    - `"resource_link"` emits ResourceLink items (file/URL-based)
    - `"image"` emits base64 ImageContent blocks

Behavior notes:

- The server uses OpenAI `gpt-image-1.5` by default (set `model: "gpt-image-1"` for legacy behavior).
- If the total size of all base64 images would exceed the configured payload threshold (default ~50MB via `MCP_MAX_CONTENT_BYTES`), the server automatically switches the **effective output mode** to file/URL-based and saves images to the first entry of `MEDIA_GEN_DIRS` (default: `/tmp/media-gen-mcp`).
- Even when you explicitly request `response_format: "b64_json"`, the server still writes the files to disk (for static hosting, caching, or later reuse).
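The automatic base64-to-file fallback can be sketched as follows. The function name, signature, and size accounting are illustrative assumptions; only the threshold behavior (fall back to file output above `MCP_MAX_CONTENT_BYTES`) comes from the description above.

```typescript
// Hypothetical sketch of the payload-threshold check: when the combined
// size of all base64 images would exceed MCP_MAX_CONTENT_BYTES, the
// effective output mode falls back from inline base64 to file/URL-based.

const MCP_MAX_CONTENT_BYTES = 50 * 1024 * 1024; // ~50MB default

function effectiveOutputMode(
  requested: "base64" | "file",
  base64Images: string[],
  maxBytes: number = MCP_MAX_CONTENT_BYTES,
): "base64" | "file" {
  if (requested === "file") return "file";
  // Approximate the serialized payload by the base64 string lengths.
  const totalBytes = base64Images.reduce((sum, b64) => sum + b64.length, 0);
  return totalBytes > maxBytes ? "file" : "base64";
}
```

Because the check uses the serialized (base64) size rather than the decoded image size, a single large `n > 1` batch can trip the fallback even when each individual image is small.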
Exposure of464 file paths / URLs in the tool result then depends on `MEDIA_GEN_MCP_RESULT_PLACEMENT`465 and per-call `result_placement` (see section below).466467Output (MCP CallToolResult, when placement includes `"content"`):468469- When the effective `output` mode is `"base64"`:470 - `content` is an array that may contain:471 - image items:472 - `{ type: "image", data: <base64 string>, mimeType: <"image/png" | "image/jpeg" | "image/webp"> }`473 - optional text items with revised prompts returned by the Images API (for models that support it, e.g. DALL·E 3):474 - `{ type: "text", text: <revised_prompt string> }`475- When the effective `output` mode is `"file"`:476 - `content` contains one `resource_link` item per file, plus the same optional `text` items with revised prompts:477 - `{ type: "resource_link", uri: "file:///absolute-path-1.png", name: "absolute-path-1.png", mimeType: <image mime> }`478 - For `gpt-image-1.5` and `gpt-image-1`, an additional `text` line is included with a pricing estimate (based on `structuredContent.usage`), and `structuredContent.pricing` contains the full pricing breakdown.479480When `result_placement` includes `"api"`, `openai-images-generate` instead returns an **OpenAI Images API-like object** without MCP wrappers:481482```jsonc483{484 "created": 1764599500,485 "data": [486 { "b64_json": "..." 
} // or { "url": "https://.../media/file.png" } when output: "file"487 ],488 "background": "opaque",489 "output_format": "png",490 "size": "1024x1024",491 "quality": "high"492}493```494495### openai-images-edit496497Arguments (input schema):498499- `image` (string or string[], required)500 - Either a single absolute path to an image file (`.png`, `.jpg`, `.jpeg`, `.webp`),501 a base64-encoded image string (optionally as a `data:image/...;base64,...` URL),502 **or an HTTP(S) URL** pointing to a publicly accessible image,503 **or** an array of 1–16 such strings (for multi-image editing).504 - When an HTTP(S) URL is provided, the server fetches the image and converts it to base64 before sending to OpenAI.505- `prompt` (string, required)506 - Text description of the desired edit.507 - Max length: 32,000 characters.508- `mask` (string, optional)509 - Absolute path, base64 string, or HTTP(S) URL for a mask image (PNG < 4MB, same dimensions510 as the source image). Transparent areas mark regions to edit.511- `model` ("gpt-image-1.5" | "gpt-image-1", optional, default: "gpt-image-1.5")512- `n` (integer, optional)513 - Number of images to generate.514 - Min: 1, Max: 10.515- `quality` ("auto" | "high" | "medium" | "low", default: "high")516- `size` ("1024x1024" | "1536x1024" | "1024x1536" | "auto", default: "1024x1536")517- `user` (string, optional)518 - User identifier forwarded to OpenAI for monitoring.519- `response_format` ("url" | "path" | "b64_json", default: "url")520 - Response format (aligned with OpenAI Images API):521 - `"url"`: file/URL-based output (resource_link items, `image_url` fields, `data[].url` in `api` placement).522 - `"path"`: local filesystem paths in `data[].path` (for local skill workflows).523 - `"b64_json"`: inline base64 image data (image content, `data[].b64_json` in `api` placement).524- `tool_result` ("resource_link" | "image", default: "resource_link")525 - Controls `content[]` shape:526 - `"resource_link"` emits ResourceLink items 
(file/URL-based)527 - `"image"` emits base64 ImageContent blocks528529Behavior notes:530531- The server accepts `image` and `mask` as absolute paths, base64/data URLs, or HTTP(S) URLs.532- When an HTTP(S) URL is provided, the server fetches the image and converts it to a base64 data URL before calling OpenAI.533- For edits, the server always returns PNG semantics (mime type `image/png`)534 when emitting images.535536Output (MCP CallToolResult):537538- When the effective `output` mode is `"base64"`:539 - `content` is an array that may contain:540 - image items:541 - `{ type: "image", data: <base64 string>, mimeType: "image/png" }`542 - optional text items with revised prompts (when the underlying model returns them):543 - `{ type: "text", text: <revised_prompt string> }`544- When the effective `output` mode is `"file"`:545 - `content` contains one `resource_link` item per file, plus the same optional `text` items with revised prompts:546 - `{ type: "resource_link", uri: "file:///absolute-path-1.png", name: "absolute-path-1.png", mimeType: "image/png" }`547 - For `gpt-image-1.5` and `gpt-image-1`, an additional `text` line is included with a pricing estimate (based on `structuredContent.usage`), and `structuredContent.pricing` contains the full pricing breakdown.548549When `result_placement` includes `"api"`, `openai-images-edit` follows the **same raw API format** as `openai-images-generate` (top-level `created`, `data[]`, `background`, `output_format`, `size`, `quality` with `b64_json` for base64 output or `url` for file output).550551Error handling (both tools):552553- On errors inside the tool handler (validation, OpenAI API failures, I/O, etc.), the server returns a CallToolResult marked as an error:554 - `isError: true`555 - `content: [{ type: "text", text: <error message string> }]`556- The error message text is taken directly from the underlying exception message, without additional commentary from the server, while full details are logged to the server 
console.557558### openai-videos-create559560Create a video generation job using the OpenAI Videos API (`videos.create`).561562Arguments (input schema):563564- `prompt` (string, required) — text prompt describing the video (max 32K chars).565- `input_reference` (string, optional) — optional image reference (HTTP(S) URL, base64/data URL, or file path).566- `input_reference_fit` ("match" | "cover" | "contain" | "stretch", default: "contain")567 - How to fit `input_reference` to the requested video `size`:568 - `match`: require exact dimensions (fails fast on mismatch)569 - `cover`: resize + center-crop to fill570 - `contain`: resize + pad/letterbox to fit (default)571 - `stretch`: resize with distortion572- `input_reference_background` ("blur" | "black" | "white" | "#RRGGBB" | "#RRGGBBAA", default: "blur")573 - Padding background used when `input_reference_fit="contain"`.574- `model` ("sora-2" | "sora-2-pro", default: "sora-2-pro")575- `seconds` ("4" | "8" | "12", optional)576- `size` ("720x1280" | "1280x720" | "1024x1792" | "1792x1024", optional)577 - `1024x1792` and `1792x1024` require `sora-2-pro`.578 - If `input_reference` is omitted and `size` is omitted, the API default is used.579- `wait_for_completion` (boolean, default: true)580 - When true, the server polls `openai-videos-retrieve` until `completed` or `failed` (or timeout), then downloads assets.581- `timeout_ms` (integer, default: 900000)582- `poll_interval_ms` (integer, default: 2000)583- `download_variants` (string[], default: ["video"])584 - Allowed values: `"video" | "thumbnail" | "spritesheet"`.585- `tool_result` (`"resource_link"` | `"resource"`, default: `"resource_link"`)586 - Controls `content[]` shape for downloaded assets:587 - `"resource_link"` emits ResourceLink items (file/URL-based)588 - `"resource"` emits EmbeddedResource blocks with base64 `resource.blob`589590Output (MCP CallToolResult):591592- `structuredContent`: OpenAI `Video` object (job metadata; final state when 
  `wait_for_completion=true`).
- `content`: includes `resource_link` (default) or embedded `resource` blocks for downloaded assets (when requested) and text blocks with JSON.
  - Includes a summary JSON block: `{ "video_id": "...", "pricing": { "currency": "USD", "model": "...", "size": "...", "seconds": 4, "price": 0.1, "cost": 0.4 } | null }` (and when waiting: `{ "video_id": "...", "assets": [...], "pricing": ... }`).

### openai-videos-remix

Create a remix job from an existing `video_id` (`videos.remix`).

Arguments (input schema):

- `video_id` (string, required)
- `prompt` (string, required)
- `wait_for_completion`, `timeout_ms`, `poll_interval_ms`, `download_variants`, `tool_result` — same semantics as `openai-videos-create` (default wait is true).

### openai-videos-list

List video jobs (`videos.list`).

Arguments (input schema):

- `after` (string, optional) — cursor (video id) to list after.
- `limit` (integer, optional)
- `order` ("asc" | "desc", optional)

Output:

- `structuredContent`: OpenAI list response shape `{ data, has_more, last_id }`.
- `content`: a text block with serialized JSON.

### openai-videos-retrieve

Retrieve job status (`videos.retrieve`).

- `video_id` (string, required)

### openai-videos-delete

Delete a video job (`videos.delete`).

- `video_id` (string, required)

### openai-videos-retrieve-content

Retrieve an asset for a completed job (`videos.downloadContent`, REST `GET /videos/{video_id}/content`), write it under allowed `MEDIA_GEN_DIRS`, and return MCP `resource_link` (default) or embedded `resource` blocks (via `tool_result`).

Arguments (input schema):

- `video_id` (string, required)
- `variant` ("video" | "thumbnail" | "spritesheet", default: "video")
- `tool_result` (`"resource_link"` | `"resource"`, default: `"resource_link"`)

Output (MCP CallToolResult):

- `structuredContent`: OpenAI `Video` object.
- `content`: a `resource_link` (or embedded `resource`), a summary JSON block `{ video_id, variant, uri, pricing }`, plus the full video JSON.

### google-videos-generate

Create a Google video generation operation using the Google GenAI SDK (`@google/genai`) `ai.models.generateVideos`.

Arguments (input schema):

- `prompt` (string, optional)
- `input_reference` (string, optional) — image-to-video input (HTTP(S) URL, base64/data URL, or file path under `MEDIA_GEN_DIRS`)
- `input_reference_mime_type` (string, optional) — override for `input_reference` MIME type (must be `image/*`)
- `input_video_reference` (string, optional) — video-extension input (HTTP(S) URL or file path under `MEDIA_GEN_DIRS`; mutually exclusive with `input_reference`)
- `model` (string, default: `"veo-3.1-generate-001"`)
- `number_of_videos` (integer, default: `1`)
- `aspect_ratio` (`"16:9" | "9:16"`, optional)
- `duration_seconds` (integer, optional)
  - Veo 2 models: 5–8 seconds (default: 8)
  - Veo 3 models: 4, 6, or 8 seconds (default: 8)
  - When using `referenceImages`: 8 seconds
- `person_generation` (`"DONT_ALLOW" | "ALLOW_ADULT" | "ALLOW_ALL"`, optional)
- `wait_for_completion` (boolean, default: `true`)
- `timeout_ms` (integer, default: `900000`)
- `poll_interval_ms` (integer, default: `10000`)
- `download_when_done` (boolean, optional; defaults to `true` when waiting)
- `tool_result` (`"resource_link"` | `"resource"`, default: `"resource_link"`)
  - Controls `content[]` shape when downloading generated videos.
- `response_format` (`"url"` | `"b64_json"`, default: `"url"`)
  - Controls `structuredContent.response.generatedVideos[].video` fields:
    - `"url"` prefers `video.uri` (and strips `video.videoBytes`)
    - `"b64_json"` prefers `video.videoBytes` (and strips `video.uri`)

Requirements:

- Gemini Developer API: set `GEMINI_API_KEY` (or `GOOGLE_API_KEY`), or `google.api_key` in `secrets.yaml`.
- Vertex AI: set
  `GOOGLE_GENAI_USE_VERTEXAI=true`, `GOOGLE_CLOUD_PROJECT`, and `GOOGLE_CLOUD_LOCATION` (or `google.vertex_ai.*` in `secrets.yaml`).

Output:

- `structuredContent`: Google operation object (includes `name`, `done`, and `response.generatedVideos[]` when available).
- `content`: status text, optional `.mp4` `resource_link` (default) or embedded `resource` blocks (when downloaded), plus JSON text blocks for compatibility.

### google-videos-retrieve-operation

Retrieve/poll an existing Google video operation (`ai.operations.getVideosOperation`).

- `operation_name` (string, required)
- `response_format` (`"url"` | `"b64_json"`, default: `"url"`)

Output:

- `structuredContent`: Google operation object.
- `content`: JSON text blocks with a short summary + the full operation.

### google-videos-retrieve-content

Download `.mp4` content for a completed operation and return file-first MCP `resource_link` (default) or embedded `resource` blocks (via `tool_result`).

- `operation_name` (string, required)
- `index` (integer, default: `0`) — selects `response.generatedVideos[index]`
- `tool_result` (`"resource_link"` | `"resource"`, default: `"resource_link"`)
- `response_format` (`"url"` | `"b64_json"`, default: `"url"`)

Recommended workflow:

1) Call `google-videos-generate` with `wait_for_completion=true` (default) to get the completed operation and downloads; set to false only if you need the operation id immediately.
2) Poll `google-videos-retrieve-operation` until `done=true`.
3) Call `google-videos-retrieve-content` to download an `.mp4` and receive a `resource_link` (or embedded `resource`).

### fetch-images

Fetch and process images from URLs or local file paths with optional compression.

Arguments (input schema):

- `sources` (string[], optional)
  - Array of image sources: HTTP(S) URLs or file paths (absolute or relative to the first `MEDIA_GEN_DIRS` entry).
  - Min: 1, Max: 20
    images.
  - Mutually exclusive with `ids` and `n`.
- `ids` (string[], optional)
  - Array of image IDs to fetch by local filename match under the primary `MEDIA_GEN_DIRS[0]` directory.
  - IDs must be safe (`[A-Za-z0-9_-]` only; no `..`, `*`, `?`, slashes).
  - Matches filenames containing `_{id}_` or `_{id}.` (supports both single outputs and multi-output suffixes like `_1.png`).
  - When `ids` is used, `compression` and `file` are not supported (no new files are created).
  - Mutually exclusive with `sources` and `n`.
- `n` (integer, optional)
  - When set, returns the last N image files from the primary `MEDIA_GEN_DIRS[0]` directory.
  - Files are sorted by modification time (most recently modified first).
  - Mutually exclusive with `sources` and `ids`.
- `compression` (object, optional)
  - `max_size` (integer, optional): Max dimension in pixels. Images larger than this will be resized.
  - `max_bytes` (integer, optional): Target max file size in bytes. Default: 819200 (800KB).
  - `quality` (integer, optional): JPEG/WebP quality 1-100. Default: 85.
  - `format` ("jpeg" | "png" | "webp", optional): Output format. Default: jpeg.
- `response_format` ("url" | "path" | "b64_json", default: "url")
  - Response format: file/URL-based (`url`), local path (`path`), or inline base64 (`b64_json`).
- `tool_result` ("resource_link" | "image", default: "resource_link")
  - Controls `content[]` shape:
    - `"resource_link"` emits ResourceLink items (file/URL-based)
    - `"image"` emits base64 ImageContent blocks
- `file` (string, optional)
  - Base path for output files.
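The `ids` safety and filename-matching rules above can be sketched as a pair of helpers. This is an illustrative sketch only — `isSafeId` and `matchesId` are hypothetical names, not the server's actual internals:

```typescript
// IDs must contain only [A-Za-z0-9_-]; this implicitly rejects
// `..`, `*`, `?`, and slashes (hypothetical helper, for illustration).
const SAFE_ID = /^[A-Za-z0-9_-]+$/;

function isSafeId(id: string): boolean {
  return SAFE_ID.test(id);
}

// A filename matches an id when it contains `_{id}_` or `_{id}.`,
// covering both single outputs and multi-output suffixes like `_1.png`.
function matchesId(filename: string, id: string): boolean {
  if (!isSafeId(id)) return false;
  return filename.includes(`_${id}_`) || filename.includes(`_${id}.`);
}
```

For example, `matchesId("output_1_media-gen__openai-images-generate_abc123_1.png", "abc123")` matches via the `_{id}_` form, while a single-output name ending in `_abc123.png` matches via `_{id}.`.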
    If multiple images, an index suffix is added.

Behavior notes:

- Images are processed in parallel for maximum throughput.
- Compression is **only** applied when `compression` options are provided.
- Compression uses [sharp](https://sharp.pixelplumbing.com/) with iterative quality/size reduction when enabled.
- Partial success: if some sources fail, successful images are still returned with errors listed in the response.
- When `n` is provided, it is only honored when the `MEDIA_GEN_MCP_ALLOW_FETCH_LAST_N_IMAGES` environment variable is set to `true`. Otherwise, the call fails with a validation error.
- Sometimes an MCP client (for example, ChatGPT) may not wait for a response from `media-gen-mcp` due to a timeout. In creative workflows where you need to quickly retrieve the latest `openai-images-generate` / `openai-images-edit` outputs, you can use `fetch-images` with the `n` argument. When the `MEDIA_GEN_MCP_ALLOW_FETCH_LAST_N_IMAGES=true` environment variable is set, `fetch-images` will return the last N files from `MEDIA_GEN_DIRS[0]` even if the original generation or edit operation timed out on the MCP client side.

### fetch-videos

Fetch videos from HTTP(S) URLs or local file paths.

Arguments (input schema):

- `sources` (string[], optional)
  - Array of video sources: HTTP(S) URLs or file paths (absolute or relative to the first `MEDIA_GEN_DIRS` entry).
  - Min: 1, Max: 20 videos.
  - Mutually exclusive with `ids` and `n`.
- `ids` (string[], optional)
  - Array of video IDs to fetch by local filename match under the primary `MEDIA_GEN_DIRS[0]` directory.
  - IDs must be safe (`[A-Za-z0-9_-]` only; no `..`, `*`, `?`, slashes).
  - Matches filenames containing `_{id}_` or `_{id}.` (supports both single outputs and multi-asset suffixes like `_thumbnail.webp`).
  - When `ids` is used, `file` is not supported (no downloads; returns existing files).
  - Mutually exclusive with `sources` and `n`.
- `n` (integer,
  optional)
  - When set, returns the last N video files from the primary `MEDIA_GEN_DIRS[0]` directory.
  - Files are sorted by modification time (most recently modified first).
  - Mutually exclusive with `sources` and `ids`.
- `tool_result` (`"resource_link"` | `"resource"`, default: `"resource_link"`)
  - Controls `content[]` shape:
    - `"resource_link"` emits ResourceLink items (file/URL-based)
    - `"resource"` emits EmbeddedResource blocks with base64 `resource.blob`
- `file` (string, optional)
  - Base path for output files (used when downloading from URLs). If multiple videos are downloaded, an index suffix is added.

Output:

- `content`: one `resource_link` (default) or embedded `resource` block per resolved video, plus an optional error summary text block.
- `structuredContent`: `{ data: [{ source, uri, file, mimeType, name, downloaded }], errors?: string[] }`.

Behavior notes:

- URL downloads are only allowed when the URL matches `MEDIA_GEN_URLS` (when set).
- When `n` is provided, it is only honored when the `MEDIA_GEN_MCP_ALLOW_FETCH_LAST_N_VIDEOS` environment variable is set to `true`. Otherwise, the call fails with a validation error.

### fetch-document

Fetch documents from HTTP(S) URLs or local file paths.

Arguments (input schema):

- `sources` (string[])
  - Array of document sources: HTTP(S) URLs or file paths (absolute or relative to the first `MEDIA_GEN_DIRS` entry).
  - Min: 1, Max: 20 documents.
- `tool_result` (`"resource_link"` | `"resource"`, default: `"resource_link"`)
  - Controls `content[]` shape:
    - `"resource_link"` emits ResourceLink items (file/URL-based)
    - `"resource"` emits EmbeddedResource blocks with base64 `resource.blob`
- `file` (string, optional)
  - Base path for output files (used when downloading from URLs).
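The `file` base-path-plus-index-suffix rule used by the fetch tools can be sketched roughly as follows. This is a hypothetical helper (`outputPath` is not the server's real function), and the exact suffix format is an assumption based on the `_1.png` example shown for `fetch-images`:

```typescript
// Hypothetical sketch: single download keeps the base path as-is;
// multiple downloads get a 1-based index suffix before the extension.
function outputPath(base: string, index: number, total: number): string {
  if (total <= 1) return base;
  const dot = base.lastIndexOf(".");
  const stem = dot > 0 ? base.slice(0, dot) : base;
  const ext = dot > 0 ? base.slice(dot) : "";
  return `${stem}_${index + 1}${ext}`;
}
```

So with `file = "/tmp/report.pdf"` and three downloads, the sketch yields `/tmp/report_1.pdf`, `/tmp/report_2.pdf`, `/tmp/report_3.pdf`; with a single download, the base path is used unchanged.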
    If multiple documents are downloaded, an index suffix is added.

Output:

- `content`: one `resource_link` (default) or embedded `resource` block per resolved document, plus an optional error summary text block.
- `structuredContent`: `{ data: [{ source, uri, file, mimeType, name, downloaded }], errors?: string[] }`.

Behavior notes:

- URL downloads are only allowed when the URL matches `MEDIA_GEN_URLS` (when set).
- Local paths are validated against `MEDIA_GEN_DIRS` and can be provided as `file://` URLs.
- Default filenames use `output_<time_t>_media-gen__fetch-document_<uuid>.<ext>` when `file` is omitted.

### test-images

Debug tool for testing MCP result placement without calling the OpenAI API.

**Enabled only when `MEDIA_GEN_MCP_TEST_SAMPLE_DIR` is set**. The tool reads existing images from this directory and does **not** create new files.

Arguments (input schema):

- `response_format` ("url" | "path" | "b64_json", default: "url")
- `result_placement` ("content" | "api" | "structured" | "toplevel" or array of these, optional)
  - Override `MEDIA_GEN_MCP_RESULT_PLACEMENT` for this call.
- `compression` (object, optional)
  - Same logical tuning knobs as `fetch-images`, but using camelCase keys:
    - `maxSize` (integer, optional): max dimension in pixels.
    - `maxBytes` (integer, optional): target max file size in bytes.
    - `quality` (integer, optional): JPEG/WebP quality 1–100.
    - `format` ("jpeg" | "png" | "webp", optional): output format.
- `tool_result` ("resource_link" | "image", default: "resource_link")
  - Controls `content[]` shape:
    - `"resource_link"` emits ResourceLink items (file/URL-based)
    - `"image"` emits base64 ImageContent blocks

Behavior notes:

- Reads up to 10 images from the sample directory (no sorting — filesystem order).
- Uses the same result-building logic as `openai-images-generate` and `openai-images-edit` (including `result_placement` overrides).
- When
  `output == "base64"` and `compression` is provided, sample files are read and compressed **in memory** using `sharp`; original files on disk are never modified.
- Useful for testing how different MCP clients handle various result structures.
- When `result_placement` includes `"api"`, the tool returns a **mock OpenAI Images API-style object**:
  - Top level: `created`, `data[]`, `background`, `output_format`, `size`, `quality`.
  - For `response_format: "b64_json"`, each `data[i]` contains `b64_json`.
  - For `response_format: "path"`, each `data[i]` contains `path`.
  - For `response_format: "url"`, each `data[i]` contains `url` instead of `b64_json`.

#### Debug CLI helpers for `test-images`

For local debugging there are two helper scripts that call `test-images` directly:

- `npm run test-images` – uses `debug/debug-call.ts` and prints the validated `CallToolResult` as seen by the MCP SDK client. Usage:

  ```sh
  npm run test-images -- [placement] [--response_format url|path|b64_json]
  # examples:
  # npm run test-images -- structured --response_format b64_json
  # npm run test-images -- structured --response_format path
  # npm run test-images -- structured --response_format url
  ```

- `npm run test-images:raw` – uses `debug/debug-call-raw.ts` and prints the raw JSON-RPC `result` (the underlying `CallToolResult` without extra wrapping).
  Same CLI flags as above.

Both scripts truncate large fields for readability:

- `image_url` → first 80 characters, then `...(N chars)`;
- `b64_json` and `data` (when it is a base64 string) → first 25 characters, then `...(N chars)`.

---

## 🧩 Version policy

### Semantic Versioning (SemVer)

This package follows **SemVer**: `MAJOR.MINOR.PATCH` (x.y.z).

- `MAJOR` — breaking changes (tool names, input schemas, output shapes).
- `MINOR` — new tools or backward-compatible additions (new optional params, new fields in responses).
- `PATCH` — bug fixes and internal refactors with no intentional behavior change.

Since `1.0.0`, this project follows **standard SemVer rules**: breaking changes bump **MAJOR** (npm’s `^1.0.0` allows `1.x`, but not `2.0.0`).

### Dependency policy

This repository aims to stay **closely aligned with current stable releases**:

- **MCP SDK**: targeting the latest stable `@modelcontextprotocol/sdk` and schema.
- **OpenAI SDK**: regularly updated to the latest stable `openai` package.
- **Zod**: using the Zod 4.x line (currently `^4.1.3`). In this project we previously ran on Zod 3.x and, in combination with the MCP TypeScript SDK typings, hit heavy TypeScript errors when passing `.shape` into `inputSchema` — in particular TS2589 (*"type instantiation is excessively deep and possibly infinite"*) and TS2322 (*schema shape not assignable to `AnySchema | ZodRawShapeCompat`*).
  We track the upstream discussion in [modelcontextprotocol/typescript-sdk#494](https://github.com/modelcontextprotocol/typescript-sdk/issues/494) and the related Zod typing work in [colinhacks/zod#5222](https://github.com/colinhacks/zod/pull/5222), and keep the stack on a combination that passes **full strict** compilation reliably.
- **Tooling stack** (Node.js, TypeScript, etc.): developed and tested against recent LTS / current releases, with a dedicated `tsconfig-strict.json` that enables all strict TypeScript checks (`strict`, `noUnusedLocals`, `noUnusedParameters`, `exactOptionalPropertyTypes`, `noUncheckedIndexedAccess`, `noPropertyAccessFromIndexSignature`, etc.).

You are welcome to pin or downgrade Node.js, TypeScript, the OpenAI SDK, Zod, or other pieces of the stack if your environment requires it, but please keep in mind:

- we primarily test and tune against the latest stack;
- issues that only reproduce on older runtimes / SDK versions may be harder for us to investigate and support;
- upstream compatibility is validated first of all against the latest MCP spec and OpenAI Images API.

This project is intentionally a bit **futuristic**: it tries to keep up with new capabilities as they appear in MCP and OpenAI tooling (in particular, robust multimodal/image support over MCP and in ChatGPT’s UI). A detailed real-world bug report and analysis of MCP image rendering in ChatGPT is listed in the **References** section as a case study.

If you need a long-term-stable stack, pin exact versions in your own fork and validate them carefully in your environment.

---

## 🧩 Typed tool callbacks

All tool handlers use **strongly typed callback parameters** derived from Zod schemas via `z.input<typeof schema>`:

```typescript
import { z } from "zod";

// Schema definition
const openaiImagesGenerateBaseSchema = z.object({
  prompt: z.string().max(32000),
  background: z.enum(["transparent", "opaque", "auto"]).optional(),
  // ... more fields
});

// Type alias
type OpenAIImagesGenerateArgs = z.input<typeof openaiImagesGenerateBaseSchema>;

// Strictly typed callback
server.registerTool(
  "openai-images-generate",
  { inputSchema: openaiImagesGenerateBaseSchema.shape, ... },
  async (args: OpenAIImagesGenerateArgs, _extra: unknown) => {
    const validated = openaiImagesGenerateSchema.parse(args);
    // ... handler logic
  },
);
```

This pattern provides:

- **Static type safety** — IDE autocomplete and compile-time checks for all input fields.
- **Runtime validation** — Zod `.parse()` ensures all inputs match the schema before processing.
- **MCP SDK compatibility** — `inputSchema: schema.shape` provides the JSON Schema for tool registration.

All tools (`openai-images-*`, `openai-videos-*`, `fetch-images`, `fetch-videos`, `fetch-document`, `test-images`) follow this pattern.

---

## 🧩 Tool annotations

This MCP server exposes the following tools with annotation hints:

| Tool | `readOnlyHint` | `destructiveHint` | `idempotentHint` | `openWorldHint` |
|------|----------------|-------------------|------------------|-----------------|
| **openai-images-generate** | `true` | `false` | `false` | `true` |
| **openai-images-edit** | `true` | `false` | `false` | `true` |
| **openai-videos-create** | `true` | `false` | `false` | `true` |
| **openai-videos-remix** | `true` | `false` | `false` | `true` |
| **openai-videos-list** | `true` | `false` | `false` | `true` |
| **openai-videos-retrieve** | `true` | `false` | `false` | `true` |
| **openai-videos-delete** | `true` | `false` | `false` | `true` |
| **openai-videos-retrieve-content** | `true` | `false` | `false` | `true` |
| **fetch-images** | `true` | `false` | `false` | `false` |
| **fetch-videos** | `true` | `false` | `false` | `false` |
| **fetch-document** | `true` | `false` | `false` | `false` |
| **test-images** | `true` | `false` | `false` | `false` |

These hints help MCP clients understand that these tools:

- may invoke external APIs or read external resources (open world),
- do not modify existing project files or user data; they only create new media files (images/videos/documents) in configured output directories,
- may produce different outputs on each call, even with the same inputs.

Because `readOnlyHint` is set to `true` for most tools, MCP platforms (including chatgpt.com) can treat this server as logically read-only and usually will not show "this tool can modify your files" warnings.

---

## 📁 Project structure

```text
media-gen-mcp/
├── src/
│   ├── index.ts                 # MCP server entry point
│   └── lib/
│       ├── compression.ts       # Image compression (sharp)
│       ├── env.ts               # Env parsing + allowlists (+ glob support)
│       ├── helpers.ts           # URL/path validation, result building
│       ├── logger.ts            # Structured logging + truncation helpers
│       └── schemas.ts           # Zod schemas for all tools
├── test/
│   ├── compression.test.ts               # 12 tests
│   ├── env.test.ts                       # 19 tests
│   ├── fetch-images.integration.test.ts  # 2 tests
│   ├── fetch-videos.integration.test.ts  # 2 tests
│   ├── helpers.test.ts                   # 31 tests
│   ├── logger.test.ts                    # 10 tests
│   └── schemas.test.ts                   # 64 tests
├── debug/            # Local debug helpers (MCP client scripts)
├── plan/             # Design notes / plans
├── dist/             # Compiled output
├── tsconfig.json
├── vitest.config.ts
├── package.json
├── CHANGELOG.md
├── README.md
└── AGENTS.md
```

---

## 📝 License

MIT

---

## 🩺 Troubleshooting

- Make sure your `OPENAI_API_KEY` is valid and has image API access.
- You must have a [verified OpenAI organization](https://platform.openai.com/account/organization).
  After verifying, it can take 15–20 minutes for image API access to activate.
- File paths (when passed as optional parameters) must be absolute.
  - **Unix/macOS/Linux**: Starting with `/` (e.g., `/path/to/image.png`)
  - **Windows**: Drive letter followed by `:` (e.g., `C:/path/to/image.png` or `C:\path\to\image.png`)
  - For file output, ensure the target directory is writable.
  - If you see errors about file types, check your image file extensions and formats.

---

## 🙏 Inspiration

This server was originally inspired by [SureScaleAI/openai-gpt-image-mcp](https://github.com/SureScaleAI/openai-gpt-image-mcp), but is now a separate implementation focused on **closely tracking the official specifications**:

- **OpenAI Images API alignment** – The arguments for `openai-images-generate` and `openai-images-edit` mirror [`images.create` / `gpt-image-1.5`](https://platform.openai.com/docs/api-reference/images/create): `prompt`, `n`, `size`, `quality`, `background`, `output_format`, `output_compression`, `user`, plus `response_format` (`url` / `b64_json`) with the same semantics as the OpenAI Images API.
- **MCP Tool Result alignment (image + resource_link)** – With `result_placement = "content"`, the server follows the MCP **5.2 Tool Result** section ([5.2.2 Image Content](https://modelcontextprotocol.io/specification/2025-11-25/server/tools#image-content), [5.2.4 Resource Links](https://modelcontextprotocol.io/specification/2025-11-25/server/tools#tool-result)) and emits strongly-typed `content[]` items:
  - `{ "type": "image", "data": "<base64>", "mimeType": "image/png" }` for `response_format = "b64_json"`;
  - `{ "type": "resource_link", "uri": "file:///..." | "https://...", "name": "...", "mimeType": "image/..."
    }` for file/URL-based output.
- **Raw OpenAI-style API output** – With `result_placement = "api"`, the tool result itself **is** an OpenAI Images-style object: `{ created, data: [...], background, output_format, size, quality, usage? }`, where each `data[]` entry contains either `b64_json` (for `response_format = "b64_json"`) or `url` (for `response_format = "url"`). No MCP wrapper fields (`content`, `structuredContent`, `files`, `urls`) are added in this mode.

In short, this library:

- tracks the OpenAI Images API for **arguments and result shape** when `result_placement = "api"` with `response_format = "url" | "b64_json"`, and
- follows the MCP specification for **tool result content blocks** (`image`, `resource_link`, `text`) when `result_placement = "content"`.

### Recommended presets for common clients

- **Default mode / Claude Desktop / strict MCP clients**
  For clients that strictly follow the MCP spec, the recommended (and natural) configuration is:
  - `result_placement = content`
  - `response_format = b64_json`

  In this mode the server returns:
  - `content[]` with `type: "image"` (base64 image data) and `type: "resource_link"` (file/URL links), matching MCP section 5.2 (Image Content and Resource Links). This output works well for **direct
This output works well for **direct1080 integration** with Claude Desktop and any client that fully implements the1081 2025‑11‑25 spec.10821083- **chatgpt.com Developer Mode**1084 For running this server as an MCP backend behind ChatGPT Developer Mode, the1085 most practical configuration is the one that most closely matches the OpenAI1086 Images API:1087 - `result_placement = api`1088 - `response_format = url`10891090 In this mode the tool result matches the `images.create` / `gpt-image-1.5`1091 format (including `data[].url`), which simplifies consumption from backends1092 and libraries that expect the OpenAI schema.10931094 However, **even with this OpenAI-native shape, the chatgpt.com client does1095 not currently render images**. This behavior is documented in detail in the1096 following report:1097 <https://github.com/strato-space/report/issues/1>1098---10991100## ⚠️ Limitations & Large File Handling11011102- **Configurable payload safeguard:** By default this server uses a ~50MB budget (52,428,800 bytes) for inline `content` to stay within typical MCP client limits. You can override this threshold by setting the `MCP_MAX_CONTENT_BYTES` environment variable to a higher (or lower) value.1103- **Auto-Switch to File Output:** If the total image base64 size exceeds the configured threshold, the tool automatically saves images to disk and returns file path(s) via `resource_link` instead of inline base64. This helps avoid client-side "payload too large" errors while still delivering full-resolution images.1104- **Default File Location:** If you do not specify a `file` path, outputs are saved under `MEDIA_GEN_DIRS[0]` (default: `/tmp/media-gen-mcp`) using names like `output_<time_t>_media-gen__<tool>_<id>.<ext>`.1105- **Environment Variables:**1106 - `MEDIA_GEN_DIRS`: Set this to control where outputs are saved. Example: `export MEDIA_GEN_DIRS=/your/desired/dir`. 
    This directory may coincide with your public static directory if you serve files directly from it.
  - `MEDIA_GEN_MCP_URL_PREFIXES`: Optional comma-separated HTTPS prefixes for public URLs, matched positionally to `MEDIA_GEN_DIRS` entries. When set, the server builds public URLs as `<prefix>/<relative_path_inside_root>` and returns them alongside file paths (for example via `resource_link` URIs and `structuredContent.data[].url` when `response_format: "url"`). Example: `export MEDIA_GEN_MCP_URL_PREFIXES=https://media-gen.example.com/media,https://media-gen.example.com/samples`
- **Best Practice:** For large or production images, always use file output and ensure your client is configured to handle file paths. Configure `MEDIA_GEN_DIRS` and (optionally) `MEDIA_GEN_MCP_URL_PREFIXES` to serve images via a public web server (e.g., nginx).

---

## 🌐 Serving generated files over HTTPS

If you want ChatGPT (or any MCP client) to mention publicly accessible URLs alongside file paths:

1. Expose your image directory via HTTPS. For example, on nginx:

   ```nginx
   server {
       # listen 443 ssl http2;
       # server_name <server_name>;

       # ssl_certificate <path>;
       # ssl_certificate_key <path>;

       location /media/ {
           alias /home/username/media-gen-mcp/media/;
           autoindex off;
           expires 7d;
           add_header Cache-Control "public, immutable";
       }
   }
   ```

2. Ensure the first entry in `MEDIA_GEN_DIRS` points to the same directory (e.g. `MEDIA_GEN_DIRS=/home/username/media-gen-mcp/media/` or `MEDIA_GEN_DIRS=media/` when running from the project root).
3. Set `MEDIA_GEN_MCP_URL_PREFIXES=https://media-gen.example.com/media` so the server returns matching HTTPS URLs in top-level `urls`, `resource_link` URIs, and `image_url` fields (for `response_format: "url"`).

Both `openai-images-generate` and `openai-images-edit` now attach `files` + `urls` for **base64** and **file** response modes, allowing clients to reference either the local filesystem path or the public HTTPS link. This is particularly useful while ChatGPT cannot yet render MCP image blocks inline.

---

## 📚 References

- **Model Context Protocol**
  - [MCP Specification](https://modelcontextprotocol.io/docs/getting-started/intro)
  - [MCP Schema (2025-11-25)](https://github.com/modelcontextprotocol/modelcontextprotocol/blob/main/schema/2025-11-25/schema.json)

- **OpenAI Images**
  - [Images API overview](https://platform.openai.com/docs/api-reference/images)
  - [Images generate (gpt-image-1.5)](https://platform.openai.com/docs/api-reference/images/create)
  - [Images edit (`createEdit`)](https://platform.openai.com/docs/api-reference/images/createEdit)
  - [Tools guide: image generation & revised_prompt](https://platform.openai.com/docs/guides/tools-image-generation)

- **OpenAI Videos**
  - [Videos API overview](https://platform.openai.com/docs/api-reference/videos)

- **Case studies**
  - [MCP image rendering in ChatGPT (GitHub issue)](https://github.com/strato-space/report/issues/1)
    - **Symptoms:** ChatGPT often ignored or mishandled MCP `image` content blocks: empty tool results, raw base64 treated as text (huge token usage), or generic "I can't see the image" responses, while other MCP clients (Cursor, Claude) rendered the same images correctly.
    - **Root cause:** not a problem with the MCP spec itself, but with ChatGPT's handling/serialization of MCP `CallToolResult` image content blocks and media objects (especially around UI rendering and nested containers).
    - **Status &
      workarounds:** OpenAI has begun rolling out fixes for MCP image support in Codex/ChatGPT, but behavior is still inconsistent; this server uses file/resource_link + URL patterns and spec-conformant `image` blocks so that tools remain usable across current and future MCP clients.

---

## 🙏 Credits

- Built with [@modelcontextprotocol/sdk](https://www.npmjs.com/package/@modelcontextprotocol/sdk)
- Uses the [openai](https://www.npmjs.com/package/openai) Node.js SDK
- Refactoring and MCP spec alignment assisted by [Windsurf](https://windsurf.com) and [GPT-5 High Reasoning](https://openai.com).