# media-gen-mcp

<p align="center">
  <a href="https://www.npmjs.com/package/media-gen-mcp"><img src="https://img.shields.io/npm/v/media-gen-mcp?label=media-gen-mcp&color=brightgreen" alt="media-gen-mcp"></a>
  <a href="https://www.npmjs.com/package/@modelcontextprotocol/sdk"><img src="https://img.shields.io/npm/v/@modelcontextprotocol/sdk?label=MCP%20SDK&color=blue" alt="MCP SDK"></a>
  <a href="https://www.npmjs.com/package/openai"><img src="https://img.shields.io/npm/v/openai?label=OpenAI%20SDK&color=blueviolet" alt="OpenAI SDK"></a>
  <a href="https://github.com/punkpeye/mcp-proxy"><img src="https://img.shields.io/github/stars/punkpeye/mcp-proxy?label=mcp-proxy&style=social" alt="mcp-proxy"></a>
  <a href="https://github.com/yjacquin/fast-mcp"><img src="https://img.shields.io/github/stars/yjacquin/fast-mcp?label=fast-mcp&style=social" alt="fast-mcp"></a>
  <a href="https://github.com/strato-space/media-gen-mcp/blob/main/LICENSE"><img src="https://img.shields.io/github/license/strato-space/media-gen-mcp?color=brightgreen" alt="License"></a>
  <a href="https://github.com/strato-space/media-gen-mcp/stargazers"><img src="https://img.shields.io/github/stars/strato-space/media-gen-mcp?style=social" alt="GitHub stars"></a>
  <a href="https://github.com/strato-space/media-gen-mcp/actions"><img src="https://img.shields.io/github/actions/workflow/status/strato-space/media-gen-mcp/main.yml?label=build&logo=github" alt="Build Status"></a>
</p>

---

**Media Gen MCP** is a **strict TypeScript** Model Context Protocol (MCP) server for OpenAI Images (`gpt-image-1.5`, `gpt-image-1`), OpenAI Videos (Sora), and Google GenAI Videos (Veo): generate/edit images, create/remix video jobs, and fetch media from URLs or disk with smart `resource_link` vs inline `image` outputs and optional `sharp` processing. Production-focused (full strict typecheck, ESLint + Vitest CI).
Works with fast-agent, Claude Desktop, ChatGPT, Cursor, VS Code, Windsurf, and any MCP-compatible client.

**Design principle:** spec-first, type-safe image tooling – strict OpenAI Images API + MCP compliance with fully static TypeScript types and flexible result placements/response formats for different clients.

- **Generate images** from text prompts using OpenAI's `gpt-image-1.5` model (with `gpt-image-1` compatibility and DALL·E support planned in future versions).
- **Edit images** (inpainting, outpainting, compositing) from 1 up to 16 images at once, with advanced prompt control.
- **Generate videos** via OpenAI Videos (`sora-2`, `sora-2-pro`) with job create/remix/list/retrieve/delete and asset downloads.
- **Generate videos** via Google GenAI (Veo) with operation polling and file-first downloads.
- **Fetch & compress images** from HTTP(S) URLs or local file paths with smart size/quality optimization.
- **Fetch documents** from HTTP(S) URLs or local file paths and return `resource_link`/`resource` outputs.
- **Debug MCP output shapes** with a `test-images` tool that mirrors production result placement (`content`, `structuredContent`, `toplevel`).
- **Integrates with**: [fast-agent](https://github.com/strato-space/fast-agent), [Windsurf](https://windsurf.com), [Claude Desktop](https://www.anthropic.com/claude/desktop), [Cursor](https://cursor.com), [VS Code](https://code.visualstudio.com/), and any MCP-compatible client.

---

## ✨ Features

- **Strict MCP spec support**
  Tool outputs are first-class [`CallToolResult`](https://github.com/modelcontextprotocol/spec/blob/main/schema/2025-11-25/schema.json) objects from the latest MCP schema, including:
  `content` items (`text`, `image`, `resource_link`, `resource`), optional `structuredContent`, optional top-level `files`, and the `isError` flag for failures.

- **Full gpt-image-1.5 and sora-2/sora-2-pro parameter coverage (generate & edit)**
  - [`openai-images-generate`](#openai-images-generate) mirrors the OpenAI Images [`create`](https://platform.openai.com/docs/api-reference/images/create) API for `gpt-image-1.5` (and `gpt-image-1`): background, moderation, size, quality, output_format, output_compression, `n`, `user`, etc.
  - [`openai-images-edit`](#openai-images-edit) mirrors the OpenAI Images [`createEdit`](https://platform.openai.com/docs/api-reference/images/createEdit) API for `gpt-image-1.5` (and `gpt-image-1`): image, mask, `n`, quality, size, `user`.

- **OpenAI Videos (Sora) job tooling (create / remix / list / retrieve / delete / content)**
  - [`openai-videos-create`](#openai-videos-create) mirrors [`videos/create`](https://platform.openai.com/docs/api-reference/videos/create) and can optionally wait for completion.
  - [`openai-videos-remix`](#openai-videos-remix) mirrors [`videos/remix`](https://platform.openai.com/docs/api-reference/videos/remix).
  - [`openai-videos-list`](#openai-videos-list) mirrors [`videos/list`](https://platform.openai.com/docs/api-reference/videos/list).
  - [`openai-videos-retrieve`](#openai-videos-retrieve) mirrors [`videos/retrieve`](https://platform.openai.com/docs/api-reference/videos/retrieve).
  - [`openai-videos-delete`](#openai-videos-delete) mirrors [`videos/delete`](https://platform.openai.com/docs/api-reference/videos/delete).
  - [`openai-videos-retrieve-content`](#openai-videos-retrieve-content) mirrors [`videos/content`](https://platform.openai.com/docs/api-reference/videos/content) and downloads `video` / `thumbnail` / `spritesheet` assets to disk, returning MCP `resource_link` (default) or embedded `resource` blocks (via `tool_result`).

- **Google GenAI (Veo) operations + downloads (generate / retrieve operation / retrieve content)**
  - [`google-videos-generate`](#google-videos-generate) starts a long-running operation (`ai.models.generateVideos`) and can optionally wait for completion and download `.mp4` outputs.
    [Veo model reference](https://docs.cloud.google.com/vertex-ai/generative-ai/docs/model-reference/veo-video-generation)
  - [`google-videos-retrieve-operation`](#google-videos-retrieve-operation) polls an existing operation.
  - [`google-videos-retrieve-content`](#google-videos-retrieve-content) downloads an `.mp4` from a completed operation, returning MCP `resource_link` (default) or embedded `resource` blocks (via `tool_result`).

- **Fetch and process images from URLs or files**
  The [`fetch-images`](#fetch-images) tool loads images from HTTP(S) URLs or local file paths with optional, user-controlled compression (disabled by default). Supports parallel processing of up to 20 images.

- **Fetch videos from URLs or files**
  The [`fetch-videos`](#fetch-videos) tool lists local videos or downloads remote video URLs to disk and returns MCP `resource_link` (default) or embedded `resource` blocks (via `tool_result`).

- **Fetch documents from URLs or files**
  The [`fetch-document`](#fetch-document) tool downloads remote files or reuses local paths and returns MCP `resource_link` (default) or embedded `resource` blocks (via `tool_result`).

- **Mix and edit up to 16 images**
  [`openai-images-edit`](#openai-images-edit) accepts `image` as a single string or an array of 1–16 file paths/base64 strings, matching the OpenAI spec for GPT Image models (`gpt-image-1.5`, `gpt-image-1`) image edits.

- **Smart image compression**
  Built-in compression using [sharp](https://sharp.pixelplumbing.com/) — iteratively reduces quality and dimensions to fit MCP payload limits while maintaining visual quality.

- **Resource-aware file output with `resource_link`**
  - Automatic switch from inline base64 to `file` when the total response size exceeds a safe threshold.
  - Outputs are written to disk using `output_<time_t>_media-gen__<tool>_<id>.<ext>` filenames (images/documents use a generated UUID; videos use the OpenAI `video_id`) and exposed to MCP clients via `content[]` depending on `tool_result` (`resource_link`/`image` for images, `resource_link`/`resource` for video/document downloads).

- **Built-in test-images tool for MCP client debugging**
  [`test-images`](#test-images) reads sample images from a configured directory and returns them using the same result-building logic as production tools. Use the `tool_result` and `response_format` parameters to test how different MCP clients handle `content[]` and `structuredContent`.

- **Structured MCP error handling**
  All tool errors (validation, OpenAI API failures, I/O) are returned as MCP errors with `isError: true` and `content: [{ type: "text", text: <error message> }]`, making failures easy to parse and surface in MCP clients.

---

## 🚀 Installation

```sh
git clone https://github.com/strato-space/media-gen-mcp.git
cd media-gen-mcp

npm install
npm run build
```

Build modes:

- `npm run build` – strict TypeScript build with **all strict flags enabled**, including `skipLibCheck: false`. Incremental builds via `.tsbuildinfo` (~2–3 s on a warm cache).
- `npm run esbuild` – fast bundling via esbuild (no type checking, useful for rapid iteration).

### Development mode (no build required)

For development, or when TypeScript compilation fails due to memory constraints:

```sh
npm run dev   # Uses tsx to run TypeScript directly
```

### Quality checks

```sh
npm run lint        # ESLint with typescript-eslint
npm run typecheck   # Strict tsc --noEmit
npm run test        # Unit tests (vitest)
npm run test:watch  # Watch mode for TDD
npm run ci          # lint + typecheck + test
```

### Unit tests

The project uses [vitest](https://vitest.dev/) for unit testing.
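As an illustration of the kind of helpers these unit tests cover, here is a hedged sketch of two predicates from the `helpers` module. The function bodies are assumptions for illustration only; the repository's real implementations may differ.

```typescript
// Hypothetical sketches of two helper predicates named in the covered
// modules below; the actual implementations in this repo may differ.

/** True when the value is an http:// or https:// URL. */
function isHttpUrl(value: string): boolean {
  try {
    const { protocol } = new URL(value);
    return protocol === "http:" || protocol === "https:";
  } catch {
    return false; // not parseable as a URL (e.g. a filesystem path)
  }
}

/** True for data:image/...;base64,... URLs (raw base64 detection elided). */
function isBase64Image(value: string): boolean {
  return value.startsWith("data:image/") && value.includes(";base64,");
}
```

Predicates like these are what the `helpers` test suite exercises with boundary cases (paths vs URLs, data URLs vs raw strings).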
Tests are located in `test/`.

**Covered modules:**

| Module | Tests | Description |
|--------|-------|-------------|
| `compression` | 12 | Image format detection, buffer processing, file I/O |
| `helpers` | 31 | URL/path validation, output resolution, result placement, resource links |
| `env` | 19 | Configuration parsing, env validation, defaults |
| `logger` | 10 | Structured logging + truncation safety |
| `pricing` | 5 | Sora pricing estimate helpers |
| `schemas` | 69 | Zod schema validation for all tools, type inference |
| `fetch-images` (integration) | 3 | End-to-end MCP tool call behavior |
| `fetch-videos` (integration) | 3 | End-to-end MCP tool call behavior |

**Test categories:**

- **compression** — `isCompressionAvailable`, `detectImageFormat`, `processBufferWithCompression`, `readAndProcessImage`
- **helpers** — `isHttpUrl`, `isAbsolutePath`, `isBase64Image`, `ensureDirectoryWritable`, `resolveOutputPath`, `getResultPlacement`, `buildResourceLinks`
- **env** — config loading and validation for `MEDIA_GEN_*` / `MEDIA_GEN_MCP_*` settings
- **logger** — truncation and error formatting behavior
- **schemas** — validation for `openai-images-*`, `openai-videos-*`, `fetch-images`, `fetch-videos`, and `test-images` inputs, plus boundary testing (prompt length, image count limits, path validation)

```sh
npm run test
# ✓ test/compression.test.ts (12 tests)
# ✓ test/helpers.test.ts (31 tests)
# ✓ test/env.test.ts (19 tests)
# ✓ test/logger.test.ts (10 tests)
# ✓ test/pricing.test.ts (5 tests)
# ✓ test/schemas.test.ts (69 tests)
# ✓ test/fetch-images.integration.test.ts (3 tests)
# ✓ test/fetch-videos.integration.test.ts (3 tests)
# Tests: 152 passed
```

### Run directly via npx (no local clone)

You can also run the server straight from a remote repo using `npx`:

```sh
npx -y github:strato-space/media-gen-mcp --env-file /path/to/media-gen.env
```

The `--env-file` argument tells the server which env file to load (e.g. when you keep secrets outside the cloned directory). The file should contain `OPENAI_API_KEY`, optional Azure variables, and any `MEDIA_GEN_MCP_*` settings.

### `secrets.yaml` (optional)

You can keep API keys (and optional Google Vertex AI settings) in a `secrets.yaml` file (compatible with the fast-agent secrets template):

```yaml
openai:
  api_key: <your-api-key-here>
anthropic:
  api_key: <your-api-key-here>
google:
  api_key: <your-api-key-here>
  vertex_ai:
    enabled: true
    project_id: your-gcp-project-id
    location: europe-west4
```

`media-gen-mcp` loads `secrets.yaml` from the current working directory (or from `--secrets-file /path/to/secrets.yaml`) and applies it to env vars; values in `secrets.yaml` override env, and `<your-api-key-here>` placeholders are ignored.

---

## ⚡ Quick start (fast-agent & Windsurf)

### fast-agent

In fast-agent, MCP servers are configured in `fastagent.config.yaml` under the `mcp.servers` section (see the [fast-agent docs](https://github.com/strato-space/fast-agent)).

To add `media-gen-mcp` from GitHub via `npx` as an MCP server:

```yaml
# fastagent.config.yaml

mcp:
  servers:
    # your existing servers (e.g. fetch, filesystem, huggingface, ...)
    media-gen-mcp:
      command: "npx"
      args: ["-y", "github:strato-space/media-gen-mcp", "--env-file", "/path/to/media-gen.env"]
```

Put `OPENAI_API_KEY` and other settings into `media-gen.env` (see `.env.sample` in this repo).

### Windsurf

Add an MCP server that runs `media-gen-mcp` from GitHub via `npx` using the JSON format below (similar to Claude Desktop / VS Code):

```json
{
  "mcpServers": {
    "media-gen-mcp": {
      "command": "npx",
      "args": ["-y", "github:strato-space/media-gen-mcp", "--env-file", "/path/to/media-gen.env"]
    }
  }
}
```

---

## 🔑 Configuration

Add to your MCP client config (fast-agent, Windsurf, Claude Desktop, Cursor, VS Code):

```json
{
  "mcpServers": {
    "media-gen-mcp": {
      "command": "npx",
      "args": ["-y", "github:strato-space/media-gen-mcp"],
      "env": { "OPENAI_API_KEY": "sk-..." }
    }
  }
}
```

Azure deployments are also supported:

```jsonc
{
  "mcpServers": {
    "media-gen-mcp": {
      "command": "npx",
      "args": ["-y", "github:strato-space/media-gen-mcp"],
      "env": {
        // "AZURE_OPENAI_API_KEY": "sk-...",
        // "AZURE_OPENAI_ENDPOINT": "my.endpoint.com",
        "OPENAI_API_VERSION": "2024-12-01-preview"
      }
    }
  }
}
```

Environment variables:

- Set `OPENAI_API_KEY` (and optionally `AZURE_OPENAI_API_KEY`, `AZURE_OPENAI_ENDPOINT`, `OPENAI_API_VERSION`) in the environment of the process that runs `node dist/index.js` (shell, systemd unit, Docker env, etc.).
- The server will **optionally** load a local `.env` file from its working directory if present (it does not override already-set environment variables).
- You can also pass `--env-file /path/to/env` when starting the server (including via `npx`); this file is loaded via `dotenv` before tools run, again without overriding already-set variables.

### Logging and base64 truncation

To avoid flooding logs with huge image payloads, the built-in logger applies a log-only sanitizer to structured `data` passed to `log.debug/info/warn/error`:

- Truncates configured string fields (e.g. `b64_json`, `base64`, string `data`, `image_url`) to a short preview controlled by `LOG_TRUNCATE_DATA_MAX` (default: 64 characters). The list of keys defaults to `LOG_SANITIZE_KEYS` inside `src/lib/logger.ts` and can be overridden via `MEDIA_GEN_MCP_LOG_SANITIZE_KEYS` (a comma-separated list of field names).
- Sanitization is applied **only** to log serialization; tool results returned to MCP clients are never modified.

Control via environment:

- `MEDIA_GEN_MCP_LOG_SANITIZE_IMAGES` (default: `true`)
  - `1`, `true`, `yes`, `on` – enable truncation (default behaviour).
  - `0`, `false`, `no`, `off` – disable truncation and log full payloads.

The field list and limits are configured in `src/lib/logger.ts` via `LOG_SANITIZE_KEYS` and `LOG_TRUNCATE_DATA_MAX`.

### Security and local file access

- **Allowed directories**: All tools are restricted to paths matching `MEDIA_GEN_DIRS`. If unset, this defaults to `/tmp/media-gen-mcp` (or `%TEMP%/media-gen-mcp` on Windows).
- **Test samples**: `MEDIA_GEN_MCP_TEST_SAMPLE_DIR` adds a directory to the allowlist and enables the `test-images` tool.
- **Local reads**: `fetch-images` and `fetch-document` accept file paths (absolute or relative). Relative paths are resolved against the first `MEDIA_GEN_DIRS` entry and must still match an allowed pattern.
- **Remote reads**: HTTP(S) fetches are filtered by `MEDIA_GEN_URLS` patterns. An empty list allows all URLs.
- **Writes**: `openai-images-generate`, `openai-images-edit`, `fetch-images`, `fetch-videos`, and `fetch-document` write under the first entry of `MEDIA_GEN_DIRS`.
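To make the allowlist behavior concrete, here is a minimal sketch of how `MEDIA_GEN_DIRS`/`MEDIA_GEN_URLS` patterns could be matched (using the documented semantics: `*` matches a single segment with no `/`, `**` matches any number of segments, and a pattern acts as a prefix). The function names and the regex translation are assumptions, not the server's actual code.

```typescript
// Hypothetical allowlist matcher for MEDIA_GEN_DIRS / MEDIA_GEN_URLS
// patterns: "*" = one path segment (no "/"), "**" = any segments.
// Patterns match as prefixes, so allowed directories cover their contents.

function patternToRegExp(pattern: string): RegExp {
  const escapeLiteral = (s: string) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const source = pattern
    .split("**")
    .map((part) => part.split("*").map(escapeLiteral).join("[^/]*"))
    .join(".*");
  return new RegExp("^" + source); // anchored prefix match
}

function isPathAllowed(path: string, allowlist: string[]): boolean {
  return allowlist.some((pattern) => patternToRegExp(pattern).test(path));
}
```

For example, with `MEDIA_GEN_DIRS=/home/*/media-gen/output/,/data/**/images/`, a path like `/home/user1/media-gen/output/img.png` would be allowed, while `/etc/passwd` would not.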
`test-images` is read-only and does not create new files.

#### Glob patterns

Both `MEDIA_GEN_DIRS` and `MEDIA_GEN_URLS` support glob wildcards:

| Pattern | Matches | Example |
|---------|---------|---------|
| `*` | Any single segment (no `/`) | `/home/*/media/` matches `/home/user1/media/` |
| `**` | Any number of segments | `/data/**/images/` matches `/data/a/b/images/` |

URL examples:

```shell
MEDIA_GEN_URLS=https://*.cdn.example.com/,https://storage.example.com/**/assets/
```

Path examples:

```shell
MEDIA_GEN_DIRS=/home/*/media-gen/output/,/data/**/images/
```

⚠️ **Warning**: Trailing wildcards without a delimiter (e.g., `/home/user/*` or `https://cdn.com/**`) expose entire subtrees and trigger a console warning at startup.

#### Recommended mitigations

1. Run under a dedicated OS user with access only to the allowed directories.
2. Keep allowlists minimal. Avoid `*` in home directories or system paths.
3. Use explicit `MEDIA_GEN_URLS` prefixes for remote fetches.
4. Monitor allowed directories via OS ACLs or backups.

### Tool Result Parameters: `tool_result` and `response_format`

Image tools (`openai-images-*`, `fetch-images`, `test-images`) support two parameters that control the shape of the MCP tool result:

| Parameter | Values | Default | Description |
|-----------|--------|---------|-------------|
| `tool_result` | `resource_link`, `image` | `resource_link` | Controls `content[]` shape |
| `response_format` | `url`, `path`, `b64_json` | `url` | Controls `structuredContent` shape (OpenAI ImagesResponse format) |

Video/document download tools (`openai-videos-create` / `openai-videos-remix` when downloading, `openai-videos-retrieve-content`, `google-videos-generate` when downloading, `google-videos-retrieve-content`, `fetch-videos`, `fetch-document`) support:

| Parameter | Values | Default | Description |
|-----------|--------|---------|-------------|
| `tool_result` | `resource_link`, `resource` | `resource_link` | Controls `content[]` shape |

Google video tools (`google-videos-*`) also support:

| Parameter | Values | Default | Description |
|-----------|--------|---------|-------------|
| `response_format` | `url`, `b64_json` | `url` | Controls `structuredContent.response.generatedVideos[].video` shape (`uri` vs `videoBytes`) |

#### `tool_result` — controls `content[]`

- **Images** (`openai-images-*`, `fetch-images`, `test-images`)
  - **`resource_link`** (default): Emits `ResourceLink` items with `file://` or `https://` URIs
  - **`image`**: Emits base64 `ImageContent` blocks
- **Videos** (tools that download video data)
  - **`resource_link`** (default): Emits `ResourceLink` items with `file://` or `https://` URIs
  - **`resource`**: Emits `EmbeddedResource` blocks with base64 `resource.blob`
- **Documents** (`fetch-document`)
  - **`resource_link`** (default): Emits `ResourceLink` items with `file://` or `https://` URIs
  - **`resource`**: Emits `EmbeddedResource` blocks with base64 `resource.blob`

#### `response_format` — controls `structuredContent`

For OpenAI images, `structuredContent` always contains an OpenAI ImagesResponse-style object:

```jsonc
{
  "created": 1234567890,
  "data": [
    { "url": "https://..." } // or { "path": "/abs/path.png" } / { "b64_json": "..." } depending on response_format
  ]
}
```

- **`url`** (default): `data[].url` contains file URLs
- **`path`**: `data[].path` contains local filesystem paths
- **`b64_json`**: `data[].b64_json` contains base64-encoded image data

For Google videos, `response_format` controls whether `structuredContent.response.generatedVideos[].video` prefers:

- **`url`** (default): `video.uri` (and strips `video.videoBytes`)
- **`b64_json`**: `video.videoBytes` (and strips `video.uri`)

#### Backward Compatibility (MCP 5.2.6)

Per MCP spec 5.2.6, a `TextContent` block with serialized JSON (always using URLs in `data[]`) is also included in `content[]` for backward compatibility with clients that don't support `structuredContent`.

Example tool result structure:

```jsonc
{
  "content": [
    // ResourceLink or ImageContent based on tool_result
    { "type": "resource_link", "uri": "https://...", "name": "image.png", "mimeType": "image/png" },
    // Serialized JSON for backward compatibility (MCP 5.2.6)
    { "type": "text", "text": "{ \"created\": 1234567890, \"data\": [{ \"url\": \"https://...\" }] }" }
  ],
  "structuredContent": {
    "created": 1234567890,
    "data": [{ "url": "https://..." }]
  }
}
```

**ChatGPT MCP client behavior (chatgpt.com, as of 2025-12-01):**

- ChatGPT currently ignores `content[]` image data in favor of `structuredContent`.
- For ChatGPT, use `response_format: "url"` and configure the first `MEDIA_GEN_MCP_URL_PREFIXES` entry as a public HTTPS prefix (for example `MEDIA_GEN_MCP_URL_PREFIXES=https://media-gen.example.com/media`).

For Anthropic clients (Claude Desktop, etc.), the default configuration works well.

### Network access via mcp-proxy (SSE)

For networked SSE access you can front `media-gen-mcp` with [`mcp-proxy`](https://github.com/modelcontextprotocol/servers/tree/main/src/proxy) or an equivalent. This setup has been tested with the TypeScript SSE proxy implementation [`punkpeye/mcp-proxy`](https://github.com/punkpeye/mcp-proxy).

For example, a one-line command looks like:

```sh
mcp-proxy --host=0.0.0.0 --port=99 --server=sse --sseEndpoint=/ --shell 'npx -y github:strato-space/media-gen-mcp --env-file /path/to/media-gen.env'
```

In production you would typically wire this up via a systemd template unit that loads `PORT`/`SHELL_CMD` from an `EnvironmentFile=` (see `server/mcp/mcp@.service`-style setups).

---

## 🛠 Tool signatures

### openai-images-generate

Arguments (input schema):

- `prompt` (string, required)
  - Text prompt describing the desired image.
  - Max length: 32,000 characters.
- `background` ("transparent" | "opaque" | "auto", optional)
  - Background handling mode.
  - If `background` is `"transparent"`, then `output_format` must be `"png"` or `"webp"`.
- `model` ("gpt-image-1.5" | "gpt-image-1", optional, default: "gpt-image-1.5")
- `moderation` ("auto" | "low", optional)
  - Content moderation behavior, passed through to the Images API.
- `n` (integer, optional)
  - Number of images to generate.
  - Min: 1, Max: 10.
- `output_compression` (integer, optional)
  - Compression level (0–100).
  - Only applied when `output_format` is `"jpeg"` or `"webp"`.
- `output_format` ("png" | "jpeg" | "webp", optional)
  - Output image format.
  - If omitted, the server treats output as PNG semantics.
- `quality` ("auto" | "high" | "medium" | "low", default: "high")
- `size` ("1024x1024" | "1536x1024" | "1024x1536" | "auto", default: "1024x1536")
- `user` (string, optional)
  - User identifier forwarded to OpenAI for monitoring.
- `response_format` ("url" | "path" | "b64_json", default: "url")
  - Response format (aligned with the OpenAI Images API):
    - `"url"`: file/URL-based output (resource_link items, `image_url` fields, `data[].url` in `api` placement).
    - `"path"`: local filesystem paths in `data[].path` (for local skill workflows).
    - `"b64_json"`: inline base64 image data (image content, `data[].b64_json` in `api` placement).
- `tool_result` ("resource_link" | "image", default: "resource_link")
  - Controls `content[]` shape:
    - `"resource_link"` emits ResourceLink items (file/URL-based)
    - `"image"` emits base64 ImageContent blocks

Behavior notes:

- The server uses OpenAI `gpt-image-1.5` by default (set `model: "gpt-image-1"` for legacy behavior).
- If the total size of all base64 images would exceed the configured payload threshold (default ~50MB via `MCP_MAX_CONTENT_BYTES`), the server automatically switches the **effective output mode** to file/URL-based and saves images to the first entry of `MEDIA_GEN_DIRS` (default: `/tmp/media-gen-mcp`).
- Even when you explicitly request `response_format: "b64_json"`, the server still writes the files to disk (for static hosting, caching, or later reuse).
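The automatic base64-to-file fallback can be sketched as follows. The function name, signature, and size accounting are illustrative assumptions; only the threshold behavior (fall back to file output above `MCP_MAX_CONTENT_BYTES`) comes from the description above.

```typescript
// Hypothetical sketch of the payload-threshold check: when the combined
// size of all base64 images would exceed MCP_MAX_CONTENT_BYTES, the
// effective output mode falls back from inline base64 to file/URL-based.

const MCP_MAX_CONTENT_BYTES = 50 * 1024 * 1024; // ~50MB default

function effectiveOutputMode(
  requested: "base64" | "file",
  base64Images: string[],
  maxBytes: number = MCP_MAX_CONTENT_BYTES,
): "base64" | "file" {
  if (requested === "file") return "file";
  // Approximate the serialized payload by the base64 string lengths.
  const totalBytes = base64Images.reduce((sum, b64) => sum + b64.length, 0);
  return totalBytes > maxBytes ? "file" : "base64";
}
```

Because the check uses the serialized (base64) size rather than the decoded image size, a single large `n > 1` batch can trip the fallback even when each individual image is small.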
Exposure of464 file paths / URLs in the tool result then depends on `MEDIA_GEN_MCP_RESULT_PLACEMENT`465 and per-call `result_placement` (see section below).466467Output (MCP CallToolResult, when placement includes `"content"`):468469- When the effective `output` mode is `"base64"`:470 - `content` is an array that may contain:471 - image items:472 - `{ type: "image", data: <base64 string>, mimeType: <"image/png" | "image/jpeg" | "image/webp"> }`473 - optional text items with revised prompts returned by the Images API (for models that support it, e.g. DALL·E 3):474 - `{ type: "text", text: <revised_prompt string> }`475- When the effective `output` mode is `"file"`:476 - `content` contains one `resource_link` item per file, plus the same optional `text` items with revised prompts:477 - `{ type: "resource_link", uri: "file:///absolute-path-1.png", name: "absolute-path-1.png", mimeType: <image mime> }`478 - For `gpt-image-1.5` and `gpt-image-1`, an additional `text` line is included with a pricing estimate (based on `structuredContent.usage`), and `structuredContent.pricing` contains the full pricing breakdown.479480When `result_placement` includes `"api"`, `openai-images-generate` instead returns an **OpenAI Images API-like object** without MCP wrappers:481482```jsonc483{484 "created": 1764599500,485 "data": [486 { "b64_json": "..." 
} // or { "url": "https://.../media/file.png" } when output: "file"487 ],488 "background": "opaque",489 "output_format": "png",490 "size": "1024x1024",491 "quality": "high"492}493```494495### openai-images-edit496497Arguments (input schema):498499- `image` (string or string[], required)500 - Either a single absolute path to an image file (`.png`, `.jpg`, `.jpeg`, `.webp`),501 a base64-encoded image string (optionally as a `data:image/...;base64,...` URL),502 **or an HTTP(S) URL** pointing to a publicly accessible image,503 **or** an array of 1–16 such strings (for multi-image editing).504 - When an HTTP(S) URL is provided, the server fetches the image and converts it to base64 before sending to OpenAI.505- `prompt` (string, required)506 - Text description of the desired edit.507 - Max length: 32,000 characters.508- `mask` (string, optional)509 - Absolute path, base64 string, or HTTP(S) URL for a mask image (PNG < 4MB, same dimensions510 as the source image). Transparent areas mark regions to edit.511- `model` ("gpt-image-1.5" | "gpt-image-1", optional, default: "gpt-image-1.5")512- `n` (integer, optional)513 - Number of images to generate.514 - Min: 1, Max: 10.515- `quality` ("auto" | "high" | "medium" | "low", default: "high")516- `size` ("1024x1024" | "1536x1024" | "1024x1536" | "auto", default: "1024x1536")517- `user` (string, optional)518 - User identifier forwarded to OpenAI for monitoring.519- `response_format` ("url" | "path" | "b64_json", default: "url")520 - Response format (aligned with OpenAI Images API):521 - `"url"`: file/URL-based output (resource_link items, `image_url` fields, `data[].url` in `api` placement).522 - `"path"`: local filesystem paths in `data[].path` (for local skill workflows).523 - `"b64_json"`: inline base64 image data (image content, `data[].b64_json` in `api` placement).524- `tool_result` ("resource_link" | "image", default: "resource_link")525 - Controls `content[]` shape:526 - `"resource_link"` emits ResourceLink items 
(file/URL-based)527 - `"image"` emits base64 ImageContent blocks528529Behavior notes:530531- The server accepts `image` and `mask` as absolute paths, base64/data URLs, or HTTP(S) URLs.532- When an HTTP(S) URL is provided, the server fetches the image and converts it to a base64 data URL before calling OpenAI.533- For edits, the server always returns PNG semantics (mime type `image/png`)534 when emitting images.535536Output (MCP CallToolResult):537538- When the effective `output` mode is `"base64"`:539 - `content` is an array that may contain:540 - image items:541 - `{ type: "image", data: <base64 string>, mimeType: "image/png" }`542 - optional text items with revised prompts (when the underlying model returns them):543 - `{ type: "text", text: <revised_prompt string> }`544- When the effective `output` mode is `"file"`:545 - `content` contains one `resource_link` item per file, plus the same optional `text` items with revised prompts:546 - `{ type: "resource_link", uri: "file:///absolute-path-1.png", name: "absolute-path-1.png", mimeType: "image/png" }`547 - For `gpt-image-1.5` and `gpt-image-1`, an additional `text` line is included with a pricing estimate (based on `structuredContent.usage`), and `structuredContent.pricing` contains the full pricing breakdown.548549When `result_placement` includes `"api"`, `openai-images-edit` follows the **same raw API format** as `openai-images-generate` (top-level `created`, `data[]`, `background`, `output_format`, `size`, `quality` with `b64_json` for base64 output or `url` for file output).550551Error handling (both tools):552553- On errors inside the tool handler (validation, OpenAI API failures, I/O, etc.), the server returns a CallToolResult marked as an error:554 - `isError: true`555 - `content: [{ type: "text", text: <error message string> }]`556- The error message text is taken directly from the underlying exception message, without additional commentary from the server, while full details are logged to the server 
console.557558### openai-videos-create559560Create a video generation job using the OpenAI Videos API (`videos.create`).561562Arguments (input schema):563564- `prompt` (string, required) — text prompt describing the video (max 32K chars).565- `input_reference` (string, optional) — optional image reference (HTTP(S) URL, base64/data URL, or file path).566- `input_reference_fit` ("match" | "cover" | "contain" | "stretch", default: "contain")567 - How to fit `input_reference` to the requested video `size`:568 - `match`: require exact dimensions (fails fast on mismatch)569 - `cover`: resize + center-crop to fill570 - `contain`: resize + pad/letterbox to fit (default)571 - `stretch`: resize with distortion572- `input_reference_background` ("blur" | "black" | "white" | "#RRGGBB" | "#RRGGBBAA", default: "blur")573 - Padding background used when `input_reference_fit="contain"`.574- `model` ("sora-2" | "sora-2-pro", default: "sora-2-pro")575- `seconds` ("4" | "8" | "12", optional)576- `size` ("720x1280" | "1280x720" | "1024x1792" | "1792x1024", optional)577 - `1024x1792` and `1792x1024` require `sora-2-pro`.578 - If `input_reference` is omitted and `size` is omitted, the API default is used.579- `wait_for_completion` (boolean, default: true)580 - When true, the server polls `openai-videos-retrieve` until `completed` or `failed` (or timeout), then downloads assets.581- `timeout_ms` (integer, default: 900000)582- `poll_interval_ms` (integer, default: 2000)583- `download_variants` (string[], default: ["video"])584 - Allowed values: `"video" | "thumbnail" | "spritesheet"`.585- `tool_result` (`"resource_link"` | `"resource"`, default: `"resource_link"`)586 - Controls `content[]` shape for downloaded assets:587 - `"resource_link"` emits ResourceLink items (file/URL-based)588 - `"resource"` emits EmbeddedResource blocks with base64 `resource.blob`589590Output (MCP CallToolResult):591592- `structuredContent`: OpenAI `Video` object (job metadata; final state when 
  `wait_for_completion=true`).
- `content`: includes `resource_link` (default) or embedded `resource` blocks for downloaded assets (when requested) and text blocks with JSON.
  - Includes a summary JSON block: `{ "video_id": "...", "pricing": { "currency": "USD", "model": "...", "size": "...", "seconds": 4, "price": 0.1, "cost": 0.4 } | null }` (and when waiting: `{ "video_id": "...", "assets": [...], "pricing": ... }`).

### openai-videos-remix

Create a remix job from an existing `video_id` (`videos.remix`).

Arguments (input schema):

- `video_id` (string, required)
- `prompt` (string, required)
- `wait_for_completion`, `timeout_ms`, `poll_interval_ms`, `download_variants`, `tool_result` — same semantics as `openai-videos-create` (default wait is true).

### openai-videos-list

List video jobs (`videos.list`).

Arguments (input schema):

- `after` (string, optional) — cursor (video id) to list after.
- `limit` (integer, optional)
- `order` ("asc" | "desc", optional)

Output:

- `structuredContent`: OpenAI list response shape `{ data, has_more, last_id }`.
- `content`: a text block with serialized JSON.

### openai-videos-retrieve

Retrieve job status (`videos.retrieve`).

- `video_id` (string, required)

### openai-videos-delete

Delete a video job (`videos.delete`).

- `video_id` (string, required)

### openai-videos-retrieve-content

Retrieve an asset for a completed job (`videos.downloadContent`, REST `GET /videos/{video_id}/content`), write it under allowed `MEDIA_GEN_DIRS`, and return MCP `resource_link` (default) or embedded `resource` blocks (via `tool_result`).

Arguments (input schema):

- `video_id` (string, required)
- `variant` ("video" | "thumbnail" | "spritesheet", default: "video")
- `tool_result` (`"resource_link"` | `"resource"`, default: `"resource_link"`)

Output (MCP CallToolResult):

- `structuredContent`: OpenAI `Video` object.
- `content`: a `resource_link` (or embedded `resource`), a summary JSON block `{ video_id, variant, uri, pricing }`, plus the full video JSON.

### google-videos-generate

Create a Google video generation operation using the Google GenAI SDK (`@google/genai`) `ai.models.generateVideos`.

Arguments (input schema):

- `prompt` (string, optional)
- `input_reference` (string, optional) — image-to-video input (HTTP(S) URL, base64/data URL, or file path under `MEDIA_GEN_DIRS`)
- `input_reference_mime_type` (string, optional) — override for `input_reference` MIME type (must be `image/*`)
- `input_video_reference` (string, optional) — video-extension input (HTTP(S) URL or file path under `MEDIA_GEN_DIRS`; mutually exclusive with `input_reference`)
- `model` (string, default: `"veo-3.1-generate-001"`)
- `number_of_videos` (integer, default: `1`)
- `aspect_ratio` (`"16:9" | "9:16"`, optional)
- `duration_seconds` (integer, optional)
  - Veo 2 models: 5–8 seconds (default: 8)
  - Veo 3 models: 4, 6, or 8 seconds (default: 8)
  - When using `referenceImages`: 8 seconds
- `person_generation` (`"DONT_ALLOW" | "ALLOW_ADULT" | "ALLOW_ALL"`, optional)
- `wait_for_completion` (boolean, default: `true`)
- `timeout_ms` (integer, default: `900000`)
- `poll_interval_ms` (integer, default: `10000`)
- `download_when_done` (boolean, optional; defaults to `true` when waiting)
- `tool_result` (`"resource_link"` | `"resource"`, default: `"resource_link"`)
  - Controls `content[]` shape when downloading generated videos.
- `response_format` (`"url"` | `"b64_json"`, default: `"url"`)
  - Controls `structuredContent.response.generatedVideos[].video` fields:
    - `"url"` prefers `video.uri` (and strips `video.videoBytes`)
    - `"b64_json"` prefers `video.videoBytes` (and strips `video.uri`)

Requirements:

- Gemini Developer API: set `GEMINI_API_KEY` (or `GOOGLE_API_KEY`), or `google.api_key` in `secrets.yaml`.
- Vertex AI: set
  `GOOGLE_GENAI_USE_VERTEXAI=true`, `GOOGLE_CLOUD_PROJECT`, and `GOOGLE_CLOUD_LOCATION` (or `google.vertex_ai.*` in `secrets.yaml`).

Output:

- `structuredContent`: Google operation object (includes `name`, `done`, and `response.generatedVideos[]` when available).
- `content`: status text, optional `.mp4` `resource_link` (default) or embedded `resource` blocks (when downloaded), plus JSON text blocks for compatibility.

### google-videos-retrieve-operation

Retrieve/poll an existing Google video operation (`ai.operations.getVideosOperation`).

- `operation_name` (string, required)
- `response_format` (`"url"` | `"b64_json"`, default: `"url"`)

Output:

- `structuredContent`: Google operation object.
- `content`: JSON text blocks with a short summary + the full operation.

### google-videos-retrieve-content

Download `.mp4` content for a completed operation and return file-first MCP `resource_link` (default) or embedded `resource` blocks (via `tool_result`).

- `operation_name` (string, required)
- `index` (integer, default: `0`) — selects `response.generatedVideos[index]`
- `tool_result` (`"resource_link"` | `"resource"`, default: `"resource_link"`)
- `response_format` (`"url"` | `"b64_json"`, default: `"url"`)

Recommended workflow:

1) Call `google-videos-generate` with `wait_for_completion=true` (default) to get the completed operation and downloads; set to false only if you need the operation id immediately.
2) Poll `google-videos-retrieve-operation` until `done=true`.
3) Call `google-videos-retrieve-content` to download an `.mp4` and receive a `resource_link` (or embedded `resource`).

### fetch-images

Fetch and process images from URLs or local file paths with optional compression.

Arguments (input schema):

- `sources` (string[], optional)
  - Array of image sources: HTTP(S) URLs or file paths (absolute or relative to the first `MEDIA_GEN_DIRS` entry).
  - Min: 1, Max: 20
    images.
  - Mutually exclusive with `ids` and `n`.
- `ids` (string[], optional)
  - Array of image IDs to fetch by local filename match under the primary `MEDIA_GEN_DIRS[0]` directory.
  - IDs must be safe (`[A-Za-z0-9_-]` only; no `..`, `*`, `?`, slashes).
  - Matches filenames containing `_{id}_` or `_{id}.` (supports both single outputs and multi-output suffixes like `_1.png`).
  - When `ids` is used, `compression` and `file` are not supported (no new files are created).
  - Mutually exclusive with `sources` and `n`.
- `n` (integer, optional)
  - When set, returns the last N image files from the primary `MEDIA_GEN_DIRS[0]` directory.
  - Files are sorted by modification time (most recently modified first).
  - Mutually exclusive with `sources` and `ids`.
- `compression` (object, optional)
  - `max_size` (integer, optional): Max dimension in pixels. Images larger than this will be resized.
  - `max_bytes` (integer, optional): Target max file size in bytes. Default: 819200 (800KB).
  - `quality` (integer, optional): JPEG/WebP quality 1-100. Default: 85.
  - `format` ("jpeg" | "png" | "webp", optional): Output format. Default: jpeg.
- `response_format` ("url" | "path" | "b64_json", default: "url")
  - Response format: file/URL-based (`url`), local path (`path`), or inline base64 (`b64_json`).
- `tool_result` ("resource_link" | "image", default: "resource_link")
  - Controls `content[]` shape:
    - `"resource_link"` emits ResourceLink items (file/URL-based)
    - `"image"` emits base64 ImageContent blocks
- `file` (string, optional)
  - Base path for output files.
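The `ids` safety and filename-matching rules above can be sketched as a pair of helpers. This is an illustrative sketch only — `isSafeId` and `matchesId` are hypothetical names, not the server's actual internals:

```typescript
// IDs must contain only [A-Za-z0-9_-]; this implicitly rejects
// `..`, `*`, `?`, and slashes (hypothetical helper, for illustration).
const SAFE_ID = /^[A-Za-z0-9_-]+$/;

function isSafeId(id: string): boolean {
  return SAFE_ID.test(id);
}

// A filename matches an id when it contains `_{id}_` or `_{id}.`,
// covering both single outputs and multi-output suffixes like `_1.png`.
function matchesId(filename: string, id: string): boolean {
  if (!isSafeId(id)) return false;
  return filename.includes(`_${id}_`) || filename.includes(`_${id}.`);
}
```

For example, `matchesId("output_1_media-gen__openai-images-generate_abc123_1.png", "abc123")` matches via the `_{id}_` form, while a single-output name ending in `_abc123.png` matches via `_{id}.`.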
    If multiple images, an index suffix is added.

Behavior notes:

- Images are processed in parallel for maximum throughput.
- Compression is **only** applied when `compression` options are provided.
- Compression uses [sharp](https://sharp.pixelplumbing.com/) with iterative quality/size reduction when enabled.
- Partial success: if some sources fail, successful images are still returned with errors listed in the response.
- When `n` is provided, it is only honored when the `MEDIA_GEN_MCP_ALLOW_FETCH_LAST_N_IMAGES` environment variable is set to `true`. Otherwise, the call fails with a validation error.
- Sometimes an MCP client (for example, ChatGPT) may not wait for a response from `media-gen-mcp` due to a timeout. In creative workflows where you need to quickly retrieve the latest `openai-images-generate` / `openai-images-edit` outputs, you can use `fetch-images` with the `n` argument. When the `MEDIA_GEN_MCP_ALLOW_FETCH_LAST_N_IMAGES=true` environment variable is set, `fetch-images` will return the last N files from `MEDIA_GEN_DIRS[0]` even if the original generation or edit operation timed out on the MCP client side.

### fetch-videos

Fetch videos from HTTP(S) URLs or local file paths.

Arguments (input schema):

- `sources` (string[], optional)
  - Array of video sources: HTTP(S) URLs or file paths (absolute or relative to the first `MEDIA_GEN_DIRS` entry).
  - Min: 1, Max: 20 videos.
  - Mutually exclusive with `ids` and `n`.
- `ids` (string[], optional)
  - Array of video IDs to fetch by local filename match under the primary `MEDIA_GEN_DIRS[0]` directory.
  - IDs must be safe (`[A-Za-z0-9_-]` only; no `..`, `*`, `?`, slashes).
  - Matches filenames containing `_{id}_` or `_{id}.` (supports both single outputs and multi-asset suffixes like `_thumbnail.webp`).
  - When `ids` is used, `file` is not supported (no downloads; returns existing files).
  - Mutually exclusive with `sources` and `n`.
- `n` (integer,
  optional)
  - When set, returns the last N video files from the primary `MEDIA_GEN_DIRS[0]` directory.
  - Files are sorted by modification time (most recently modified first).
  - Mutually exclusive with `sources` and `ids`.
- `tool_result` (`"resource_link"` | `"resource"`, default: `"resource_link"`)
  - Controls `content[]` shape:
    - `"resource_link"` emits ResourceLink items (file/URL-based)
    - `"resource"` emits EmbeddedResource blocks with base64 `resource.blob`
- `file` (string, optional)
  - Base path for output files (used when downloading from URLs). If multiple videos are downloaded, an index suffix is added.

Output:

- `content`: one `resource_link` (default) or embedded `resource` block per resolved video, plus an optional error summary text block.
- `structuredContent`: `{ data: [{ source, uri, file, mimeType, name, downloaded }], errors?: string[] }`.

Behavior notes:

- URL downloads are only allowed when the URL matches `MEDIA_GEN_URLS` (when set).
- When `n` is provided, it is only honored when the `MEDIA_GEN_MCP_ALLOW_FETCH_LAST_N_VIDEOS` environment variable is set to `true`. Otherwise, the call fails with a validation error.

### fetch-document

Fetch documents from HTTP(S) URLs or local file paths.

Arguments (input schema):

- `sources` (string[])
  - Array of document sources: HTTP(S) URLs or file paths (absolute or relative to the first `MEDIA_GEN_DIRS` entry).
  - Min: 1, Max: 20 documents.
- `tool_result` (`"resource_link"` | `"resource"`, default: `"resource_link"`)
  - Controls `content[]` shape:
    - `"resource_link"` emits ResourceLink items (file/URL-based)
    - `"resource"` emits EmbeddedResource blocks with base64 `resource.blob`
- `file` (string, optional)
  - Base path for output files (used when downloading from URLs).
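The `file` base-path-plus-index-suffix rule used by the fetch tools can be sketched roughly as follows. This is a hypothetical helper (`outputPath` is not the server's real function), and the exact suffix format is an assumption based on the `_1.png` example shown for `fetch-images`:

```typescript
// Hypothetical sketch: single download keeps the base path as-is;
// multiple downloads get a 1-based index suffix before the extension.
function outputPath(base: string, index: number, total: number): string {
  if (total <= 1) return base;
  const dot = base.lastIndexOf(".");
  const stem = dot > 0 ? base.slice(0, dot) : base;
  const ext = dot > 0 ? base.slice(dot) : "";
  return `${stem}_${index + 1}${ext}`;
}
```

So with `file = "/tmp/report.pdf"` and three downloads, the sketch yields `/tmp/report_1.pdf`, `/tmp/report_2.pdf`, `/tmp/report_3.pdf`; with a single download, the base path is used unchanged.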
    If multiple documents are downloaded, an index suffix is added.

Output:

- `content`: one `resource_link` (default) or embedded `resource` block per resolved document, plus an optional error summary text block.
- `structuredContent`: `{ data: [{ source, uri, file, mimeType, name, downloaded }], errors?: string[] }`.

Behavior notes:

- URL downloads are only allowed when the URL matches `MEDIA_GEN_URLS` (when set).
- Local paths are validated against `MEDIA_GEN_DIRS` and can be provided as `file://` URLs.
- Default filenames use `output_<time_t>_media-gen__fetch-document_<uuid>.<ext>` when `file` is omitted.

### test-images

Debug tool for testing MCP result placement without calling the OpenAI API.

**Enabled only when `MEDIA_GEN_MCP_TEST_SAMPLE_DIR` is set**. The tool reads existing images from this directory and does **not** create new files.

Arguments (input schema):

- `response_format` ("url" | "path" | "b64_json", default: "url")
- `result_placement` ("content" | "api" | "structured" | "toplevel" or array of these, optional)
  - Override `MEDIA_GEN_MCP_RESULT_PLACEMENT` for this call.
- `compression` (object, optional)
  - Same logical tuning knobs as `fetch-images`, but using camelCase keys:
    - `maxSize` (integer, optional): max dimension in pixels.
    - `maxBytes` (integer, optional): target max file size in bytes.
    - `quality` (integer, optional): JPEG/WebP quality 1–100.
    - `format` ("jpeg" | "png" | "webp", optional): output format.
- `tool_result` ("resource_link" | "image", default: "resource_link")
  - Controls `content[]` shape:
    - `"resource_link"` emits ResourceLink items (file/URL-based)
    - `"image"` emits base64 ImageContent blocks

Behavior notes:

- Reads up to 10 images from the sample directory (no sorting — filesystem order).
- Uses the same result-building logic as `openai-images-generate` and `openai-images-edit` (including `result_placement` overrides).
- When
  `output == "base64"` and `compression` is provided, sample files are read and compressed **in memory** using `sharp`; original files on disk are never modified.
- Useful for testing how different MCP clients handle various result structures.
- When `result_placement` includes `"api"`, the tool returns a **mock OpenAI Images API-style object**:
  - Top level: `created`, `data[]`, `background`, `output_format`, `size`, `quality`.
  - For `response_format: "b64_json"`, each `data[i]` contains `b64_json`.
  - For `response_format: "path"`, each `data[i]` contains `path`.
  - For `response_format: "url"`, each `data[i]` contains `url` instead of `b64_json`.

#### Debug CLI helpers for `test-images`

For local debugging there are two helper scripts that call `test-images` directly:

- `npm run test-images` – uses `debug/debug-call.ts` and prints the validated `CallToolResult` as seen by the MCP SDK client. Usage:

  ```sh
  npm run test-images -- [placement] [--response_format url|path|b64_json]
  # examples:
  # npm run test-images -- structured --response_format b64_json
  # npm run test-images -- structured --response_format path
  # npm run test-images -- structured --response_format url
  ```

- `npm run test-images:raw` – uses `debug/debug-call-raw.ts` and prints the raw JSON-RPC `result` (the underlying `CallToolResult` without extra wrapping).
  Same CLI flags as above.

Both scripts truncate large fields for readability:

- `image_url` → first 80 characters, then `...(N chars)`;
- `b64_json` and `data` (when it is a base64 string) → first 25 characters, then `...(N chars)`.

---

## 🧩 Version policy

### Semantic Versioning (SemVer)

This package follows **SemVer**: `MAJOR.MINOR.PATCH` (x.y.z).

- `MAJOR` — breaking changes (tool names, input schemas, output shapes).
- `MINOR` — new tools or backward-compatible additions (new optional params, new fields in responses).
- `PATCH` — bug fixes and internal refactors with no intentional behavior change.

Since `1.0.0`, this project follows **standard SemVer rules**: breaking changes bump **MAJOR** (npm’s `^1.0.0` allows `1.x`, but not `2.0.0`).

### Dependency policy

This repository aims to stay **closely aligned with current stable releases**:

- **MCP SDK**: targeting the latest stable `@modelcontextprotocol/sdk` and schema.
- **OpenAI SDK**: regularly updated to the latest stable `openai` package.
- **Zod**: using the Zod 4.x line (currently `^4.1.3`). In this project we previously ran on Zod 3.x and, in combination with the MCP TypeScript SDK typings, hit heavy TypeScript errors when passing `.shape` into `inputSchema` — in particular TS2589 (*"type instantiation is excessively deep and possibly infinite"*) and TS2322 (*schema shape not assignable to `AnySchema | ZodRawShapeCompat`*).
  We track the upstream discussion in [modelcontextprotocol/typescript-sdk#494](https://github.com/modelcontextprotocol/typescript-sdk/issues/494) and the related Zod typing work in [colinhacks/zod#5222](https://github.com/colinhacks/zod/pull/5222), and keep the stack on a combination that passes **full strict** compilation reliably.
- **Tooling stack** (Node.js, TypeScript, etc.): developed and tested against recent LTS / current releases, with a dedicated `tsconfig-strict.json` that enables all strict TypeScript checks (`strict`, `noUnusedLocals`, `noUnusedParameters`, `exactOptionalPropertyTypes`, `noUncheckedIndexedAccess`, `noPropertyAccessFromIndexSignature`, etc.).

You are welcome to pin or downgrade Node.js, TypeScript, the OpenAI SDK, Zod, or other pieces of the stack if your environment requires it, but please keep in mind:

- we primarily test and tune against the latest stack;
- issues that only reproduce on older runtimes / SDK versions may be harder for us to investigate and support;
- upstream compatibility is validated first of all against the latest MCP spec and OpenAI Images API.

This project is intentionally a bit **futuristic**: it tries to keep up with new capabilities as they appear in MCP and OpenAI tooling (in particular, robust multimodal/image support over MCP and in ChatGPT’s UI). A detailed real-world bug report and analysis of MCP image rendering in ChatGPT is listed in the **References** section as a case study.

If you need a long-term-stable stack, pin exact versions in your own fork and validate them carefully in your environment.

---

## 🧩 Typed tool callbacks

All tool handlers use **strongly typed callback parameters** derived from Zod schemas via `z.input<typeof schema>`:

```typescript
import { z } from "zod";

// Schema definition
const openaiImagesGenerateBaseSchema = z.object({
  prompt: z.string().max(32000),
  background: z.enum(["transparent", "opaque", "auto"]).optional(),
  // ... more fields
});

// Type alias
type OpenAIImagesGenerateArgs = z.input<typeof openaiImagesGenerateBaseSchema>;

// Strictly typed callback
server.registerTool(
  "openai-images-generate",
  { inputSchema: openaiImagesGenerateBaseSchema.shape, ... },
  async (args: OpenAIImagesGenerateArgs, _extra: unknown) => {
    const validated = openaiImagesGenerateSchema.parse(args);
    // ... handler logic
  },
);
```

This pattern provides:

- **Static type safety** — IDE autocomplete and compile-time checks for all input fields.
- **Runtime validation** — Zod `.parse()` ensures all inputs match the schema before processing.
- **MCP SDK compatibility** — `inputSchema: schema.shape` provides the JSON Schema for tool registration.

All tools (`openai-images-*`, `openai-videos-*`, `fetch-images`, `fetch-videos`, `fetch-document`, `test-images`) follow this pattern.

---

## 🧩 Tool annotations

This MCP server exposes the following tools with annotation hints:

| Tool | `readOnlyHint` | `destructiveHint` | `idempotentHint` | `openWorldHint` |
|------|----------------|-------------------|------------------|-----------------|
| **openai-images-generate** | `true` | `false` | `false` | `true` |
| **openai-images-edit** | `true` | `false` | `false` | `true` |
| **openai-videos-create** | `true` | `false` | `false` | `true` |
| **openai-videos-remix** | `true` | `false` | `false` | `true` |
| **openai-videos-list** | `true` | `false` | `false` | `true` |
| **openai-videos-retrieve** | `true` | `false` | `false` | `true` |
| **openai-videos-delete** | `true` | `false` | `false` | `true` |
| **openai-videos-retrieve-content** | `true` | `false` | `false` | `true` |
| **fetch-images** | `true` | `false` | `false` | `false` |
| **fetch-videos** | `true` | `false` | `false` | `false` |
| **fetch-document** | `true` | `false` | `false` | `false` |
| **test-images** | `true` | `false` | `false` | `false` |

These hints help MCP clients understand that these tools:

- may invoke external APIs or read external resources (open world),
- do not modify existing project files or user data; they only create new media files (images/videos/documents) in configured output directories,
- may produce different outputs on each call, even with the same inputs.

Because `readOnlyHint` is set to `true` for most tools, MCP platforms (including chatgpt.com) can treat this server as logically read-only and usually will not show "this tool can modify your files" warnings.

---

## 📁 Project structure

```text
media-gen-mcp/
├── src/
│   ├── index.ts                 # MCP server entry point
│   └── lib/
│       ├── compression.ts       # Image compression (sharp)
│       ├── env.ts               # Env parsing + allowlists (+ glob support)
│       ├── helpers.ts           # URL/path validation, result building
│       ├── logger.ts            # Structured logging + truncation helpers
│       └── schemas.ts           # Zod schemas for all tools
├── test/
│   ├── compression.test.ts               # 12 tests
│   ├── env.test.ts                       # 19 tests
│   ├── fetch-images.integration.test.ts  # 2 tests
│   ├── fetch-videos.integration.test.ts  # 2 tests
│   ├── helpers.test.ts                   # 31 tests
│   ├── logger.test.ts                    # 10 tests
│   └── schemas.test.ts                   # 64 tests
├── debug/            # Local debug helpers (MCP client scripts)
├── plan/             # Design notes / plans
├── dist/             # Compiled output
├── tsconfig.json
├── vitest.config.ts
├── package.json
├── CHANGELOG.md
├── README.md
└── AGENTS.md
```

---

## 📝 License

MIT

---

## 🩺 Troubleshooting

- Make sure your `OPENAI_API_KEY` is valid and has image API access.
- You must have a [verified OpenAI organization](https://platform.openai.com/account/organization).
  After verifying, it can take 15–20 minutes for image API access to activate.
- File paths (when passed as optional parameters) must be absolute.
  - **Unix/macOS/Linux**: Starting with `/` (e.g., `/path/to/image.png`)
  - **Windows**: Drive letter followed by `:` (e.g., `C:/path/to/image.png` or `C:\path\to\image.png`)
  - For file output, ensure the target directory is writable.
  - If you see errors about file types, check your image file extensions and formats.

---

## 🙏 Inspiration

This server was originally inspired by [SureScaleAI/openai-gpt-image-mcp](https://github.com/SureScaleAI/openai-gpt-image-mcp), but is now a separate implementation focused on **closely tracking the official specifications**:

- **OpenAI Images API alignment** – The arguments for `openai-images-generate` and `openai-images-edit` mirror [`images.create` / `gpt-image-1.5`](https://platform.openai.com/docs/api-reference/images/create): `prompt`, `n`, `size`, `quality`, `background`, `output_format`, `output_compression`, `user`, plus `response_format` (`url` / `b64_json`) with the same semantics as the OpenAI Images API.
- **MCP Tool Result alignment (image + resource_link)** – With `result_placement = "content"`, the server follows the MCP **5.2 Tool Result** section ([5.2.2 Image Content](https://modelcontextprotocol.io/specification/2025-11-25/server/tools#image-content), [5.2.4 Resource Links](https://modelcontextprotocol.io/specification/2025-11-25/server/tools#tool-result)) and emits strongly-typed `content[]` items:
  - `{ "type": "image", "data": "<base64>", "mimeType": "image/png" }` for `response_format = "b64_json"`;
  - `{ "type": "resource_link", "uri": "file:///..." | "https://...", "name": "...", "mimeType": "image/..."
    }` for file/URL-based output.
- **Raw OpenAI-style API output** – With `result_placement = "api"`, the tool result itself **is** an OpenAI Images-style object: `{ created, data: [...], background, output_format, size, quality, usage? }`, where each `data[]` entry contains either `b64_json` (for `response_format = "b64_json"`) or `url` (for `response_format = "url"`). No MCP wrapper fields (`content`, `structuredContent`, `files`, `urls`) are added in this mode.

In short, this library:

- tracks the OpenAI Images API for **arguments and result shape** when `result_placement = "api"` with `response_format = "url" | "b64_json"`, and
- follows the MCP specification for **tool result content blocks** (`image`, `resource_link`, `text`) when `result_placement = "content"`.

### Recommended presets for common clients

- **Default mode / Claude Desktop / strict MCP clients**
  For clients that strictly follow the MCP spec, the recommended (and natural) configuration is:
  - `result_placement = content`
  - `response_format = b64_json`

  In this mode the server returns:
  - `content[]` with `type: "image"` (base64 image data) and `type: "resource_link"` (file/URL links), matching MCP section 5.2 (Image Content and Resource Links). This output works well for **direct
This output works well for **direct1080 integration** with Claude Desktop and any client that fully implements the1081 2025‑11‑25 spec.10821083- **chatgpt.com Developer Mode**1084 For running this server as an MCP backend behind ChatGPT Developer Mode, the1085 most practical configuration is the one that most closely matches the OpenAI1086 Images API:1087 - `result_placement = api`1088 - `response_format = url`10891090 In this mode the tool result matches the `images.create` / `gpt-image-1.5`1091 format (including `data[].url`), which simplifies consumption from backends1092 and libraries that expect the OpenAI schema.10931094 However, **even with this OpenAI-native shape, the chatgpt.com client does1095 not currently render images**. This behavior is documented in detail in the1096 following report:1097 <https://github.com/strato-space/report/issues/1>1098---10991100## ⚠️ Limitations & Large File Handling11011102- **Configurable payload safeguard:** By default this server uses a ~50MB budget (52,428,800 bytes) for inline `content` to stay within typical MCP client limits. You can override this threshold by setting the `MCP_MAX_CONTENT_BYTES` environment variable to a higher (or lower) value.1103- **Auto-Switch to File Output:** If the total image base64 size exceeds the configured threshold, the tool automatically saves images to disk and returns file path(s) via `resource_link` instead of inline base64. This helps avoid client-side "payload too large" errors while still delivering full-resolution images.1104- **Default File Location:** If you do not specify a `file` path, outputs are saved under `MEDIA_GEN_DIRS[0]` (default: `/tmp/media-gen-mcp`) using names like `output_<time_t>_media-gen__<tool>_<id>.<ext>`.1105- **Environment Variables:**1106 - `MEDIA_GEN_DIRS`: Set this to control where outputs are saved. Example: `export MEDIA_GEN_DIRS=/your/desired/dir`. 
    This directory may coincide with your public static directory if you serve files directly from it.
  - `MEDIA_GEN_MCP_URL_PREFIXES`: Optional comma-separated HTTPS prefixes for public URLs, matched positionally to `MEDIA_GEN_DIRS` entries. When set, the server builds public URLs as `<prefix>/<relative_path_inside_root>` and returns them alongside file paths (for example via `resource_link` URIs and `structuredContent.data[].url` when `response_format: "url"`). Example: `export MEDIA_GEN_MCP_URL_PREFIXES=https://media-gen.example.com/media,https://media-gen.example.com/samples`
- **Best Practice:** For large or production images, always use file output and ensure your client is configured to handle file paths. Configure `MEDIA_GEN_DIRS` and (optionally) `MEDIA_GEN_MCP_URL_PREFIXES` to serve images via a public web server (e.g., nginx).

---

## 🌐 Serving generated files over HTTPS

If you want ChatGPT (or any MCP client) to mention publicly accessible URLs alongside file paths:

1. Expose your image directory via HTTPS. For example, on nginx:

   ```nginx
   server {
       # listen 443 ssl http2;
       # server_name <server_name>;

       # ssl_certificate <path>;
       # ssl_certificate_key <path>;

       location /media/ {
           alias /home/username/media-gen-mcp/media/;
           autoindex off;
           expires 7d;
           add_header Cache-Control "public, immutable";
       }
   }
   ```

2. Ensure the first entry in `MEDIA_GEN_DIRS` points to the same directory (e.g. `MEDIA_GEN_DIRS=/home/username/media-gen-mcp/media/` or `MEDIA_GEN_DIRS=media/` when running from the project root).
3. Set `MEDIA_GEN_MCP_URL_PREFIXES=https://media-gen.example.com/media` so the server returns matching HTTPS URLs in top-level `urls`, `resource_link` URIs, and `image_url` fields (for `response_format: "url"`).

Both `openai-images-generate` and `openai-images-edit` now attach `files` + `urls` for **base64** and **file** response modes, allowing clients to reference either the local filesystem path or the public HTTPS link. This is particularly useful while ChatGPT cannot yet render MCP image blocks inline.

---

## 📚 References

- **Model Context Protocol**
  - [MCP Specification](https://modelcontextprotocol.io/docs/getting-started/intro)
  - [MCP Schema (2025-11-25)](https://github.com/modelcontextprotocol/modelcontextprotocol/blob/main/schema/2025-11-25/schema.json)

- **OpenAI Images**
  - [Images API overview](https://platform.openai.com/docs/api-reference/images)
  - [Images generate (gpt-image-1.5)](https://platform.openai.com/docs/api-reference/images/create)
  - [Images edit (`createEdit`)](https://platform.openai.com/docs/api-reference/images/createEdit)
  - [Tools guide: image generation & revised_prompt](https://platform.openai.com/docs/guides/tools-image-generation)

- **OpenAI Videos**
  - [Videos API overview](https://platform.openai.com/docs/api-reference/videos)

- **Case studies**
  - [MCP image rendering in ChatGPT (GitHub issue)](https://github.com/strato-space/report/issues/1)
    - **Symptoms:** ChatGPT often ignored or mishandled MCP `image` content blocks: empty tool results, raw base64 treated as text (huge token usage), or generic "I can't see the image" responses, while other MCP clients (Cursor, Claude) rendered the same images correctly.
    - **Root cause:** not a problem with the MCP spec itself, but with ChatGPT's handling/serialization of MCP `CallToolResult` image content blocks and media objects (especially around UI rendering and nested containers).
    - **Status &
      workarounds:** OpenAI has begun rolling out fixes for MCP image support in Codex/ChatGPT, but behavior is still inconsistent; this server uses file/resource_link + URL patterns and spec-conformant `image` blocks so that tools remain usable across current and future MCP clients.

---

## 🙏 Credits

- Built with [@modelcontextprotocol/sdk](https://www.npmjs.com/package/@modelcontextprotocol/sdk)
- Uses the [openai](https://www.npmjs.com/package/openai) Node.js SDK
- Refactoring and MCP spec alignment assisted by [Windsurf](https://windsurf.com) and [GPT-5 High Reasoning](https://openai.com).