Add this skill:

npx mdskills install strato-space/media-gen-mcp

Comprehensive MCP server with full OpenAI Images/Sora and Google Veo support, strict TypeScript, and extensive testing coverage.
Media Gen MCP is a strict TypeScript Model Context Protocol (MCP) server for OpenAI Images (gpt-image-1.5, gpt-image-1), OpenAI Videos (Sora), and Google GenAI Videos (Veo): generate/edit images, create/remix video jobs, and fetch media from URLs or disk with smart resource_link vs inline image outputs and optional sharp processing. Production-focused (full strict typecheck, ESLint + Vitest CI). Works with fast-agent, Claude Desktop, ChatGPT, Cursor, VS Code, Windsurf, and any MCP-compatible client.
Design principle: spec-first, type-safe image tooling – strict OpenAI Images API + MCP compliance with fully static TypeScript types and flexible result placements/response formats for different clients.
- gpt-image-1.5 model (with gpt-image-1 compatibility; DALL·E support planned in future versions).
- Sora video models (sora-2, sora-2-pro) with job create/remix/list/retrieve/delete and asset downloads.
- Smart resource_link/resource outputs.
- test-images tool that mirrors production result placement (content, structuredContent, toplevel).

Strict MCP spec support
Tool outputs are first-class CallToolResult objects from the latest MCP schema, including:
content items (text, image, resource_link, resource), optional structuredContent, optional top-level files, and the isError flag for failures.
Full gpt-image-1.5 and sora-2/sora-2-pro parameter coverage (generate & edit)
- openai-images-generate mirrors the OpenAI Images create API for gpt-image-1.5 (and gpt-image-1): background, moderation, size, quality, output_format, output_compression, n, user, etc.
- openai-images-edit mirrors the OpenAI Images createEdit API for gpt-image-1.5 (and gpt-image-1): image, mask, n, quality, size, user.

OpenAI Videos (Sora) job tooling (create / remix / list / retrieve / delete / content)
- openai-videos-create mirrors videos/create and can optionally wait for completion.
- openai-videos-remix mirrors videos/remix.
- openai-videos-list mirrors videos/list.
- openai-videos-retrieve mirrors videos/retrieve.
- openai-videos-delete mirrors videos/delete.
- openai-videos-retrieve-content mirrors videos/content and downloads video / thumbnail / spritesheet assets to disk, returning MCP resource_link (default) or embedded resource blocks (via tool_result).

Google GenAI (Veo) operations + downloads (generate / retrieve operation / retrieve content)
- google-videos-generate starts a long-running operation (ai.models.generateVideos) and can optionally wait for completion and download .mp4 outputs (see the Veo model reference).
- google-videos-retrieve-operation polls an existing operation.
- google-videos-retrieve-content downloads an .mp4 from a completed operation, returning MCP resource_link (default) or embedded resource blocks (via tool_result).

Fetch and process images from URLs or files
fetch-images tool loads images from HTTP(S) URLs or local file paths with optional, user-controlled compression (disabled by default). Supports parallel processing of up to 20 images.
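The parallel processing described above can be sketched as a small concurrency-limited mapper. This is an illustrative sketch, not the tool's actual internals; `mapWithConcurrency` and the stand-in loader are hypothetical names.

```typescript
// Illustrative sketch of concurrency-limited parallel processing.
// The real fetch-images implementation may differ.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  // Each worker claims the next unprocessed index until the queue drains,
  // so at most `limit` promises are in flight at once.
  const worker = async (): Promise<void> => {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i] as T);
    }
  };
  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, worker),
  );
  return results; // results preserve input order
}
```

With a limit of 20, `mapWithConcurrency(sources, 20, loadImage)` would process up to 20 sources at a time while keeping results aligned with the input array.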
Fetch videos from URLs or files
fetch-videos tool lists local videos or downloads remote video URLs to disk and returns MCP resource_link (default) or embedded resource blocks (via tool_result).
Fetch documents from URLs or files
fetch-document tool downloads remote files or reuses local paths and returns MCP resource_link (default) or embedded resource blocks (via tool_result).
Mix and edit up to 16 images
openai-images-edit accepts image as a single string or an array of 1–16 file paths/base64 strings, matching the OpenAI spec for GPT Image models (gpt-image-1.5, gpt-image-1) image edits.
Smart image compression
Built-in compression using sharp — iteratively reduces quality and dimensions to fit MCP payload limits while maintaining visual quality.
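As an illustration of the iterative strategy (not the actual implementation), the loop below first lowers quality in steps, then halves dimensions, until the encoded buffer fits the byte budget. `compressToFit` is a hypothetical name, and `encode` stands in for a sharp pipeline such as resize + re-encode at a given quality.

```typescript
// Hedged sketch: the real compression module uses sharp; here the encoder
// is injected so the loop itself stays self-contained.
async function compressToFit(
  encode: (quality: number, maxDim: number) => Promise<Uint8Array>,
  maxBytes: number,
  startQuality = 85,
  startDim = 2048,
): Promise<Uint8Array> {
  let quality = startQuality;
  let dim = startDim;
  let out = await encode(quality, dim);
  // Phase 1: reduce quality in steps down to a floor.
  while (out.byteLength > maxBytes && quality > 40) {
    quality -= 15;
    out = await encode(quality, dim);
  }
  // Phase 2: halve dimensions until the budget is met or a floor is hit.
  while (out.byteLength > maxBytes && dim > 256) {
    dim = Math.floor(dim / 2);
    out = await encode(quality, dim);
  }
  return out; // may still exceed maxBytes if both floors are reached
}
```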
Resource-aware file output with resource_link
- Falls back to file output when the total response size exceeds a safe threshold.
- Files are saved with output__media-gen___. filenames (images/documents use a generated UUID; videos use the OpenAI video_id) and exposed to MCP clients via content[], depending on tool_result (resource_link/image for images, resource_link/resource for video/document downloads).

Built-in test-images tool for MCP client debugging
test-images reads sample images from a configured directory and returns them using the same result-building logic as production tools. Use tool_result and response_format parameters to test how different MCP clients handle content[] and structuredContent.
Structured MCP error handling
All tool errors (validation, OpenAI API failures, I/O) are returned as MCP errors with isError: true and content: [{ type: "text", text: "<error message>" }], making failures easy to parse and surface in MCP clients.
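The error shape above can be produced by a small helper. This is a sketch; `toMcpError` is an illustrative name, with field names following the MCP CallToolResult schema.

```typescript
// Minimal sketch of converting any thrown value into the MCP error shape
// described above (illustrative helper, not the server's actual code).
type TextContent = { type: "text"; text: string };

interface McpErrorResult {
  isError: true;
  content: TextContent[];
}

function toMcpError(err: unknown): McpErrorResult {
  const message = err instanceof Error ? err.message : String(err);
  return { isError: true, content: [{ type: "text", text: message }] };
}
```

A client can then branch on `isError` and read `content[0].text` uniformly for validation, API, and I/O failures.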
git clone https://github.com/strato-space/media-gen-mcp.git
cd media-gen-mcp
npm install
npm run build
Build modes:
- npm run build – strict TypeScript build with all strict flags enabled, including skipLibCheck: false. Incremental builds via .tsbuildinfo (~2–3s on a warm cache).
- npm run esbuild – fast bundling via esbuild (no type checking, useful for rapid iteration).

For development or when TypeScript compilation fails due to memory constraints:
npm run dev # Uses tsx to run TypeScript directly
npm run lint # ESLint with typescript-eslint
npm run typecheck # Strict tsc --noEmit
npm run test # Unit tests (vitest)
npm run test:watch # Watch mode for TDD
npm run ci # lint + typecheck + test
The project uses vitest for unit testing. Tests are located in test/.
Covered modules:
| Module | Tests | Description |
|---|---|---|
compression | 12 | Image format detection, buffer processing, file I/O |
helpers | 31 | URL/path validation, output resolution, result placement, resource links |
env | 19 | Configuration parsing, env validation, defaults |
logger | 10 | Structured logging + truncation safety |
pricing | 5 | Sora pricing estimate helpers |
schemas | 69 | Zod schema validation for all tools, type inference |
fetch-images (integration) | 3 | End-to-end MCP tool call behavior |
fetch-videos (integration) | 3 | End-to-end MCP tool call behavior |
Test categories:
- compression: isCompressionAvailable, detectImageFormat, processBufferWithCompression, readAndProcessImage
- helpers: isHttpUrl, isAbsolutePath, isBase64Image, ensureDirectoryWritable, resolveOutputPath, getResultPlacement, buildResourceLinks
- env: MEDIA_GEN_* / MEDIA_GEN_MCP_* settings
- schemas: openai-images-*, openai-videos-*, fetch-images, fetch-videos, test-images inputs; boundary testing (prompt length, image count limits, path validation)

npm run test
# ✓ test/compression.test.ts (12 tests)
# ✓ test/helpers.test.ts (31 tests)
# ✓ test/env.test.ts (19 tests)
# ✓ test/logger.test.ts (10 tests)
# ✓ test/pricing.test.ts (5 tests)
# ✓ test/schemas.test.ts (69 tests)
# ✓ test/fetch-images.integration.test.ts (3 tests)
# ✓ test/fetch-videos.integration.test.ts (3 tests)
# Tests: 152 passed
You can also run the server straight from a remote repo using npx:
npx -y github:strato-space/media-gen-mcp --env-file /path/to/media-gen.env
The --env-file argument tells the server which env file to load (e.g. when you keep secrets outside the cloned directory). The file should contain OPENAI_API_KEY, optional Azure variables, and any MEDIA_GEN_MCP_* settings.
secrets.yaml (optional)

You can keep API keys (and optional Google Vertex AI settings) in a secrets.yaml file (compatible with the fast-agent secrets template):
openai:
api_key:
anthropic:
api_key:
google:
api_key:
vertex_ai:
enabled: true
project_id: your-gcp-project-id
location: europe-west4
media-gen-mcp loads secrets.yaml from the current working directory (or from --secrets-file /path/to/secrets.yaml) and applies it to env vars; values in secrets.yaml override env, and `` placeholders are ignored.
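The override rule above — non-empty secrets.yaml values win over existing env vars, empty/placeholder values are skipped — can be sketched as follows. This is a minimal sketch over an already-parsed, flattened secrets object; the actual YAML parsing and key-to-env-var mapping are omitted, and `applySecrets` is a hypothetical name.

```typescript
// Sketch of the precedence rule: secrets override env, blanks are ignored.
function applySecrets(
  env: Record<string, string | undefined>,
  secrets: Record<string, string | undefined>,
): Record<string, string | undefined> {
  const out = { ...env };
  for (const [key, value] of Object.entries(secrets)) {
    if (value && value.trim() !== "") {
      out[key] = value; // a non-empty secret overrides the env value
    }
    // empty or missing values leave the existing env var untouched
  }
  return out;
}
```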
In fast-agent, MCP servers are configured in fastagent.config.yaml under the mcp.servers section (see the fast-agent docs).
To add media-gen-mcp from GitHub via npx as an MCP server:
# fastagent.config.yaml
mcp:
servers:
# your existing servers (e.g. fetch, filesystem, huggingface, ...)
media-gen-mcp:
command: "npx"
args: ["-y", "github:strato-space/media-gen-mcp", "--env-file", "/path/to/media-gen.env"]
Put OPENAI_API_KEY and other settings into media-gen.env (see .env.sample in this repo).
Add an MCP server that runs media-gen-mcp from GitHub via npx using the JSON format below (similar to Claude Desktop / VS Code):
{
"mcpServers": {
"media-gen-mcp": {
"command": "npx",
"args": ["-y", "github:strato-space/media-gen-mcp", "--env-file", "/path/to/media-gen.env"]
}
}
}
Add to your MCP client config (fast-agent, Windsurf, Claude Desktop, Cursor, VS Code):
{
"mcpServers": {
"media-gen-mcp": {
"command": "npx",
"args": ["-y", "github:strato-space/media-gen-mcp"],
"env": { "OPENAI_API_KEY": "sk-..." }
}
}
}
Also supports Azure deployments:
{
"mcpServers": {
"media-gen-mcp": {
"command": "npx",
"args": ["-y", "github:strato-space/media-gen-mcp"],
"env": {
// "AZURE_OPENAI_API_KEY": "sk-...",
// "AZURE_OPENAI_ENDPOINT": "my.endpoint.com",
"OPENAI_API_VERSION": "2024-12-01-preview"
}
}
}
}
Environment variables:
- Set OPENAI_API_KEY (and optionally AZURE_OPENAI_API_KEY, AZURE_OPENAI_ENDPOINT, OPENAI_API_VERSION) in the environment of the process that runs node dist/index.js (shell, systemd unit, Docker env, etc.).
- The server loads a .env file from its working directory if present (it does not override already-set environment variables).
- Pass --env-file /path/to/env when starting the server (including via npx); this file is loaded via dotenv before tools run, again without overriding already-set variables.

To avoid flooding logs with huge image payloads, the built-in logger applies a log-only sanitizer to structured data passed to log.debug/info/warn/error:
- Base64-like string fields (b64_json, base64, data, image_url) are truncated to a short preview controlled by LOG_TRUNCATE_DATA_MAX (default: 64 characters).
- The list of keys defaults to LOG_SANITIZE_KEYS inside src/lib/logger.ts and can be overridden via MEDIA_GEN_MCP_LOG_SANITIZE_KEYS (a comma-separated list of field names).

Control via environment:
MEDIA_GEN_MCP_LOG_SANITIZE_IMAGES (default: true)
- 1, true, yes, on – enable truncation (default behaviour).
- 0, false, no, off – disable truncation and log full payloads.

Field list and limits are configured in src/lib/logger.ts via LOG_SANITIZE_KEYS and LOG_TRUNCATE_DATA_MAX.
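A minimal version of the log-only sanitizer might look like the following. This is an illustrative sketch; the real src/lib/logger.ts implementation may differ in recursion details and key handling. The key list and preview length mirror the defaults described above.

```typescript
// Sketch: walk a structured payload and truncate long string values stored
// under the listed keys to a short preview, leaving everything else intact.
const SANITIZE_KEYS = new Set(["b64_json", "base64", "data", "image_url"]);
const MAX_PREVIEW = 64; // mirrors the LOG_TRUNCATE_DATA_MAX default

function sanitizeForLog(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(sanitizeForLog);
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [k, v] of Object.entries(value as Record<string, unknown>)) {
      out[k] =
        SANITIZE_KEYS.has(k) && typeof v === "string" && v.length > MAX_PREVIEW
          ? `${v.slice(0, MAX_PREVIEW)}...(${v.length} chars)`
          : sanitizeForLog(v);
    }
    return out;
  }
  return value; // primitives pass through unchanged
}
```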
- Allowed directories are configured via MEDIA_GEN_DIRS. If unset, defaults to /tmp/media-gen-mcp (or %TEMP%/media-gen-mcp on Windows).
- MEDIA_GEN_MCP_TEST_SAMPLE_DIR adds a directory to the allowlist and enables the test-images tool.
- fetch-images and fetch-document accept file paths (absolute or relative). Relative paths are resolved against the first MEDIA_GEN_DIRS entry and must still match an allowed pattern.
- Remote URLs are checked against MEDIA_GEN_URLS patterns. Empty = allow all.
- openai-images-generate, openai-images-edit, fetch-images, fetch-videos, and fetch-document write under the first entry of MEDIA_GEN_DIRS. test-images is read-only and does not create new files.

Both MEDIA_GEN_DIRS and MEDIA_GEN_URLS support glob wildcards:
| Pattern | Matches | Example |
|---|---|---|
* | Any single segment (no /) | /home/*/media/ matches /home/user1/media/ |
** | Any number of segments | /data/**/images/ matches /data/a/b/images/ |
URL examples:
MEDIA_GEN_URLS=https://*.cdn.example.com/,https://storage.example.com/**/assets/
Path examples:
MEDIA_GEN_DIRS=/home/*/media-gen/output/,/data/**/images/
⚠️ Warning: Trailing wildcards without a delimiter (e.g., /home/user/* or https://cdn.com/**) expose entire subtrees and trigger a console warning at startup.
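The wildcard semantics in the table above can be sketched as a pattern-to-regex translation. Assumptions: prefix matching and no escape sequences; the server's actual matcher may handle more edge cases, and `globToRegExp` is an illustrative name.

```typescript
// Sketch: translate the documented glob dialect into a RegExp.
// `*`  = any single path segment (no "/")
// `**` = any number of segments
function globToRegExp(pattern: string): RegExp {
  const escaped = pattern
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters
    .replace(/\*\*/g, "\u0000")            // placeholder so * doesn't eat **
    .replace(/\*/g, "[^/]*")               // * matches within one segment
    .replace(/\u0000/g, ".*");             // ** crosses segment boundaries
  return new RegExp(`^${escaped}`);        // prefix match against the target
}

// globToRegExp("/home/*/media/").test("/home/user1/media/out.png") → true
// globToRegExp("/data/**/images/").test("/data/a/b/images/x.png")  → true
```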
Best practices:
- Avoid broad * patterns in home directories or system paths.
- Prefer explicit MEDIA_GEN_URLS prefixes for remote fetches.

tool_result and response_format

Image tools (openai-images-*, fetch-images, test-images) support two parameters that control the shape of the MCP tool result:
| Parameter | Values | Default | Description |
|---|---|---|---|
tool_result | resource_link, image | resource_link | Controls content[] shape |
response_format | url, path, b64_json | url | Controls structuredContent shape (OpenAI ImagesResponse format) |
Video/document download tools (openai-videos-create / openai-videos-remix when downloading, openai-videos-retrieve-content, google-videos-generate when downloading, google-videos-retrieve-content, fetch-videos, fetch-document) support:
| Parameter | Values | Default | Description |
|---|---|---|---|
tool_result | resource_link, resource | resource_link | Controls content[] shape |
Google video tools (google-videos-*) also support:
| Parameter | Values | Default | Description |
|---|---|---|---|
response_format | url, b64_json | url | Controls structuredContent.response.generatedVideos[].video shape (uri vs videoBytes) |
tool_result — controls content[]

Image tools (openai-images-*, fetch-images, test-images):
- resource_link (default): Emits ResourceLink items with file:// or https:// URIs
- image: Emits base64 ImageContent blocks

Video tools (openai-videos-*, google-videos-*, fetch-videos):
- resource_link (default): Emits ResourceLink items with file:// or https:// URIs
- resource: Emits EmbeddedResource blocks with base64 resource.blob

Document tools (fetch-document):
- resource_link (default): Emits ResourceLink items with file:// or https:// URIs
- resource: Emits EmbeddedResource blocks with base64 resource.blob

response_format — controls structuredContent

For OpenAI images, structuredContent always contains an OpenAI ImagesResponse-style object:
{
"created": 1234567890,
"data": [
{ "url": "https://..." } // or { "path": "/abs/path.png" } / { "b64_json": "..." } depending on response_format
]
}
- url (default): data[].url contains file URLs
- path: data[].path contains local filesystem paths
- b64_json: data[].b64_json contains base64-encoded image data

For Google videos, response_format controls whether structuredContent.response.generatedVideos[].video prefers:
- url (default): video.uri (and strips video.videoBytes)
- b64_json: video.videoBytes (and strips video.uri)

Per MCP spec 5.2.6, a TextContent block with serialized JSON (always using URLs in data[]) is also included in content[] for backward compatibility with clients that don't support structuredContent.
Example tool result structure:
{
"content": [
// ResourceLink or ImageContent based on tool_result
{ "type": "resource_link", "uri": "https://...", "name": "image.png", "mimeType": "image/png" },
// Serialized JSON for backward compatibility (MCP 5.2.6)
{ "type": "text", "text": "{ \"created\": 1234567890, \"data\": [{ \"url\": \"https://...\" }] }" }
],
"structuredContent": {
"created": 1234567890,
"data": [{ "url": "https://..." }]
}
}
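The shaping of data[] per response_format, plus the always-URL compatibility text block, can be sketched as below. Helper and field names (`buildStructured`, `SavedImage`, `compatText`) are illustrative, not the server's internals.

```typescript
// Sketch: build structuredContent shaped by response_format, plus the
// MCP 5.2.6 backward-compat JSON text (which always uses URLs in data[]).
type ResponseFormat = "url" | "path" | "b64_json";

interface SavedImage { url: string; path: string; b64: string }

function buildStructured(images: SavedImage[], format: ResponseFormat) {
  const data = images.map((img) =>
    format === "url" ? { url: img.url }
    : format === "path" ? { path: img.path }
    : { b64_json: img.b64 },
  );
  const created = Math.floor(Date.now() / 1000);
  return {
    structuredContent: { created, data },
    // compatText goes into a { type: "text" } content item for clients
    // that do not read structuredContent.
    compatText: JSON.stringify({
      created,
      data: images.map((i) => ({ url: i.url })),
    }),
  };
}
```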
ChatGPT MCP client behavior (chatgpt.com, as of 2025-12-01):
- ChatGPT ignores content[] image data in favor of structuredContent.
- Use response_format: "url" and configure the first MEDIA_GEN_MCP_URL_PREFIXES entry as a public HTTPS prefix (for example MEDIA_GEN_MCP_URL_PREFIXES=https://media-gen.example.com/media).

For Anthropic clients (Claude Desktop, etc.), the default configuration works well.
For networked SSE access you can front media-gen-mcp with mcp-proxy or its equivalent. This setup has been tested with the TypeScript SSE proxy implementation punkpeye/mcp-proxy.
For example, a one-line command looks like:
mcp-proxy --host=0.0.0.0 --port=99 --server=sse --sseEndpoint=/ --shell 'npx -y github:strato-space/media-gen-mcp --env-file /path/to/media-gen.env'
In production you would typically wire this up via a systemd template unit that loads PORT/SHELL_CMD from an EnvironmentFile= (see server/mcp/mcp@.service style setups).
Arguments (input schema):
prompt (string, required)
background ("transparent" | "opaque" | "auto", optional)
- If background is "transparent", then output_format must be "png" or "webp".

model ("gpt-image-1.5" | "gpt-image-1", optional, default: "gpt-image-1.5")
moderation ("auto" | "low", optional)
n (integer, optional)
output_compression (integer, optional)
- Only applies when output_format is "jpeg" or "webp".

output_format ("png" | "jpeg" | "webp", optional)
quality ("auto" | "high" | "medium" | "low", default: "high")
size ("1024x1024" | "1536x1024" | "1024x1536" | "auto", default: "1024x1536")
user (string, optional)
response_format ("url" | "path" | "b64_json", default: "url")
- "url": file/URL-based output (resource_link items, image_url fields, data[].url in api placement).
- "path": local filesystem paths in data[].path (for local skill workflows).
- "b64_json": inline base64 image data (image content, data[].b64_json in api placement).

tool_result ("resource_link" | "image", default: "resource_link")
- Controls the content[] shape:
  - "resource_link" emits ResourceLink items (file/URL-based)
  - "image" emits base64 ImageContent blocks

Behavior notes:
- Uses gpt-image-1.5 by default (set model: "gpt-image-1" for legacy behavior).
- When the inline response would exceed the payload limit (MCP_MAX_CONTENT_BYTES), the server automatically switches the effective output mode to file/URL-based and saves images to the first entry of MEDIA_GEN_DIRS (default: /tmp/media-gen-mcp).
- Even with response_format: "b64_json", the server still writes the files to disk (for static hosting, caching, or later reuse). Exposure of file paths / URLs in the tool result then depends on MEDIA_GEN_MCP_RESULT_PLACEMENT and per-call result_placement (see section below).

Output (MCP CallToolResult, when placement includes "content"):
When the output mode is "base64":
- content is an array that may contain:
  - { type: "image", data: "<base64>", mimeType: "<mime type>" }
  - { type: "text", text: "<revised prompt>" }

When the output mode is "file":
- content contains one resource_link item per file, plus the same optional text items with revised prompts:
  - { type: "resource_link", uri: "file:///absolute-path-1.png", name: "absolute-path-1.png", mimeType: "<mime type>" }
- For gpt-image-1.5 and gpt-image-1, an additional text line is included with a pricing estimate (based on structuredContent.usage), and structuredContent.pricing contains the full pricing breakdown.

When result_placement includes "api", openai-images-generate instead returns an OpenAI Images API-like object without MCP wrappers:
{
"created": 1764599500,
"data": [
{ "b64_json": "..." } // or { "url": "https://.../media/file.png" } when output: "file"
],
"background": "opaque",
"output_format": "png",
"size": "1024x1024",
"quality": "high"
}
Arguments (input schema):
image (string or string[], required)
- Each entry is a file path to a supported image (.png, .jpg, .jpeg, .webp), a base64-encoded image string (optionally as a data:image/...;base64,... URL), or an HTTP(S) URL pointing to a publicly accessible image; pass an array of 1–16 such strings for multi-image editing.

prompt (string, required)
mask (string, optional)
- { type: "text", text: "<revised prompt>" }

When the output mode is "file":
- content contains one resource_link item per file, plus the same optional text items with revised prompts:
  - { type: "resource_link", uri: "file:///absolute-path-1.png", name: "absolute-path-1.png", mimeType: "image/png" }
- For gpt-image-1.5 and gpt-image-1, an additional text line is included with a pricing estimate (based on structuredContent.usage), and structuredContent.pricing contains the full pricing breakdown.

When result_placement includes "api", openai-images-edit follows the same raw API format as openai-images-generate (top-level created, data[], background, output_format, size, quality, with b64_json for base64 output or url for file output).
Error handling (both tools):
- isError: true
- content: [{ type: "text", text: "<error message>" }]

Create a video generation job using the OpenAI Videos API (videos.create).
Arguments (input schema):
- prompt (string, required) — text prompt describing the video (max 32K chars).
- input_reference (string, optional) — optional image reference (HTTP(S) URL, base64/data URL, or file path).
- input_reference_fit ("match" | "cover" | "contain" | "stretch", default: "contain")
  - Controls how input_reference is fitted to the requested video size:
    - match: require exact dimensions (fails fast on mismatch)
    - cover: resize + center-crop to fill
    - contain: resize + pad/letterbox to fit (default)
    - stretch: resize with distortion
- input_reference_background ("blur" | "black" | "white" | "#RRGGBB" | "#RRGGBBAA", default: "blur")
  - Background used for padding when input_reference_fit="contain".
- model ("sora-2" | "sora-2-pro", default: "sora-2-pro")
- seconds ("4" | "8" | "12", optional)
- size ("720x1280" | "1280x720" | "1024x1792" | "1792x1024", optional)
  - 1024x1792 and 1792x1024 require sora-2-pro.
  - If both input_reference and size are omitted, the API default is used.
- wait_for_completion (boolean, default: true)
  - Polls openai-videos-retrieve until completed or failed (or timeout), then downloads assets.
- timeout_ms (integer, default: 900000)
- poll_interval_ms (integer, default: 2000)
- download_variants (string[], default: ["video"])
  - Each entry is "video" | "thumbnail" | "spritesheet".
- tool_result ("resource_link" | "resource", default: "resource_link")
  - Controls the content[] shape for downloaded assets:
    - "resource_link" emits ResourceLink items (file/URL-based)
    - "resource" emits EmbeddedResource blocks with base64 resource.blob

Output (MCP CallToolResult):
- structuredContent: OpenAI Video object (job metadata; final state when wait_for_completion=true).
- content: includes resource_link (default) or embedded resource blocks for downloaded assets (when requested) and text blocks with JSON:
  - { "video_id": "...", "pricing": { "currency": "USD", "model": "...", "size": "...", "seconds": 4, "price": 0.1, "cost": 0.4 } | null } (and when waiting: { "video_id": "...", "assets": [...], "pricing": ... }).

Create a remix job from an existing video_id (videos.remix).
Arguments (input schema):
- video_id (string, required)
- prompt (string, required)
- wait_for_completion, timeout_ms, poll_interval_ms, download_variants, tool_result — same semantics as openai-videos-create (default wait is true).

List video jobs (videos.list).
Arguments (input schema):
- after (string, optional) — cursor (video id) to list after.
- limit (integer, optional)
- order ("asc" | "desc", optional)

Output:
- structuredContent: OpenAI list response shape { data, has_more, last_id }.
- content: a text block with serialized JSON.

Retrieve job status (videos.retrieve).
- video_id (string, required)

Delete a video job (videos.delete).
- video_id (string, required)

Retrieve an asset for a completed job (videos.downloadContent, REST GET /videos/{video_id}/content), write it under allowed MEDIA_GEN_DIRS, and return MCP resource_link (default) or embedded resource blocks (via tool_result).
Arguments (input schema):
- video_id (string, required)
- variant ("video" | "thumbnail" | "spritesheet", default: "video")
- tool_result ("resource_link" | "resource", default: "resource_link")

Output (MCP CallToolResult):
- structuredContent: OpenAI Video object.
- content: a resource_link (or embedded resource), a summary JSON block { video_id, variant, uri, pricing }, plus the full video JSON.

Create a Google video generation operation using the Google GenAI SDK (@google/genai) ai.models.generateVideos.
Arguments (input schema):
- prompt (string, optional)
- input_reference (string, optional) — image-to-video input (HTTP(S) URL, base64/data URL, or file path under MEDIA_GEN_DIRS)
- input_reference_mime_type (string, optional) — override for the input_reference MIME type (must be image/*)
- input_video_reference (string, optional) — video-extension input (HTTP(S) URL or file path under MEDIA_GEN_DIRS; mutually exclusive with input_reference)
- model (string, default: "veo-3.1-generate-001")
- number_of_videos (integer, default: 1)
- aspect_ratio ("16:9" | "9:16", optional)
- duration_seconds (integer, optional)
  - With referenceImages: 8 seconds
- person_generation ("DONT_ALLOW" | "ALLOW_ADULT" | "ALLOW_ALL", optional)
- wait_for_completion (boolean, default: true)
- timeout_ms (integer, default: 900000)
- poll_interval_ms (integer, default: 10000)
- download_when_done (boolean, optional; defaults to true when waiting)
- tool_result ("resource_link" | "resource", default: "resource_link")
  - Controls the content[] shape when downloading generated videos.
- response_format ("url" | "b64_json", default: "url")
  - Controls the structuredContent.response.generatedVideos[].video fields:
    - "url" prefers video.uri (and strips video.videoBytes)
    - "b64_json" prefers video.videoBytes (and strips video.uri)

Requirements:
- Gemini API: GEMINI_API_KEY (or GOOGLE_API_KEY), or google.api_key in secrets.yaml.
- Vertex AI: GOOGLE_GENAI_USE_VERTEXAI=true, GOOGLE_CLOUD_PROJECT, and GOOGLE_CLOUD_LOCATION (or google.vertex_ai.* in secrets.yaml).

Output:
- structuredContent: Google operation object (includes name, done, and response.generatedVideos[] when available).
- content: status text, optional .mp4 resource_link (default) or embedded resource blocks (when downloaded), plus JSON text blocks for compatibility.

Retrieve/poll an existing Google video operation (ai.operations.getVideosOperation).
- operation_name (string, required)
- response_format ("url" | "b64_json", default: "url")

Output:
- structuredContent: Google operation object.
- content: JSON text blocks with a short summary + the full operation.

Download .mp4 content for a completed operation and return file-first MCP resource_link (default) or embedded resource blocks (via tool_result).
- operation_name (string, required)
- index (integer, default: 0) — selects response.generatedVideos[index]
- tool_result ("resource_link" | "resource", default: "resource_link")
- response_format ("url" | "b64_json", default: "url")

Recommended workflow:
- Call google-videos-generate with wait_for_completion=true (default) to get the completed operation and downloads; set it to false only if you need the operation id immediately.
- Otherwise, poll google-videos-retrieve-operation until done=true.
- Call google-videos-retrieve-content to download an .mp4 and receive a resource_link (or embedded resource).

Fetch and process images from URLs or local file paths with optional compression.
Arguments (input schema):
- sources (string[], optional)
  - HTTP(S) URLs or file paths (absolute, or relative to the first MEDIA_GEN_DIRS entry). Mutually exclusive with ids and n.
- ids (string[], optional)
  - Looks up previously generated files by id in the MEDIA_GEN_DIRS[0] directory.
  - Ids are validated ([A-Za-z0-9_-] only; no .., *, ?, slashes).
  - Matches filenames containing _{id}_ or _{id}. (supports both single outputs and multi-output suffixes like _1.png).
  - When ids is used, compression and file are not supported (no new files are created). Mutually exclusive with sources and n.
- n (integer, optional)
  - Returns the last n files from the MEDIA_GEN_DIRS[0] directory. Mutually exclusive with sources and ids.
- compression (object, optional)
  - max_size (integer, optional): Max dimension in pixels. Images larger than this will be resized.
  - max_bytes (integer, optional): Target max file size in bytes. Default: 819200 (800KB).
  - quality (integer, optional): JPEG/WebP quality 1–100. Default: 85.
  - format ("jpeg" | "png" | "webp", optional): Output format. Default: jpeg.
- response_format ("url" | "path" | "b64_json", default: "url")
  - Returns a file URL (url), local path (path), or inline base64 (b64_json).
- tool_result ("resource_link" | "image", default: "resource_link")
  - Controls the content[] shape:
    - "resource_link" emits ResourceLink items (file/URL-based)
    - "image" emits base64 ImageContent blocks
- file (string, optional)
Behavior notes:
- Compression is only applied when compression options are provided.
- When n is provided, it is only honored when the MEDIA_GEN_MCP_ALLOW_FETCH_LAST_N_IMAGES environment variable is set to true. Otherwise, the call fails with a validation error.
- Some MCP clients drop long-running calls to media-gen-mcp due to a timeout. In creative environments where you need to quickly retrieve the latest openai-images-generate / openai-images-edit outputs, you can use fetch-images with the n argument. When the MEDIA_GEN_MCP_ALLOW_FETCH_LAST_N_IMAGES=true environment variable is set, fetch-images will return the last N files from MEDIA_GEN_DIRS[0] even if the original generation or edit operation timed out on the MCP client side.

Fetch videos from HTTP(S) URLs or local file paths.
Arguments (input schema):
- sources (string[], optional)
  - HTTP(S) URLs or file paths (absolute, or relative to the first MEDIA_GEN_DIRS entry). Mutually exclusive with ids and n.
- ids (string[], optional)
  - Looks up previously downloaded files by id in the MEDIA_GEN_DIRS[0] directory.
  - Ids are validated ([A-Za-z0-9_-] only; no .., *, ?, slashes).
  - Matches filenames containing _{id}_ or _{id}. (supports both single outputs and multi-asset suffixes like _thumbnail.webp).
  - When ids is used, file is not supported (no downloads; returns existing files). Mutually exclusive with sources and n.
- n (integer, optional)
  - Returns the last n files from the MEDIA_GEN_DIRS[0] directory. Mutually exclusive with sources and ids.
- tool_result ("resource_link" | "resource", default: "resource_link")
  - Controls the content[] shape:
    - "resource_link" emits ResourceLink items (file/URL-based)
    - "resource" emits EmbeddedResource blocks with base64 resource.blob
- file (string, optional)
Output:
- content: one resource_link (default) or embedded resource block per resolved video, plus an optional error summary text block.
- structuredContent: { data: [{ source, uri, file, mimeType, name, downloaded }], errors?: string[] }.

Behavior notes:
- Remote URLs are checked against MEDIA_GEN_URLS (when set).
- When n is provided, it is only honored when the MEDIA_GEN_MCP_ALLOW_FETCH_LAST_N_VIDEOS environment variable is set to true. Otherwise, the call fails with a validation error.

Fetch documents from HTTP(S) URLs or local file paths.
Arguments (input schema):
- sources (string[])
  - HTTP(S) URLs or file paths (absolute, or relative to the first MEDIA_GEN_DIRS entry).
- tool_result ("resource_link" | "resource", default: "resource_link")
  - Controls the content[] shape:
    - "resource_link" emits ResourceLink items (file/URL-based)
    - "resource" emits EmbeddedResource blocks with base64 resource.blob
- file (string, optional)
Output:
- content: one resource_link (default) or embedded resource block per resolved document, plus an optional error summary text block.
- structuredContent: { data: [{ source, uri, file, mimeType, name, downloaded }], errors?: string[] }.

Behavior notes:
- Remote URLs are checked against MEDIA_GEN_URLS (when set).
- Local paths must be under MEDIA_GEN_DIRS and can be provided as file:// URLs.
- Downloads are saved as output__media-gen__fetch-document_. when file is omitted.

Debug tool for testing MCP result placement without calling the OpenAI API.
Enabled only when MEDIA_GEN_MCP_TEST_SAMPLE_DIR is set. The tool reads existing images from this directory and does not create new files.
Arguments (input schema):
- response_format ("url" | "path" | "b64_json", default: "url")
- result_placement ("content" | "api" | "structured" | "toplevel" or an array of these, optional)
  - Overrides MEDIA_GEN_MCP_RESULT_PLACEMENT for this call.
- compression (object, optional)
  - Same options as fetch-images, but using camelCase keys:
    - maxSize (integer, optional): max dimension in pixels.
    - maxBytes (integer, optional): target max file size in bytes.
    - quality (integer, optional): JPEG/WebP quality 1–100.
    - format ("jpeg" | "png" | "webp", optional): output format.
- tool_result ("resource_link" | "image", default: "resource_link")
  - Controls the content[] shape:
    - "resource_link" emits ResourceLink items (file/URL-based)
    - "image" emits base64 ImageContent blocks

Behavior notes:
Reads up to 10 images from the sample directory (no sorting — filesystem order).
Uses the same result-building logic as openai-images-generate and openai-images-edit (including result_placement overrides).
When output == "base64" and compression is provided, sample files are read and compressed in memory using sharp; original files on disk are never modified.
Useful for testing how different MCP clients handle various result structures.
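The per-call result_placement override (falling back to the MEDIA_GEN_MCP_RESULT_PLACEMENT default when omitted) can be sketched as follows. `resolvePlacement` is a hypothetical helper; the server's actual parsing may differ.

```typescript
// Sketch: normalize result_placement to an array, with a per-call value
// (string or array) overriding the comma-separated env default.
type Placement = "content" | "api" | "structured" | "toplevel";

function resolvePlacement(
  perCall: Placement | Placement[] | undefined,
  envDefault = "content",
): Placement[] {
  if (perCall === undefined) {
    // fall back to the env-configured default (e.g. "content,structured")
    return envDefault.split(",").map((p) => p.trim()) as Placement[];
  }
  return Array.isArray(perCall) ? perCall : [perCall];
}
```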
When result_placement includes "api", the tool returns a mock OpenAI Images API-style object:
- Top-level fields: created, data[], background, output_format, size, quality.
- With response_format: "b64_json", each data[i] contains b64_json.
- With response_format: "path", each data[i] contains path.
- With response_format: "url", each data[i] contains url instead of b64_json.

Debugging test-images

For local debugging there are two helper scripts that call test-images directly:
npm run test-images – uses debug/debug-call.ts and prints the validated
CallToolResult as seen by the MCP SDK client. Usage:
npm run test-images -- [placement] [--response_format url|path|b64_json]
# examples:
# npm run test-images -- structured --response_format b64_json
# npm run test-images -- structured --response_format path
# npm run test-images -- structured --response_format url
npm run test-images:raw – uses debug/debug-call-raw.ts and prints the raw
JSON-RPC result (the underlying CallToolResult without extra wrapping). Same
CLI flags as above.
Both scripts truncate large fields for readability:
- image_url → first 80 characters, then ...(N chars);
- b64_json and data (when it is a base64 string) → first 25 characters, then ...(N chars).

This package follows SemVer: MAJOR.MINOR.PATCH (x.y.z).
- MAJOR — breaking changes (tool names, input schemas, output shapes).
- MINOR — new tools or backward-compatible additions (new optional params, new fields in responses).
- PATCH — bug fixes and internal refactors with no intentional behavior change.

Since 1.0.0, this project has followed standard SemVer rules: breaking changes bump MAJOR (npm’s ^1.0.0 allows 1.x, but not 2.0.0).
This repository aims to stay closely aligned with current stable releases:
- the @modelcontextprotocol/sdk package and MCP schema;
- the openai package;
- Zod (^4.1.3). In this project we previously ran on Zod 3.x and, in combination with the MCP TypeScript SDK typings, hit heavy TypeScript errors when passing .shape into inputSchema — in particular TS2589 ("type instantiation is excessively deep and possibly infinite") and TS2322 (schema shape not assignable to AnySchema | ZodRawShapeCompat). We track the upstream discussion in modelcontextprotocol/typescript-sdk#494 and the related Zod typing work in colinhacks/zod#5222, and keep the stack on a combination that passes full strict compilation reliably;
- a tsconfig-strict.json that enables all strict TypeScript checks (strict, noUnusedLocals, noUnusedParameters, exactOptionalPropertyTypes, noUncheckedIndexedAccess, noPropertyAccessFromIndexSignature, etc.).

You are welcome to pin or downgrade Node.js, TypeScript, the OpenAI SDK, Zod, or other pieces of the stack if your environment requires it, but please keep in mind:
This project is intentionally a bit futuristic: it tries to keep up with new capabilities as they appear in MCP and OpenAI tooling (in particular, robust multimodal/image support over MCP and in ChatGPT’s UI). A detailed real‑world bug report and analysis of MCP image rendering in ChatGPT is listed in the References section as a case study.
If you need a long-term-stable stack, pin exact versions in your own fork and validate them carefully in your environment.
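The strict checks named above can be sketched as a tsconfig-strict.json fragment. This is illustrative only — the "extends" path and exact option set in the repository may differ:

```json
{
  "extends": "./tsconfig.json",
  "compilerOptions": {
    "strict": true,
    "noUnusedLocals": true,
    "noUnusedParameters": true,
    "exactOptionalPropertyTypes": true,
    "noUncheckedIndexedAccess": true,
    "noPropertyAccessFromIndexSignature": true
  }
}
```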
All tool handlers use strongly typed callback parameters derived from Zod schemas via z.input:
// Schema definition
const openaiImagesGenerateBaseSchema = z.object({
prompt: z.string().max(32000),
background: z.enum(["transparent", "opaque", "auto"]).optional(),
// ... more fields
});
// Type alias
type OpenAIImagesGenerateArgs = z.input<typeof openaiImagesGenerateBaseSchema>;
// Strictly typed callback
server.registerTool(
"openai-images-generate",
{ inputSchema: openaiImagesGenerateBaseSchema.shape, ... },
async (args: OpenAIImagesGenerateArgs, _extra: unknown) => {
const validated = openaiImagesGenerateSchema.parse(args);
// ... handler logic
},
);
This pattern provides:
- .parse() ensures all inputs match the schema before processing.
- inputSchema: schema.shape provides the JSON Schema for tool registration.

All tools (openai-images-*, openai-videos-*, fetch-images, fetch-videos, fetch-document, test-images) follow this pattern.
This MCP server exposes the following tools with annotation hints:
| Tool | readOnlyHint | destructiveHint | idempotentHint | openWorldHint |
|---|---|---|---|---|
| openai-images-generate | true | false | false | true |
| openai-images-edit | true | false | false | true |
| openai-videos-create | true | false | false | true |
| openai-videos-remix | true | false | false | true |
| openai-videos-list | true | false | false | true |
| openai-videos-retrieve | true | false | false | true |
| openai-videos-delete | true | false | false | true |
| openai-videos-retrieve-content | true | false | false | true |
| fetch-images | true | false | false | false |
| fetch-videos | true | false | false | false |
| fetch-document | true | false | false | false |
| test-images | true | false | false | false |
These hints help MCP clients understand that these tools are read-only and non-destructive, and whether they reach out to external services (openWorldHint).
Because readOnlyHint is set to true for most tools, MCP platforms (including chatgpt.com) can treat this server as logically read-only and usually will not show "this tool can modify your files" warnings.
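As a sketch of how one row of the table maps to code, the hints for openai-images-generate could be declared as plain data. The interface below assumes the ToolAnnotations field names from the current MCP schema; check @modelcontextprotocol/sdk for the exact typings:

```typescript
// Annotation hints as plain data, matching the table above.
// Field names are assumed to follow the MCP ToolAnnotations interface.
interface ToolAnnotations {
  readOnlyHint?: boolean;
  destructiveHint?: boolean;
  idempotentHint?: boolean;
  openWorldHint?: boolean;
}

const imageGenerateAnnotations: ToolAnnotations = {
  readOnlyHint: true,      // never modifies client-side state
  destructiveHint: false,  // no destructive updates
  idempotentHint: false,   // each call may produce a different image
  openWorldHint: true,     // talks to the external OpenAI API
};

console.log(imageGenerateAnnotations.readOnlyHint); // true
```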
media-gen-mcp/
├── src/
│ ├── index.ts # MCP server entry point
│ └── lib/
│ ├── compression.ts # Image compression (sharp)
│ ├── env.ts # Env parsing + allowlists (+ glob support)
│ ├── helpers.ts # URL/path validation, result building
│ ├── logger.ts # Structured logging + truncation helpers
│ └── schemas.ts # Zod schemas for all tools
├── test/
│ ├── compression.test.ts # 12 tests
│ ├── env.test.ts # 19 tests
│   ├── fetch-images.integration.test.ts # 2 tests
│   ├── fetch-videos.integration.test.ts # 2 tests
│ ├── helpers.test.ts # 31 tests
│ ├── logger.test.ts # 10 tests
│ └── schemas.test.ts # 64 tests
├── debug/ # Local debug helpers (MCP client scripts)
├── plan/ # Design notes / plans
├── dist/ # Compiled output
├── tsconfig.json
├── vitest.config.ts
├── package.json
├── CHANGELOG.md
├── README.md
└── AGENTS.md
MIT
If something fails, check that:
- OPENAI_API_KEY is valid and has image API access.
- POSIX paths are absolute and start with / (e.g., /path/to/image.png).
- Windows paths include a drive prefix (e.g., C:/path/to/image.png or C:\path\to\image.png).

This server was originally inspired by SureScaleAI/openai-gpt-image-mcp, but is now a separate implementation focused on closely tracking the official specifications:
openai-images-generate
and openai-images-edit mirror
images.create / gpt-image-1.5:
prompt, n, size, quality, background, output_format,
output_compression, user, plus response_format (url / b64_json) with
the same semantics as the OpenAI Images API. When result_placement = "content", the server follows the MCP 5.2 Tool Result
section
(5.2.2 Image Content,
5.2.4 Resource Links)
and emits strongly-typed content[] items:
- { "type": "image", "data": "", "mimeType": "image/png" } for response_format = "b64_json";
- { "type": "resource_link", "uri": "file:///..." | "https://...", "name": "...", "mimeType": "image/..." } for file/URL-based output.

When result_placement = "api", the tool
result itself is an OpenAI Images-style object:
{ created, data: [...], background, output_format, size, quality, usage? },
where each data[] entry contains either b64_json (for
response_format = "b64_json") or url (for response_format = "url"). No
MCP wrapper fields (content, structuredContent, files, urls) are
added in this mode.

In short, this library:
- returns an OpenAI Images-style object when result_placement = "api" with response_format = "url" | "b64_json", and
- emits MCP content items (image, resource_link, text) when result_placement = "content".

Default mode / Claude Desktop / strict MCP clients
For clients that strictly follow the MCP spec, the recommended (and natural)
configuration is:
- result_placement = content
- response_format = b64_json

In this mode the server returns:
content[] with type: "image" (base64 image data) and
type: "resource_link" (file/URL links), matching MCP section 5.2 (Image
Content and Resource Links). This output works well for direct
integration with Claude Desktop and any client that fully implements the
2025‑11‑25 spec.

chatgpt.com Developer Mode
For running this server as an MCP backend behind ChatGPT Developer Mode, the
most practical configuration is the one that most closely matches the OpenAI
Images API:
- result_placement = api
- response_format = url

In this mode the tool result matches the images.create / gpt-image-1.5
format (including data[].url), which simplifies consumption from backends
and libraries that expect the OpenAI schema.
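The two placements can be sketched side by side. This is a hypothetical helper (buildResult is not the server's actual API; the real builder lives in src/lib/helpers.ts), shown only to contrast the two shapes:

```typescript
// Sketch of the two result shapes: an OpenAI Images-style object for
// result_placement = "api", and MCP content[] items for "content".
type ResultPlacement = "content" | "api";
type ResponseFormat = "url" | "b64_json";

function buildResult(
  placement: ResultPlacement,
  format: ResponseFormat,
  payload: string, // a URL or a base64 string, depending on format
) {
  if (placement === "api") {
    // OpenAI Images-style object: no MCP wrapper fields at all.
    return {
      created: Math.floor(Date.now() / 1000),
      data: [format === "url" ? { url: payload } : { b64_json: payload }],
    };
  }
  // MCP-style CallToolResult: content[] items per spec section 5.2.
  return {
    content: [
      format === "b64_json"
        ? { type: "image", data: payload, mimeType: "image/png" }
        : { type: "resource_link", uri: payload, name: "output.png", mimeType: "image/png" },
    ],
  };
}
```

Note how the "api" branch deliberately omits content, structuredContent, files, and urls, matching the description above.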
However, even with this OpenAI-native shape, the chatgpt.com client does not currently render images. This behavior is documented in detail in the bug report listed in the References section.
Output size and storage notes:
- Large inline payloads are kept out of content to stay within typical MCP client limits. You can override this threshold by setting the MCP_MAX_CONTENT_BYTES environment variable to a higher (or lower) value.
- Images above this threshold are returned as resource_link instead of inline base64. This helps avoid client-side "payload too large" errors while still delivering full-resolution images.
- For file path output, files are saved under MEDIA_GEN_DIRS[0] (default: /tmp/media-gen-mcp) using names like output__media-gen___..
- MEDIA_GEN_DIRS: set this to control where outputs are saved. Example: export MEDIA_GEN_DIRS=/your/desired/dir. This directory may coincide with your public static directory if you serve files directly from it.
- MEDIA_GEN_MCP_URL_PREFIXES: optional comma-separated HTTPS prefixes for public URLs, matched positionally to MEDIA_GEN_DIRS entries. When set, the server builds public URLs as / and returns them alongside file paths (for example via resource_link URIs and structuredContent.data[].url when response_format: "url"). Example: export MEDIA_GEN_MCP_URL_PREFIXES=https://media-gen.example.com/media,https://media-gen.example.com/samples
- Combine MEDIA_GEN_DIRS and (optionally) MEDIA_GEN_MCP_URL_PREFIXES to serve images via a public web server (e.g., nginx).

If you want ChatGPT (or any MCP client) to mention publicly accessible URLs alongside file paths:
Expose your image directory via HTTPS. For example, on nginx:
server {
# listen 443 ssl http2;
# server_name ;
# ssl_certificate ;
# ssl_certificate_key ;
location /media/ {
alias /home/username/media-gen-mcp/media/;
autoindex off;
expires 7d;
add_header Cache-Control "public, immutable";
}
}
Ensure the first entry in MEDIA_GEN_DIRS points to the same directory (e.g. MEDIA_GEN_DIRS=/home/username/media-gen-mcp/media/ or MEDIA_GEN_DIRS=media/ when running from the project root).
Set MEDIA_GEN_MCP_URL_PREFIXES=https://media-gen.example.com/media so the server returns matching HTTPS URLs in top-level urls, resource_link URIs, and image_url fields (for response_format: "url").
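The positional mapping between MEDIA_GEN_DIRS entries and MEDIA_GEN_MCP_URL_PREFIXES can be sketched roughly like this (toPublicUrl is a hypothetical helper, not the server's actual function):

```typescript
// Map a saved file path to a public HTTPS URL by finding the first
// MEDIA_GEN_DIRS entry that contains it and substituting the positionally
// corresponding MEDIA_GEN_MCP_URL_PREFIXES entry.
function toPublicUrl(
  filePath: string,
  dirs: string[],
  urlPrefixes: string[],
): string | undefined {
  for (let i = 0; i < dirs.length; i++) {
    const d = dirs[i];
    const prefix = urlPrefixes[i];
    if (!d || !prefix) continue; // no dir or no matching prefix at this position
    const dir = d.endsWith("/") ? d : d + "/";
    if (filePath.startsWith(dir)) {
      return prefix.replace(/\/$/, "") + "/" + filePath.slice(dir.length);
    }
  }
  return undefined; // no match — callers fall back to the plain file path
}

const url = toPublicUrl(
  "/home/username/media-gen-mcp/media/output.png",
  ["/home/username/media-gen-mcp/media"],
  ["https://media-gen.example.com/media"],
);
console.log(url); // https://media-gen.example.com/media/output.png
```

Paths outside every configured directory simply get no public URL, which matches the "optional" nature of MEDIA_GEN_MCP_URL_PREFIXES.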
Both openai-images-generate and openai-images-edit now attach files + urls for base64 and file response modes, allowing clients to reference either the local filesystem path or the public HTTPS link. This is particularly useful while ChatGPT cannot yet render MCP image blocks inline.
Model Context Protocol
OpenAI Images
OpenAI Videos
Case studies
- ChatGPT (chatgpt.com) failing to render MCP image content blocks: empty tool results, raw base64 treated as text (huge token usage), or generic "I can't see the image" responses, while other MCP clients (Cursor, Claude) rendered the same images correctly.
- Differences in how clients interpret CallToolResult image content blocks and media objects (especially around UI rendering and nested containers).
- Strategies for emitting image blocks so that tools remain usable across current and future MCP clients.

Install via CLI
Install Media Gen MCP with a single command:
npx mdskills install strato-space/media-gen-mcp

This downloads the skill files into your project and your AI agent picks them up automatically.
Media Gen MCP works with Claude Code, Claude Desktop, Cursor, VS Code Copilot, Windsurf, Continue, Gemini CLI, Amp, Roo Code, and Goose. Skills use the open SKILL.md format, which is compatible with any AI coding agent that reads markdown instructions.