Transcribe audio files to text with optional diarization and known-speaker hints. Use when a user asks to transcribe speech from audio/video, extract text from recordings, or label speakers in interviews or meetings.
Add this skill
```
npx mdskills install openai/transcribe
```

Clear workflow with sensible defaults, diarization support, and good CLI examples.
---
name: "transcribe"
description: "Transcribe audio files to text with optional diarization and known-speaker hints. Use when a user asks to transcribe speech from audio/video, extract text from recordings, or label speakers in interviews or meetings."
---

# Audio Transcribe

Transcribe audio using OpenAI, with optional speaker diarization when requested. Prefer the bundled CLI for deterministic, repeatable runs.

## Workflow
1. Collect inputs: audio file path(s), desired response format (text/json/diarized_json), optional language hint, and any known speaker references.
2. Verify `OPENAI_API_KEY` is set. If missing, ask the user to set it locally (do not ask them to paste the key).
3. Run the bundled `transcribe_diarize.py` CLI with sensible defaults (fast text transcription).
4. Validate the output: transcription quality, speaker labels, and segment boundaries; iterate with a single targeted change if needed.
5. Save outputs under `output/transcribe/` when working in this repo.

## Decision rules
- Default to `gpt-4o-mini-transcribe` with `--response-format text` for fast transcription.
- If the user wants speaker labels or diarization, use `--model gpt-4o-transcribe-diarize --response-format diarized_json`.
- If audio is longer than ~30 seconds, keep `--chunking-strategy auto`.
- Prompting is not supported for `gpt-4o-transcribe-diarize`.

## Output conventions
- Use `output/transcribe/<job-id>/` for evaluation runs.
- Use `--out-dir` for multiple files to avoid overwriting.

## Dependencies (install if missing)
Prefer `uv` for dependency management.

```
uv pip install openai
```

If `uv` is unavailable:

```
python3 -m pip install openai
```

## Environment
- `OPENAI_API_KEY` must be set for live API calls.
- If the key is missing, instruct the user to create one in the OpenAI platform UI and export it in their shell.
- Never ask the user to paste the full key in chat.

## Skill path (set once)

```bash
export CODEX_HOME="${CODEX_HOME:-$HOME/.codex}"
export TRANSCRIBE_CLI="$CODEX_HOME/skills/transcribe/scripts/transcribe_diarize.py"
```

User-scoped skills install under `$CODEX_HOME/skills` (default: `~/.codex/skills`).

## CLI quick start
Single file (fast text default):

```
python3 "$TRANSCRIBE_CLI" \
  path/to/audio.wav \
  --out transcript.txt
```

Diarization with known speakers (up to 4):

```
python3 "$TRANSCRIBE_CLI" \
  meeting.m4a \
  --model gpt-4o-transcribe-diarize \
  --known-speaker "Alice=refs/alice.wav" \
  --known-speaker "Bob=refs/bob.wav" \
  --response-format diarized_json \
  --out-dir output/transcribe/meeting
```

Plain text output (explicit):

```
python3 "$TRANSCRIBE_CLI" \
  interview.mp3 \
  --response-format text \
  --out interview.txt
```

## Reference map
- `references/api.md`: supported formats, limits, response formats, and known-speaker notes.
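The `diarized_json` output is often easier to read once flattened into speaker-labeled lines. A minimal post-processing sketch, assuming the payload carries a `segments` list whose items have `speaker` and `text` fields (those field names are an assumption for illustration, not taken from `references/api.md` — check that file for the actual schema):

```python
import json


def render_diarized(payload: dict) -> str:
    """Flatten an assumed diarized payload into 'Speaker: text' lines.

    Assumes payload["segments"] is a list of dicts with "speaker" and
    "text" keys; adjust the key names to match the real schema.
    """
    lines = []
    for seg in payload.get("segments", []):
        speaker = seg.get("speaker", "Unknown")
        text = seg.get("text", "").strip()
        lines.append(f"{speaker}: {text}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Example payload shaped like the assumed schema above.
    sample = {
        "segments": [
            {"speaker": "Alice", "text": " Hi, Bob."},
            {"speaker": "Bob", "text": " Morning, Alice."},
        ]
    }
    print(render_diarized(sample))
```

In practice you would load the CLI's saved JSON with `json.load()` and feed it to `render_diarized`, writing the result next to the raw output under `output/transcribe/<job-id>/`.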
Full transparency: inspect the skill content before installing.