Transform audio recordings into professional Markdown documentation with intelligent summaries using LLM integration
Add this skill
`npx mdskills install sickn33/audio-transcriber`

Comprehensive audio-to-text transcription with metadata extraction and LLM-powered summarization
---
name: audio-transcriber
description: "Transform audio recordings into professional Markdown documentation with intelligent summaries using LLM integration"
version: 1.2.0
author: Eric Andrade
created: 2025-02-01
updated: 2026-02-04
platforms: [github-copilot-cli, claude-code, codex]
category: content
tags: [audio, transcription, whisper, meeting-minutes, speech-to-text]
risk: safe
---

## Purpose

This skill automates audio-to-text transcription with professional Markdown output, extracting rich technical metadata (speakers, timestamps, language, file size, duration) and generating structured meeting minutes and executive summaries. It uses Faster-Whisper or Whisper with zero configuration, working universally across projects without hardcoded paths or API keys.

Inspired by tools like Plaud, this skill transforms raw audio recordings into actionable documentation, making it ideal for meetings, interviews, lectures, and content analysis.

## When to Use

Invoke this skill when:

- User needs to transcribe audio/video files to text
- User wants meeting minutes automatically generated from recordings
- User requires speaker identification (diarization) in conversations
- User needs subtitles/captions (SRT, VTT formats)
- User wants executive summaries of long audio content
- User asks variations of "transcribe this audio", "convert audio to text", "generate meeting notes from recording"
- User has audio files in common formats (MP3, WAV, M4A, OGG, FLAC, WEBM)

## Workflow

### Step 0: Discovery (Auto-detect Transcription Tools)

**Objective:** Identify available transcription engines without user configuration.

**Actions:**

Run detection commands to find installed tools:

```bash
# Check for Faster-Whisper (preferred - 4-5x faster)
if python3 -c "import faster_whisper" 2>/dev/null; then
  TRANSCRIBER="faster-whisper"
  echo "✅ Faster-Whisper detected (optimized)"
# Fallback to original Whisper
elif python3 -c "import whisper" 2>/dev/null; then
  TRANSCRIBER="whisper"
  echo "✅ OpenAI Whisper detected"
else
  TRANSCRIBER="none"
  echo "⚠️ No transcription tool found"
fi

# Check for ffmpeg (audio format conversion)
if command -v ffmpeg &>/dev/null; then
  echo "✅ ffmpeg available (format conversion enabled)"
else
  echo "ℹ️ ffmpeg not found (limited format support)"
fi
```

**If no transcriber found:**

Offer automatic installation using the provided script:

```bash
echo "⚠️ No transcription tool found"
echo ""
echo "🔧 Auto-install dependencies? (Recommended)"
read -r -p "Run installation script? [Y/n]: " AUTO_INSTALL

if [[ ! "$AUTO_INSTALL" =~ ^[Nn] ]]; then
  # Get skill directory (works for both repo and symlinked installations)
  SKILL_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

  # Run installation script
  if [[ -f "$SKILL_DIR/scripts/install-requirements.sh" ]]; then
    bash "$SKILL_DIR/scripts/install-requirements.sh"
  else
    echo "❌ Installation script not found"
    echo ""
    echo "📦 Manual installation:"
    echo "  pip install faster-whisper  # Recommended"
    echo "  pip install openai-whisper  # Alternative"
    echo "  brew install ffmpeg         # Optional (macOS)"
    exit 1
  fi

  # Verify installation succeeded
  if python3 -c "import faster_whisper" 2>/dev/null || python3 -c "import whisper" 2>/dev/null; then
    echo "✅ Installation successful! Proceeding with transcription..."
  else
    echo "❌ Installation failed. Please install manually."
    exit 1
  fi
else
  echo ""
  echo "📦 Manual installation required:"
  echo ""
  echo "Recommended (fastest):"
  echo "  pip install faster-whisper"
  echo ""
  echo "Alternative (original):"
  echo "  pip install openai-whisper"
  echo ""
  echo "Optional (format conversion):"
  echo "  brew install ffmpeg  # macOS"
  echo "  apt install ffmpeg   # Linux"
  echo ""
  exit 1
fi
```

This lets users install dependencies with a single confirmation, or opt for manual installation if they prefer.

**If transcriber found:**

Proceed to Step 0b (CLI Detection).


### Step 1: Validate Audio File

**Objective:** Verify file exists, check format, and extract metadata.

**Actions:**

1. **Accept file path or URL** from user:
   - Local file: `meeting.mp3`
   - URL: `https://example.com/audio.mp3` (download to temp directory)

2. **Verify file exists:**

```bash
if [[ ! -f "$AUDIO_FILE" ]]; then
  echo "❌ File not found: $AUDIO_FILE"
  exit 1
fi
```

3. **Extract metadata** using ffprobe or file utilities:

```bash
# Get file size
FILE_SIZE=$(du -h "$AUDIO_FILE" | cut -f1)

# Get duration and format using ffprobe
DURATION=$(ffprobe -v error -show_entries format=duration \
  -of default=noprint_wrappers=1:nokey=1 "$AUDIO_FILE" 2>/dev/null)
FORMAT=$(ffprobe -v error -select_streams a:0 -show_entries \
  stream=codec_name -of default=noprint_wrappers=1:nokey=1 "$AUDIO_FILE" 2>/dev/null)

# Convert duration (fractional seconds) to HH:MM:SS with shell arithmetic,
# which behaves the same on macOS and Linux (unlike `date -r`)
if [[ "$DURATION" =~ ^[0-9]+ ]]; then
  SECS=${BASH_REMATCH[0]}
  DURATION_HMS=$(printf '%02d:%02d:%02d' $((SECS / 3600)) $((SECS % 3600 / 60)) $((SECS % 60)))
else
  DURATION_HMS="Unknown"
fi
```

4. **Check file size** (warn if large for cloud APIs):

```bash
SIZE_MB=$(du -m "$AUDIO_FILE" | cut -f1)
if [[ $SIZE_MB -gt 25 ]]; then
  echo "⚠️ Large file ($FILE_SIZE) - processing may take several minutes"
fi
```

5. **Validate format** (supported: MP3, WAV, M4A, OGG, FLAC, WEBM, MP4):

```bash
EXTENSION="${AUDIO_FILE##*.}"
# Lowercase with tr (the ${VAR,,} expansion needs bash 4+, which stock macOS lacks)
EXTENSION_LC=$(printf '%s' "$EXTENSION" | tr '[:upper:]' '[:lower:]')
SUPPORTED_FORMATS=("mp3" "wav" "m4a" "ogg" "flac" "webm" "mp4")

if [[ ! " ${SUPPORTED_FORMATS[*]} " =~ " ${EXTENSION_LC} " ]]; then
  echo "⚠️ Unsupported format: $EXTENSION"
  if command -v ffmpeg &>/dev/null; then
    echo "🔄 Converting to WAV..."
    ffmpeg -i "$AUDIO_FILE" -ar 16000 "${AUDIO_FILE%.*}.wav" -y
    AUDIO_FILE="${AUDIO_FILE%.*}.wav"
  else
    echo "❌ Install ffmpeg to convert formats: brew install ffmpeg"
    exit 1
  fi
fi
```


### Step 3: Generate Markdown Output

**Objective:** Create structured Markdown with metadata, transcription, meeting minutes, and summary.

**Output Template:**

```markdown
# Audio Transcription Report

## 📊 Metadata

| Field | Value |
|-------|-------|
| **File Name** | {filename} |
| **File Size** | {file_size} |
| **Duration** | {duration_hms} |
| **Language** | {language} ({language_code}) |
| **Processed Date** | {process_date} |
| **Speakers Identified** | {num_speakers} |
| **Transcription Engine** | {engine} (model: {model}) |


## 📋 Meeting Minutes

### Participants
- {speaker_1}
- {speaker_2}
- ...

### Topics Discussed
1. **{topic_1}** ({timestamp})
   - {key_point_1}
   - {key_point_2}

2. **{topic_2}** ({timestamp})
   - {key_point_1}

### Decisions Made
- ✅ {decision_1}
- ✅ {decision_2}

### Action Items
- [ ] **{action_1}** - Assigned to: {speaker} - Due: {date_if_mentioned}
- [ ] **{action_2}** - Assigned to: {speaker}


*Generated by audio-transcriber skill v1.2.0*
*Transcription engine: {engine} | Processing time: {elapsed_time}s*
```

**Implementation:**

Use Python or bash with an AI model (Claude/GPT) for intelligent summarization:

```python
# NOTE: cluster_by_topic, extract_action_items, extract_decisions, and
# call_ai_model are placeholders that the integrator must supply.

def generate_meeting_minutes(segments):
    """Extract topics, decisions, action items from transcription."""

    # Group segments by topic (simple clustering by timestamps)
    topics = cluster_by_topic(segments)

    # Identify action items (keywords: "should", "will", "need to", "action")
    action_items = extract_action_items(segments)

    # Identify decisions (keywords: "decided", "agreed", "approved")
    decisions = extract_decisions(segments)

    return {
        "topics": topics,
        "decisions": decisions,
        "action_items": action_items
    }

def generate_summary(segments, max_paragraphs=5):
    """Create executive summary using AI (Claude/GPT via API or local model)."""

    full_text = " ".join([s["text"] for s in segments])

    # Use Chain of Density approach (from prompt-engineer frameworks)
    summary_prompt = f"""
    Summarize the following transcription in {max_paragraphs} concise paragraphs.
    Focus on key topics, decisions, and action items.

    Transcription:
    {full_text}
    """

    # Call AI model (placeholder - user can integrate Claude API or use local model)
    summary = call_ai_model(summary_prompt)

    return summary
```

**Output file naming:**

```bash
# v1.1.0: Use a timestamp to avoid overwriting existing files
TIMESTAMP=$(date +%Y%m%d-%H%M%S)
TRANSCRIPT_FILE="transcript-${TIMESTAMP}.md"
ATA_FILE="ata-${TIMESTAMP}.md"

printf '%s\n' "$TRANSCRIPT_CONTENT" > "$TRANSCRIPT_FILE"
echo "✅ Transcript saved: $TRANSCRIPT_FILE"

if [[ -n "$ATA_CONTENT" ]]; then
  printf '%s\n' "$ATA_CONTENT" > "$ATA_FILE"
  echo "✅ Meeting minutes saved: $ATA_FILE"
fi
```


#### **SCENARIO A: User Provided Custom Prompt**

**Workflow:**

1. **Display user's prompt:**
   ```
   📝 Prompt provided by the user:
   ┌──────────────────────────────────┐
   │ [User's prompt preview]          │
   └──────────────────────────────────┘
   ```

2. **Automatically improve with prompt-engineer (if available):**
   ```bash
   🔧 Improving prompt with prompt-engineer...
   [Invokes: gh copilot -p "improve this prompt: {user_prompt}"]
   ```

3. **Show both versions:**
   ```
   ✨ Improved version:
   ┌──────────────────────────────────┐
   │ Role: You are a documenter...    │
   │ Instructions: Transform...       │
   │ Steps: 1) ... 2) ...             │
   │ End Goal: ...                    │
   └──────────────────────────────────┘

   📝 Original version:
   ┌──────────────────────────────────┐
   │ [User's original prompt]         │
   └──────────────────────────────────┘
   ```

4. **Ask which to use:**
   ```bash
   💡 Use improved version? [y/n] (default: y):
   ```

5. **Process with selected prompt:**
   - If "y": use improved
   - If "n": use original


#### **LLM Processing (Both Scenarios)**

Once prompt is finalized:

```python
import subprocess

from rich.progress import Progress, SpinnerColumn, TextColumn

def process_with_llm(transcript, prompt, cli_tool='claude'):
    full_prompt = f"{prompt}\n\n---\n\nTranscript:\n\n{transcript}"

    with Progress(
        SpinnerColumn(),
        TextColumn("[progress.description]{task.description}"),
        transient=True
    ) as progress:
        progress.add_task(
            description=f"🤖 Processing with {cli_tool}...",
            total=None
        )

        if cli_tool == 'claude':
            result = subprocess.run(
                ['claude', '-'],
                input=full_prompt,
                capture_output=True,
                text=True,
                timeout=300  # 5 minutes
            )
        elif cli_tool == 'gh-copilot':
            result = subprocess.run(
                ['gh', 'copilot', 'suggest', '-t', 'shell', full_prompt],
                capture_output=True,
                text=True,
                timeout=300
            )
        else:
            # Without this branch, `result` would be unbound below
            raise ValueError(f"Unsupported cli_tool: {cli_tool}")

    if result.returncode == 0:
        return result.stdout.strip()
    else:
        return None
```

**Progress output:**
```
🤖 Processing with claude... ⠋
[After completion:]
✅ Meeting minutes generated successfully!
```


#### **Final Output**

**Success (both files):**
```bash
💾 Saving files...

✅ Files created:
  - transcript-20260203-023045.md (raw transcript)
  - ata-20260203-023045.md (processed with LLM)

🧹 Removed temporary files: metadata.json, transcription.json

✅ Done! Total time: 3m 45s
```

**Transcript only (user declined LLM):**
```bash
💾 Saving files...

✅ File created:
  - transcript-20260203-023045.md

ℹ️ Meeting minutes not generated (LLM processing declined by user)

🧹 Removed temporary files: metadata.json, transcription.json

✅ Done!
```


### Step 5: Display Results Summary

**Objective:** Show completion status and next steps.

**Output:**

```bash
echo ""
echo "✅ Transcription Complete!"
echo ""
echo "📊 Results:"
echo "  File: $OUTPUT_FILE"
echo "  Language: $LANGUAGE"
echo "  Duration: $DURATION_HMS"
echo "  Speakers: $NUM_SPEAKERS"
echo "  Words: $WORD_COUNT"
echo "  Processing time: ${ELAPSED_TIME}s"
echo ""
echo "📝 Generated:"
echo "  - $OUTPUT_FILE (Markdown report)"
# If alternative formats were generated:
echo "  - ${OUTPUT_FILE%.*}.srt (Subtitles)"
echo "  - ${OUTPUT_FILE%.*}.json (Structured data)"
echo ""
echo "🎯 Next steps:"
echo "  1. Review meeting minutes and action items"
echo "  2. Share report with participants"
echo "  3. Track action items to completion"
```


## Example Usage

### **Example 1: Basic Transcription**

**User Input:**
```bash
copilot> transcribe audio to markdown: meeting-2026-02-02.mp3
```

**Skill Output:**

```bash
✅ Faster-Whisper detected (optimized)
✅ ffmpeg available (format conversion enabled)

📂 File: meeting-2026-02-02.mp3
📊 Size: 12.3 MB
⏱️ Duration: 00:45:32

🎙️ Processing...
[████████████████████] 100%

✅ Language detected: Portuguese (pt-BR)
👥 Speakers identified: 4
📝 Generating Markdown output...

✅ Transcription Complete!

📊 Results:
  File: meeting-2026-02-02.md
  Language: pt-BR
  Duration: 00:45:32
  Speakers: 4
  Words: 6,842
  Processing time: 127s

📝 Generated:
  - meeting-2026-02-02.md (Markdown report)

🎯 Next steps:
  1. Review meeting minutes and action items
  2. Share report with participants
  3. Track action items to completion
```


### **Example 3: Batch Processing**

**User Input:**
```bash
copilot> transcribe these audio files: recordings/*.mp3
```

**Skill Output:**

```bash
📦 Batch mode: 5 files found
  1. team-standup.mp3
  2. client-call.mp3
  3. brainstorm-session.mp3
  4. product-demo.mp3
  5. retrospective.mp3

🎙️ Processing batch...

[1/5] team-standup.mp3 ✅ (2m 34s)
[2/5] client-call.mp3 ✅ (15m 12s)
[3/5] brainstorm-session.mp3 ✅ (8m 47s)
[4/5] product-demo.mp3 ✅ (22m 03s)
[5/5] retrospective.mp3 ✅ (11m 28s)

✅ Batch Complete!
📝 Generated 5 Markdown reports
⏱️ Total processing time: 6m 15s
```


### **Example 5: Large File Warning**

**User Input:**
```bash
copilot> transcribe audio to markdown: conference-keynote.mp3
```

**Skill Output:**

```bash
✅ Faster-Whisper detected (optimized)

📂 File: conference-keynote.mp3
📊 Size: 87.2 MB
⏱️ Duration: 02:15:47
⚠️ Large file (87.2 MB) - processing may take several minutes

Continue? [Y/n]:
```

**User:** `Y`

```bash
🎙️ Processing... (this may take 10-15 minutes)
[████░░░░░░░░░░░░░░░░] 20% - Estimated time remaining: 12m
```


This skill is **platform-agnostic** and works in any terminal context where GitHub Copilot CLI is available. It does not depend on specific project configurations or external APIs, following the zero-configuration philosophy.
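
The `generate_meeting_minutes` function in Step 3 calls `extract_action_items` and `extract_decisions` without defining them. The following is a minimal sketch of how those two helpers might look, not the skill's actual implementation. It assumes each segment is a dict with a `"text"` key (as `generate_summary` does) and reuses the keyword lists from the Step 3 comments. The whole-word matching and the `_contains_keyword` helper are illustrative choices introduced here.

```python
import re

# Keyword lists taken from the comments in generate_meeting_minutes (Step 3)
ACTION_KEYWORDS = ("should", "will", "need to", "action")
DECISION_KEYWORDS = ("decided", "agreed", "approved")


def _contains_keyword(text, keywords):
    """Hypothetical helper: True if any keyword occurs as a whole word or phrase."""
    return any(
        re.search(rf"\b{re.escape(keyword)}\b", text, re.IGNORECASE)
        for keyword in keywords
    )


def extract_action_items(segments):
    """Return the text of every segment that mentions an action keyword."""
    return [
        s["text"].strip()
        for s in segments
        if _contains_keyword(s["text"], ACTION_KEYWORDS)
    ]


def extract_decisions(segments):
    """Return the text of every segment that mentions a decision keyword."""
    return [
        s["text"].strip()
        for s in segments
        if _contains_keyword(s["text"], DECISION_KEYWORDS)
    ]


if __name__ == "__main__":
    # Sample segments are made up for illustration only
    sample = [
        {"text": "We agreed to ship on Friday."},
        {"text": "Ana will update the changelog."},
        {"text": "The weather was nice."},
    ]
    print("Decisions:", extract_decisions(sample))
    print("Action items:", extract_action_items(sample))
```

Keyword matching of this kind is crude: it will flag sentences such as "that will never work" as action items, so the results are best treated as candidates for the LLM pass or a human reviewer rather than as final minutes.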