How do I install Audio Transcriber?

Install Audio Transcriber with a single command: npx mdskills install sickn33/audio-transcriber. This downloads the skill files into your project and your AI agent picks them up automatically.

What platforms support Audio Transcriber?

Audio Transcriber works with Claude Code, Claude Desktop, Cursor, VS Code Copilot, Windsurf, Continue Dev, Codex, Gemini CLI, Amp, Roo Code, Goose, OpenCode, Trae, Qodo, and Command Code. Skills use the open SKILL.md format, which is compatible with any AI coding agent that reads Markdown instructions.

Audio Transcriber

Name: Audio Transcriber: AI Agent Skill
Brand: sickn33
Availability: InStock
Rating: 7 (1 review)
Author: sickn33

Verified

Video & Podcast · Intermediate

Transform audio recordings into professional Markdown documentation with intelligent summaries using LLM integration

by @sickn33 · 42 downloads · 13,166 · Updated 2/20/2026

Add this skill

npx mdskills install sickn33/audio-transcriber

Skill Advisor: 7.0

Comprehensive audio-to-text transcription with metadata extraction and LLM-powered summarization

+Provides detailed step-by-step workflow with code examples for each stage
+Auto-detects transcription tools with guided installation fallback
+Integrates LLM processing with prompt engineering and user confirmation
-Contains incomplete sections (Step 5 cuts off mid-sentence, Step 2 references missing Step 0b)
-Over-scoped network permission when local-only Whisper is emphasized as privacy feature

SKILL.md

---
name: audio-transcriber
description: "Transform audio recordings into professional Markdown documentation with intelligent summaries using LLM integration"
version: 1.2.0
author: Eric Andrade
created: 2025-02-01
updated: 2026-02-04
platforms: [github-copilot-cli, claude-code, codex]
category: content
tags: [audio, transcription, whisper, meeting-minutes, speech-to-text]
risk: safe
---

## Purpose

This skill automates audio-to-text transcription with professional Markdown output, extracting rich technical metadata (speakers, timestamps, language, file size, duration) and generating structured meeting minutes and executive summaries. It uses Faster-Whisper or Whisper with zero configuration, working universally across projects without hardcoded paths or API keys.

Inspired by tools like Plaud, this skill transforms raw audio recordings into actionable documentation, making it ideal for meetings, interviews, lectures, and content analysis.

## When to Use

Invoke this skill when:

- User needs to transcribe audio/video files to text
- User wants meeting minutes automatically generated from recordings
- User requires speaker identification (diarization) in conversations
- User needs subtitles/captions (SRT, VTT formats)
- User wants executive summaries of long audio content
- User asks variations of "transcribe this audio", "convert audio to text", "generate meeting notes from recording"
- User has audio files in common formats (MP3, WAV, M4A, OGG, FLAC, WEBM)

## Workflow

### Step 0: Discovery (Auto-detect Transcription Tools)

**Objective:** Identify available transcription engines without user configuration.

**Actions:**

Run detection commands to find installed tools:

```bash
# Check for Faster-Whisper (preferred - 4-5x faster)
if python3 -c "import faster_whisper" 2>/dev/null; then
    TRANSCRIBER="faster-whisper"
    echo "✅ Faster-Whisper detected (optimized)"
# Fallback to original Whisper
elif python3 -c "import whisper" 2>/dev/null; then
    TRANSCRIBER="whisper"
    echo "✅ OpenAI Whisper detected"
else
    TRANSCRIBER="none"
    echo "⚠️  No transcription tool found"
fi

# Check for ffmpeg (audio format conversion)
if command -v ffmpeg &>/dev/null; then
    echo "✅ ffmpeg available (format conversion enabled)"
else
    echo "ℹ️  ffmpeg not found (limited format support)"
fi
```
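
The same discovery logic can be sketched in Python for callers that prefer not to shell out. This is a minimal sketch, not part of the skill itself: the function names `detect_transcriber` and `has_ffmpeg` are hypothetical, and `importlib.util.find_spec` only checks that a module can be located, so unlike the `python3 -c "import ..."` probe it will not catch a package that is installed but fails at import time.

```python
import importlib.util
import shutil


def detect_transcriber():
    """Return 'faster-whisper', 'whisper', or 'none', preferring Faster-Whisper."""
    if importlib.util.find_spec("faster_whisper") is not None:
        return "faster-whisper"
    if importlib.util.find_spec("whisper") is not None:
        return "whisper"
    return "none"


def has_ffmpeg():
    """Return True when an ffmpeg executable is found on PATH."""
    return shutil.which("ffmpeg") is not None


if __name__ == "__main__":
    print(f"transcriber={detect_transcriber()} ffmpeg={has_ffmpeg()}")
```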

**If no transcriber found:**

Offer automatic installation using the provided script:

```bash
echo "⚠️  No transcription tool found"
echo ""
echo "🔧 Auto-install dependencies? (Recommended)"
read -p "Run installation script? [Y/n]: " AUTO_INSTALL

if [[ ! "$AUTO_INSTALL" =~ ^[Nn] ]]; then
    # Get skill directory (works for both repo and symlinked installations)
    SKILL_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

    # Run installation script
    if [[ -f "$SKILL_DIR/scripts/install-requirements.sh" ]]; then
        bash "$SKILL_DIR/scripts/install-requirements.sh"
    else
        echo "❌ Installation script not found"
        echo ""
        echo "📦 Manual installation:"
        echo "  pip install faster-whisper  # Recommended"
        echo "  pip install openai-whisper  # Alternative"
        echo "  brew install ffmpeg         # Optional (macOS)"
        exit 1
    fi

    # Verify installation succeeded
    if python3 -c "import faster_whisper" 2>/dev/null || python3 -c "import whisper" 2>/dev/null; then
        echo "✅ Installation successful! Proceeding with transcription..."
    else
        echo "❌ Installation failed. Please install manually."
        exit 1
    fi
else
    echo ""
    echo "📦 Manual installation required:"
    echo ""
    echo "Recommended (fastest):"
    echo "  pip install faster-whisper"
    echo ""
    echo "Alternative (original):"
    echo "  pip install openai-whisper"
    echo ""
    echo "Optional (format conversion):"
    echo "  brew install ffmpeg  # macOS"
    echo "  apt install ffmpeg   # Linux"
    echo ""
    exit 1
fi
```

This ensures users can install dependencies with one confirmation, or opt for manual installation if preferred.

**If transcriber found:**

Proceed to Step 0b (CLI Detection).


### Step 1: Validate Audio File

**Objective:** Verify file exists, check format, and extract metadata.

**Actions:**

1. **Accept file path or URL** from user:
   - Local file: `meeting.mp3`
   - URL: `https://example.com/audio.mp3` (download to temp directory)

2. **Verify file exists:**

```bash
if [[ ! -f "$AUDIO_FILE" ]]; then
    echo "❌ File not found: $AUDIO_FILE"
    exit 1
fi
```

3. **Extract metadata** using ffprobe or file utilities:

```bash
# Get file size
FILE_SIZE=$(du -h "$AUDIO_FILE" | cut -f1)

# Get duration (in seconds) and audio codec using ffprobe
DURATION=$(ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$AUDIO_FILE" 2>/dev/null)
FORMAT=$(ffprobe -v error -select_streams a:0 -show_entries \
    stream=codec_name -of default=noprint_wrappers=1:nokey=1 "$AUDIO_FILE" 2>/dev/null)

# Convert duration to HH:MM:SS. ffprobe reports fractional seconds,
# so keep only the leading whole-second part before doing arithmetic.
if [[ "$DURATION" =~ ^[0-9]+ ]]; then
    SECS=${BASH_REMATCH[0]}
    DURATION_HMS=$(printf '%02d:%02d:%02d' \
        $((SECS / 3600)) $((SECS % 3600 / 60)) $((SECS % 60)))
else
    DURATION_HMS="Unknown"
fi
```

4. **Check file size** (warn if large for cloud APIs):

```bash
SIZE_MB=$(du -m "$AUDIO_FILE" | cut -f1)
if [[ $SIZE_MB -gt 25 ]]; then
    echo "⚠️  Large file ($FILE_SIZE) - processing may take several minutes"
fi
```

5. **Validate format** (supported: MP3, WAV, M4A, OGG, FLAC, WEBM, MP4):

```bash
EXTENSION="${AUDIO_FILE##*.}"
# Lowercase with tr so the check also works on bash 3.2 (macOS default)
EXT_LOWER=$(printf '%s' "$EXTENSION" | tr '[:upper:]' '[:lower:]')
SUPPORTED_FORMATS=("mp3" "wav" "m4a" "ogg" "flac" "webm" "mp4")

if [[ ! " ${SUPPORTED_FORMATS[*]} " =~ " ${EXT_LOWER} " ]]; then
    echo "⚠️  Unsupported format: $EXTENSION"
    if command -v ffmpeg &>/dev/null; then
        echo "🔄 Converting to WAV..."
        ffmpeg -i "$AUDIO_FILE" -ar 16000 "${AUDIO_FILE%.*}.wav" -y
        AUDIO_FILE="${AUDIO_FILE%.*}.wav"
    else
        echo "❌ Install ffmpeg to convert formats: brew install ffmpeg"
        exit 1
    fi
fi
```
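
The validation rules in this step can also be expressed as small pure functions, which are easier to unit-test than the shell version. The sketch below is illustrative only: the helper names (`format_duration`, `is_supported`, `is_large`) are hypothetical, the format set and the 25 MB threshold are taken from the snippets above, and `is_large` compares raw bytes, so it only roughly matches `du -m`, which reports disk usage in rounded-up megabytes.

```python
from pathlib import Path

SUPPORTED_FORMATS = {"mp3", "wav", "m4a", "ogg", "flac", "webm", "mp4"}
LARGE_FILE_MB = 25  # same threshold as the size check in this step


def format_duration(seconds):
    """Convert seconds (number or numeric string) to HH:MM:SS, else 'Unknown'."""
    try:
        total = int(float(seconds))
    except (TypeError, ValueError, OverflowError):
        return "Unknown"
    if total < 0:
        return "Unknown"
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def is_supported(path):
    """Check the file extension case-insensitively against the supported set."""
    return Path(path).suffix.lstrip(".").lower() in SUPPORTED_FORMATS


def is_large(size_bytes):
    """Return True when the file exceeds the large-file threshold."""
    return size_bytes > LARGE_FILE_MB * 1024 * 1024
```

Passing the raw ffprobe string straight to `format_duration` is safe under these assumptions: a non-numeric value such as `N/A` falls through to `"Unknown"` instead of raising.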


### Step 3: Generate Markdown Output

**Objective:** Create structured Markdown with metadata, transcription, meeting minutes, and summary.

**Output Template:**

```markdown
# Audio Transcription Report

## 📊 Metadata

| Field | Value |
|-------|-------|
| **File Name** | {filename} |
| **File Size** | {file_size} |
| **Duration** | {duration_hms} |
| **Language** | {language} ({language_code}) |
| **Processed Date** | {process_date} |
| **Speakers Identified** | {num_speakers} |
| **Transcription Engine** | {engine} (model: {model}) |


## 📋 Meeting Minutes

### Participants
- {speaker_1}
- {speaker_2}
- ...

### Topics Discussed
1. **{topic_1}** ({timestamp})
   - {key_point_1}
   - {key_point_2}

2. **{topic_2}** ({timestamp})
   - {key_point_1}

### Decisions Made
- ✅ {decision_1}
- ✅ {decision_2}

### Action Items
- [ ] **{action_1}** - Assigned to: {speaker} - Due: {date_if_mentioned}
- [ ] **{action_2}** - Assigned to: {speaker}


*Generated by audio-transcriber skill v1.2.0*  
*Transcription engine: {engine} | Processing time: {elapsed_time}s*
```
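
As an illustration of how the metadata table in the template might be filled in, the sketch below renders it from a dictionary. The function name `render_metadata_table` is hypothetical, and the dictionary keys are assumed to mirror the template placeholders (`filename`, `file_size`, and so on); the skill itself does not prescribe this structure.

```python
def render_metadata_table(meta):
    """Render the Metadata table from a dict keyed by the template placeholders."""
    rows = [
        ("File Name", meta["filename"]),
        ("File Size", meta["file_size"]),
        ("Duration", meta["duration_hms"]),
        ("Language", f'{meta["language"]} ({meta["language_code"]})'),
        ("Processed Date", meta["process_date"]),
        ("Speakers Identified", meta["num_speakers"]),
        ("Transcription Engine", f'{meta["engine"]} (model: {meta["model"]})'),
    ]
    lines = ["| Field | Value |", "|-------|-------|"]
    lines += [f"| **{field}** | {value} |" for field, value in rows]
    return "\n".join(lines)
```

Building the rows as data first keeps the Markdown syntax in one place, so adding a field means adding one tuple.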

**Implementation:**

Use Python or bash with an AI model (Claude/GPT) for intelligent summarization:

```python
# NOTE: cluster_by_topic, extract_action_items, extract_decisions, and
# call_ai_model are placeholders; supply your own implementations.

def generate_meeting_minutes(segments):
    """Extract topics, decisions, action items from transcription."""

    # Group segments by topic (simple clustering by timestamps)
    topics = cluster_by_topic(segments)

    # Identify action items (keywords: "should", "will", "need to", "action")
    action_items = extract_action_items(segments)

    # Identify decisions (keywords: "decided", "agreed", "approved")
    decisions = extract_decisions(segments)

    return {
        "topics": topics,
        "decisions": decisions,
        "action_items": action_items
    }

def generate_summary(segments, max_paragraphs=5):
    """Create executive summary using AI (Claude/GPT via API or local model)."""

    full_text = " ".join([s["text"] for s in segments])

    # Use Chain of Density approach (from prompt-engineer frameworks)
    summary_prompt = f"""
    Summarize the following transcription in {max_paragraphs} concise paragraphs.
    Focus on key topics, decisions, and action items.

    Transcription:
    {full_text}
    """

    # Call AI model (placeholder - user can integrate Claude API or use local model)
    summary = call_ai_model(summary_prompt)

    return summary
```
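
The keyword-based helpers referenced above are not defined in the source. One possible shape for them, assuming each segment is a dict with a `"text"` key as in `generate_summary`, is sketched below. The keyword lists come from the comments in the snippet; the word-boundary regular expressions are an added assumption so that, for example, "transaction" does not count as "action". Treat this as a rough heuristic rather than the skill's actual implementation.

```python
import re

# Keyword lists taken from the comments in generate_meeting_minutes.
ACTION_PATTERN = re.compile(r"\b(?:should|will|need to|action)\b", re.IGNORECASE)
DECISION_PATTERN = re.compile(r"\b(?:decided|agreed|approved)\b", re.IGNORECASE)


def _segments_matching(segments, pattern):
    """Return the segments whose text contains a whole-word keyword match."""
    return [segment for segment in segments if pattern.search(segment["text"])]


def extract_action_items(segments):
    """Hypothetical helper: segments that look like action items."""
    return _segments_matching(segments, ACTION_PATTERN)


def extract_decisions(segments):
    """Hypothetical helper: segments that look like decisions."""
    return _segments_matching(segments, DECISION_PATTERN)
```

Keyword matching of this kind produces false positives (any sentence containing "will") and misses paraphrased commitments, which is one reason the skill also routes the transcript through an LLM.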

**Output file naming:**

```bash
# v1.1.0: Use a timestamp to avoid overwriting earlier output
TIMESTAMP=$(date +%Y%m%d-%H%M%S)
TRANSCRIPT_FILE="transcript-${TIMESTAMP}.md"
ATA_FILE="ata-${TIMESTAMP}.md"   # "ata" = meeting minutes

echo "$TRANSCRIPT_CONTENT" > "$TRANSCRIPT_FILE"
echo "✅ Transcript saved: $TRANSCRIPT_FILE"

if [[ -n "$ATA_CONTENT" ]]; then
    echo "$ATA_CONTENT" > "$ATA_FILE"
    echo "✅ Minutes saved: $ATA_FILE"
fi
```
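
For reference, the same naming scheme can be sketched in Python. The function name `output_paths` is hypothetical; the `transcript-` and `ata-` prefixes and the `%Y%m%d-%H%M%S` format are taken from the shell snippet above. Accepting the clock value as a parameter is a design choice made here so the function can be tested deterministically.

```python
from datetime import datetime


def output_paths(now=None):
    """Return (transcript_file, minutes_file) names sharing one timestamp."""
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    return f"transcript-{stamp}.md", f"ata-{stamp}.md"
```

Because both names share one timestamp, the transcript and its minutes stay paired even when several runs write to the same directory.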


#### **SCENARIO A: User Provided Custom Prompt**

**Workflow:**

1. **Display user's prompt:**
   ```
   📝 Prompt provided by the user:
   ┌──────────────────────────────────┐
   │ [User's prompt preview]          │
   └──────────────────────────────────┘
   ```

2. **Automatically improve with prompt-engineer (if available):**
   ```bash
   🔧 Improving prompt with prompt-engineer...
   [Invokes: gh copilot -p "improve this prompt: {user_prompt}"]
   ```

3. **Show both versions:**
   ```
   ✨ Improved version:
   ┌──────────────────────────────────┐
   │ Role: You are a documenter...    │
   │ Instructions: Transform...       │
   │ Steps: 1) ... 2) ...             │
   │ End Goal: ...                    │
   └──────────────────────────────────┘

   📝 Original version:
   ┌──────────────────────────────────┐
   │ [User's original prompt]         │
   └──────────────────────────────────┘
   ```

4. **Ask which to use:**
   ```bash
   💡 Use improved version? [y/n] (default: y):
   ```

5. **Process with selected prompt:**
   - If "y": use improved
   - If "n": use original
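
The selection in the last step can be sketched as a tiny function. The name `choose_prompt` is hypothetical, and the rule it encodes is an assumption inferred from the stated default: any answer beginning with "n" keeps the original prompt, while everything else, including an empty answer, accepts the improved one.

```python
def choose_prompt(answer, original, improved):
    """Return the improved prompt unless the user explicitly declines it."""
    if answer.strip().lower().startswith("n"):
        return original
    return improved
```

Treating an empty answer as acceptance mirrors the default shown in the confirmation prompt, so pressing Enter is enough to proceed.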


#### **LLM Processing (Both Scenarios)**

Once prompt is finalized:

```python
import subprocess

from rich.progress import Progress, SpinnerColumn, TextColumn

def process_with_llm(transcript, prompt, cli_tool='claude'):
    """Send the prompt plus transcript to the chosen CLI; return its output or None."""
    full_prompt = f"{prompt}\n\n---\n\nTranscript:\n\n{transcript}"

    with Progress(
        SpinnerColumn(),
        TextColumn("[progress.description]{task.description}"),
        transient=True
    ) as progress:
        progress.add_task(
            description=f"🤖 Processing with {cli_tool}...",
            total=None
        )

        if cli_tool == 'claude':
            result = subprocess.run(
                ['claude', '-'],
                input=full_prompt,
                capture_output=True,
                text=True,
                timeout=300  # 5 minutes
            )
        elif cli_tool == 'gh-copilot':
            result = subprocess.run(
                ['gh', 'copilot', 'suggest', '-t', 'shell', full_prompt],
                capture_output=True,
                text=True,
                timeout=300
            )
        else:
            # Without this branch, `result` would be undefined below
            raise ValueError(f"Unsupported CLI tool: {cli_tool}")

    if result.returncode == 0:
        return result.stdout.strip()
    else:
        return None
```

**Progress output:**
```
🤖 Processing with claude... ⠋
[After completion:]
✅ Minutes generated successfully!
```
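
One way to make `process_with_llm` easier to test is to separate command construction from execution. The sketch below is an assumption about how that split could look: `build_llm_command` is a hypothetical helper, and the argument lists simply repeat the two invocations shown above without verifying them against the current `claude` or `gh copilot` command-line interfaces.

```python
def build_llm_command(cli_tool, full_prompt):
    """Return (argv, stdin_text) for the given tool, mirroring the branches above."""
    if cli_tool == "claude":
        # Prompt is passed on standard input
        return ["claude", "-"], full_prompt
    if cli_tool == "gh-copilot":
        # Prompt is passed as the final argument, nothing on standard input
        return ["gh", "copilot", "suggest", "-t", "shell", full_prompt], None
    raise ValueError(f"Unsupported CLI tool: {cli_tool}")
```

The returned pair can then be handed to `subprocess.run(argv, input=stdin_text, capture_output=True, text=True, timeout=300)`, which keeps the process call in a single place.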


#### **Final Output**

**Success (both files):**
```bash
💾 Saving files...

✅ Files created:
  - transcript-20260203-023045.md  (raw transcript)
  - ata-20260203-023045.md         (processed with LLM)

🧹 Removed temporary files: metadata.json, transcription.json

✅ Done! Total time: 3m 45s
```

**Transcript only (user declined LLM):**
```bash
💾 Saving files...

✅ File created:
  - transcript-20260203-023045.md

ℹ️  Minutes not generated (LLM processing declined by the user)

🧹 Removed temporary files: metadata.json, transcription.json

✅ Done!
```


### Step 5: Display Results Summary

**Objective:** Show completion status and next steps.

**Output:**

```bash
echo ""
echo "✅ Transcription Complete!"
echo ""
echo "📊 Results:"
echo "  File: $OUTPUT_FILE"
echo "  Language: $LANGUAGE"
echo "  Duration: $DURATION_HMS"
echo "  Speakers: $NUM_SPEAKERS"
echo "  Words: $WORD_COUNT"
echo "  Processing time: ${ELAPSED_TIME}s"
echo ""
echo "📝 Generated:"
echo "  - $OUTPUT_FILE (Markdown report)"
# If alternative formats were requested, also list them:
echo "  - ${OUTPUT_FILE%.*}.srt (Subtitles)"
echo "  - ${OUTPUT_FILE%.*}.json (Structured data)"
echo ""
echo "🎯 Next steps:"
echo "  1. Review meeting minutes and action items"
echo "  2. Share report with participants"
echo "  3. Track action items to completion"
```
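
The source does not show how the word count in this summary is computed. A minimal sketch, assuming segments are dicts with a `"text"` key and that words are counted by splitting on whitespace, could look like the following. Both function names are hypothetical.

```python
def count_words(segments):
    """Count whitespace-separated words across all segment texts."""
    return sum(len(segment["text"].split()) for segment in segments)


def results_summary(output_file, language, duration_hms, num_speakers,
                    segments, elapsed_s):
    """Build the 'Results' lines shown at the end of a run."""
    return "\n".join([
        "📊 Results:",
        f"  File: {output_file}",
        f"  Language: {language}",
        f"  Duration: {duration_hms}",
        f"  Speakers: {num_speakers}",
        f"  Words: {count_words(segments):,}",
        f"  Processing time: {elapsed_s}s",
    ])
```

The `:,` format specifier adds thousands separators, which matches the style of the word count in the example output.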


## Example Usage

### **Example 1: Basic Transcription**

**User Input:**
```bash
copilot> transcribe audio to markdown: meeting-2026-02-02.mp3
```

**Skill Output:**

```bash
✅ Faster-Whisper detected (optimized)
✅ ffmpeg available (format conversion enabled)

📂 File: meeting-2026-02-02.mp3
📊 Size: 12.3 MB
⏱️  Duration: 00:45:32

🎙️  Processing...
[████████████████████] 100%

✅ Language detected: Portuguese (pt-BR)
👥 Speakers identified: 4
📝 Generating Markdown output...

✅ Transcription Complete!

📊 Results:
  File: meeting-2026-02-02.md
  Language: pt-BR
  Duration: 00:45:32
  Speakers: 4
  Words: 6,842
  Processing time: 127s

📝 Generated:
  - meeting-2026-02-02.md (Markdown report)

🎯 Next steps:
  1. Review meeting minutes and action items
  2. Share report with participants
  3. Track action items to completion
```


### **Example 3: Batch Processing**

**User Input:**
```bash
copilot> transcribe these audio files: recordings/*.mp3
```

**Skill Output:**

```bash
📦 Batch mode: 5 files found
  1. team-standup.mp3
  2. client-call.mp3
  3. brainstorm-session.mp3
  4. product-demo.mp3
  5. retrospective.mp3

🎙️  Processing batch...

[1/5] team-standup.mp3 ✅ (2m 34s)
[2/5] client-call.mp3 ✅ (15m 12s)
[3/5] brainstorm-session.mp3 ✅ (8m 47s)
[4/5] product-demo.mp3 ✅ (22m 03s)
[5/5] retrospective.mp3 ✅ (11m 28s)

✅ Batch Complete!
📝 Generated 5 Markdown reports
⏱️  Total processing time: 6m 15s
```
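
The batch loop behind output like this is not shown in the source. The sketch below is one possible structure, with hypothetical names throughout: `find_audio_files` expands a glob pattern in a directory, and `run_batch` calls a caller-supplied `transcribe` function per file, recording a failure for one file without aborting the rest.

```python
from pathlib import Path


def find_audio_files(directory, pattern="*.mp3"):
    """Return matching files in a stable, sorted order."""
    return sorted(Path(directory).glob(pattern))


def run_batch(files, transcribe):
    """Run transcribe(path) for each file and return one status line per file."""
    results = []
    total = len(files)
    for index, path in enumerate(files, start=1):
        try:
            transcribe(path)
            status = "✅"
        except Exception as exc:  # keep going when a single file fails
            status = f"❌ {exc}"
        results.append(f"[{index}/{total}] {path.name} {status}")
    return results
```

Sorting the matches keeps the numbering reproducible between runs, since glob order is otherwise filesystem-dependent.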


### **Example 5: Large File Warning**

**User Input:**
```bash
copilot> transcribe audio to markdown: conference-keynote.mp3
```

**Skill Output:**

```bash
✅ Faster-Whisper detected (optimized)

📂 File: conference-keynote.mp3
📊 Size: 87.2 MB
⏱️  Duration: 02:15:47
⚠️  Large file (87.2 MB) - processing may take several minutes

Continue? [Y/n]:
```

**User:** `Y`

```bash
🎙️  Processing... (this may take 10-15 minutes)
[████░░░░░░░░░░░░░░░░] 20% - Estimated time remaining: 12m
```
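
The source does not say how the remaining-time estimate is derived. A common approach, assumed here purely for illustration, is linear extrapolation: if a given percentage took the elapsed time so far, the rest is expected to take proportionally as long. The function names below are hypothetical.

```python
def eta_seconds(elapsed_s, percent_done):
    """Estimate remaining seconds by linear extrapolation from progress so far."""
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in the range (0, 100]")
    return round(elapsed_s * (100 - percent_done) / percent_done)


def format_eta(seconds):
    """Format a remaining-time estimate in whole minutes, e.g. '12m'."""
    return f"{seconds // 60}m"
```

Under this assumption, the 20% mark with 12 minutes remaining corresponds to about 3 minutes elapsed. Real transcription speed varies with audio content, so an estimate like this is only a rough guide.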


This skill is **platform-agnostic** and works in any terminal context where a supported agent CLI (GitHub Copilot CLI, Claude Code, or Codex) is available. Transcription runs with local Whisper models and does not depend on project-specific configuration or external APIs, following the zero-configuration philosophy.
