audio-transcribe

Transcribes audio to text with timestamps and optional speaker identification. Use when you need to convert speech to text, create subtitles, transcribe meetings, or process voice recordings.

Updated 1/21/2026

Audio Transcribe

Transcribes audio files to text with timestamps. Supports automatic language detection and speaker identification (diarization), and outputs structured JSON with segment-level timing.

Command

agent-media audio transcribe --in <path> [options]

Inputs

Option	Required	Description
`--in`	Yes	Input audio file path or URL (supports mp3, wav, m4a, ogg)
`--diarize`	No	Enable speaker identification
`--language`	No	Language code (auto-detected if not provided)
`--speakers`	No	Number of speakers hint for diarization
`--out`	No	Output path, filename or directory (default: ./)
`--provider`	No	Provider to use (local, fal, replicate)
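
When the command is driven from a script, the options in the table above can be assembled into an argument list. The following is a minimal sketch, not part of the tool itself: the function name `build_transcribe_args` and its parameter names are hypothetical, and only the flags listed in the table are used.

```python
from typing import List, Optional

# Providers named in the Inputs table.
KNOWN_PROVIDERS = ("local", "fal", "replicate")


def build_transcribe_args(
    in_path: str,
    diarize: bool = False,
    language: Optional[str] = None,
    speakers: Optional[int] = None,
    out: Optional[str] = None,
    provider: Optional[str] = None,
) -> List[str]:
    """Return an argv list for `agent-media audio transcribe` (hypothetical helper)."""
    if provider is not None and provider not in KNOWN_PROVIDERS:
        raise ValueError(f"unknown provider: {provider}")

    # --in is the only required option.
    args = ["agent-media", "audio", "transcribe", "--in", in_path]
    if diarize:
        args.append("--diarize")
    if language:
        args += ["--language", language]
    if speakers is not None:
        args += ["--speakers", str(speakers)]
    if out:
        args += ["--out", out]
    if provider:
        args += ["--provider", provider]
    return args


print(build_transcribe_args("podcast.mp3", diarize=True, language="en", speakers=3))
```

The resulting list can be passed to `subprocess.run(args, capture_output=True, text=True)`; building a list rather than a shell string avoids quoting problems with paths that contain spaces.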

Output

Returns a JSON object with transcription data:

{
  "ok": true,
  "media_type": "audio",
  "action": "transcribe",
  "provider": "fal",
  "output_path": "transcription_123_abc.json",
  "transcription": {
    "text": "Full transcription text...",
    "language": "en",
    "segments": [
      { "start": 0.0, "end": 2.5, "text": "Hello.", "speaker": "SPEAKER_0" },
      { "start": 2.5, "end": 5.0, "text": "Hi there.", "speaker": "SPEAKER_1" }
    ]
  }
}
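
Because each segment carries start and end times in seconds, the result can be turned into subtitles. The sketch below converts the sample object above into SubRip (SRT) text. The helper names `srt_timestamp` and `segments_to_srt` are hypothetical, and the code assumes the `speaker` field may be absent when diarization is off, which the source does not state.

```python
import json

# The sample result shown above, embedded so the sketch is self-contained.
SAMPLE = """
{
  "ok": true,
  "media_type": "audio",
  "action": "transcribe",
  "provider": "fal",
  "output_path": "transcription_123_abc.json",
  "transcription": {
    "text": "Full transcription text...",
    "language": "en",
    "segments": [
      { "start": 0.0, "end": 2.5, "text": "Hello.", "speaker": "SPEAKER_0" },
      { "start": 2.5, "end": 5.0, "text": "Hi there.", "speaker": "SPEAKER_1" }
    ]
  }
}
"""


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(result: dict) -> str:
    """Build SRT text from the segments of a successful result."""
    if not result.get("ok"):
        raise ValueError("transcription failed: 'ok' is not true")
    blocks = []
    for index, seg in enumerate(result["transcription"]["segments"], start=1):
        # Assumption: "speaker" may be missing when --diarize was not used.
        speaker = seg.get("speaker")
        text = f"{speaker}: {seg['text']}" if speaker else seg["text"]
        timing = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    return "\n".join(blocks)


print(segments_to_srt(json.loads(SAMPLE)))
```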

Examples

Basic transcription (auto-detect language):

agent-media audio transcribe --in interview.mp3

Transcription with speaker identification:

agent-media audio transcribe --in meeting.wav --diarize

Transcription with a specific language and speaker count:

agent-media audio transcribe --in podcast.mp3 --diarize --language en --speakers 3

Use a specific provider:

agent-media audio transcribe --in audio.wav --provider replicate

Extracting Audio from Video

To transcribe a video file, first extract the audio:

# Step 1: Extract audio from video
agent-media audio extract --in video.mp4 --format mp3

# Step 2: Transcribe the extracted audio
agent-media audio transcribe --in extracted_xxx.mp3

Providers

local

Runs locally on the CPU using Transformers.js; no API key is required.

Uses the Moonshine model (5x faster than Whisper)
Models are downloaded on first use (~100 MB)
Does NOT support diarization; use fal or replicate for speaker identification
You may see a "mutex lock failed" error; it is safe to ignore, and the output is correct if "ok": true

agent-media audio transcribe --in audio.mp3 --provider local
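
Since the mutex message does not indicate failure, a script wrapping the local provider can judge success by the `"ok"` field alone. The sketch below assumes the JSON result is printed to standard output and that stray log lines may surround it; neither detail is confirmed by the source, and `transcription_succeeded` is a hypothetical helper.

```python
import json


def transcription_succeeded(stdout: str) -> bool:
    """Return True only if the captured output contains a JSON result with "ok": true."""
    # Assumption: the result is a single JSON object, possibly surrounded by log noise.
    start = stdout.find("{")
    end = stdout.rfind("}")
    if start == -1 or end < start:
        return False
    try:
        result = json.loads(stdout[start:end + 1])
    except json.JSONDecodeError:
        return False
    return result.get("ok") is True


noisy_output = 'mutex lock failed\n{"ok": true, "provider": "local"}\n'
print(transcription_succeeded(noisy_output))
```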

fal

Requires FAL_API_KEY
Uses the wizper model for fast transcription (2x faster) when diarization is disabled
Uses the whisper model when diarization is enabled (native support)

replicate

Requires REPLICATE_API_TOKEN
Uses the whisper-diarization model with Whisper Large V3 Turbo
Native diarization support with word-level timestamps
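
The provider constraints above (local needs no key but cannot diarize; fal and replicate each need a credential) can be captured in a small selection routine. This is one possible policy, not the tool's own behavior: `choose_provider` is a hypothetical helper, and it assumes `FAL_API_KEY` and `REPLICATE_API_TOKEN` are read from environment variables.

```python
import os
from typing import Mapping, Optional


def choose_provider(diarize: bool, env: Optional[Mapping[str, str]] = None) -> str:
    """Pick a value for --provider based on diarization needs and available keys."""
    # Assumption: credentials are supplied as environment variables.
    env = os.environ if env is None else env

    # Without diarization, the local provider works and needs no API key.
    if not diarize:
        return "local"

    # Diarization is only supported by fal and replicate.
    if env.get("FAL_API_KEY"):
        return "fal"
    if env.get("REPLICATE_API_TOKEN"):
        return "replicate"
    raise RuntimeError(
        "diarization requires FAL_API_KEY or REPLICATE_API_TOKEN; "
        "the local provider does not support it"
    )


print(choose_provider(diarize=False, env={}))
```

Preferring local for plain transcription keeps the default path free of credentials; a caller who values speed over that could reverse the order.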
