transcribe

Popular · 4.5K stars · 250 forks · Updated 2/6/2026

SKILL.md (read-only)

---
name: transcribe
description: "Transcribe audio files to text with optional diarization and known-speaker hints. Use when a user asks to transcribe speech from audio/video, extract text from recordings, or label speakers in interviews or meetings."
---
Audio Transcribe

Transcribe audio using OpenAI, with optional speaker diarization when requested. Prefer the bundled CLI for deterministic, repeatable runs.

Workflow

  1. Collect inputs: audio file path(s), desired response format (text/json/diarized_json), optional language hint, and any known speaker references.
  2. Verify OPENAI_API_KEY is set. If missing, ask the user to set it locally (do not ask them to paste the key).
  3. Run the bundled transcribe_diarize.py CLI with sensible defaults (fast text transcription).
  4. Validate the output: transcription quality, speaker labels, and segment boundaries; iterate with a single targeted change if needed.
  5. Save outputs under output/transcribe/ when working in this repo.
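The steps above can be sketched as a small shell wrapper. The default audio path and job-id format are illustrative, and TRANSCRIBE_CLI is the path exported in the "Skill path" section below:

```shell
# Illustrative wrapper for the workflow above; paths and job-id format
# are examples, not fixed conventions.
AUDIO="${1:-path/to/audio.wav}"
JOB_ID="$(date +%Y%m%d-%H%M%S)"
OUT_DIR="output/transcribe/$JOB_ID"
mkdir -p "$OUT_DIR"

if [ -z "${OPENAI_API_KEY:-}" ] || [ -z "${TRANSCRIBE_CLI:-}" ]; then
  # Never ask the user to paste the key; have them export it locally.
  echo "Set OPENAI_API_KEY and TRANSCRIBE_CLI before running." >&2
else
  # Fast text transcription by default; outputs land under output/transcribe/.
  python3 "$TRANSCRIBE_CLI" "$AUDIO" \
    --response-format text \
    --out "$OUT_DIR/transcript.txt"
fi
```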

Decision rules

  • Default to gpt-4o-mini-transcribe with --response-format text for fast transcription.
  • If the user wants speaker labels or diarization, use --model gpt-4o-transcribe-diarize --response-format diarized_json.
  • If audio is longer than ~30 seconds, keep --chunking-strategy auto.
  • Prompting is not supported for gpt-4o-transcribe-diarize.
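One way to encode these rules in a wrapper: WANT_DIARIZATION below is a hypothetical toggle you would set from the user's request, not a CLI flag.

```shell
# Map the decision rules above onto CLI flags.
WANT_DIARIZATION=true   # hypothetical toggle, derived from the user's request

if [ "$WANT_DIARIZATION" = true ]; then
  MODEL="gpt-4o-transcribe-diarize"
  FORMAT="diarized_json"            # no prompting support with this model
else
  MODEL="gpt-4o-mini-transcribe"    # fast default
  FORMAT="text"
fi

# Keep auto chunking for clips longer than ~30 seconds.
echo "--model $MODEL --response-format $FORMAT --chunking-strategy auto"
```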

Output conventions

  • Use output/transcribe/<job-id>/ for evaluation runs.
  • Use --out-dir for multiple files to avoid overwriting.
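For example, a multi-file run might lay out its outputs like this. The job id is illustrative, and exact filenames depend on how the CLI derives basenames:

```shell
# Illustrative layout for a multi-file evaluation run.
JOB_ID="2026-02-06-meeting"            # example job id
OUT_DIR="output/transcribe/$JOB_ID"
mkdir -p "$OUT_DIR"

# With --out-dir, each input keeps its own basename, so two inputs
# never overwrite each other's transcript.
for f in intro.m4a qa.m4a; do
  LAST_OUT="$OUT_DIR/${f%.*}.txt"
done
echo "$LAST_OUT"
# → output/transcribe/2026-02-06-meeting/qa.txt
```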

Dependencies (install if missing)

Prefer uv for dependency management.

uv pip install openai

If uv is unavailable:

python3 -m pip install openai

Environment

  • OPENAI_API_KEY must be set for live API calls.
  • If the key is missing, instruct the user to create one in the OpenAI platform UI and export it in their shell.
  • Never ask the user to paste the full key in chat.
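A safe way to check for the key without ever exposing its value:

```shell
# Report whether OPENAI_API_KEY is set without printing the key itself.
if [ -n "${OPENAI_API_KEY:-}" ]; then
  KEY_STATUS="set"
else
  KEY_STATUS="missing"
fi
echo "OPENAI_API_KEY: $KEY_STATUS"
```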

Skill path (set once)

export CODEX_HOME="${CODEX_HOME:-$HOME/.codex}"
export TRANSCRIBE_CLI="$CODEX_HOME/skills/transcribe/scripts/transcribe_diarize.py"

User-scoped skills install under $CODEX_HOME/skills (default: ~/.codex/skills).
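After setting the exports above, it is worth sanity-checking that the script actually exists at the resolved path:

```shell
# Resolve the same paths as above, then confirm the CLI script is present.
export CODEX_HOME="${CODEX_HOME:-$HOME/.codex}"
export TRANSCRIBE_CLI="$CODEX_HOME/skills/transcribe/scripts/transcribe_diarize.py"

if [ -f "$TRANSCRIBE_CLI" ]; then
  CLI_STATUS="found"
else
  CLI_STATUS="missing"   # install the transcribe skill under $CODEX_HOME/skills
fi
echo "$TRANSCRIBE_CLI: $CLI_STATUS"
```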

CLI quick start

Single file (fast text default):

python3 "$TRANSCRIBE_CLI" \
  path/to/audio.wav \
  --out transcript.txt

Diarization with known speakers (up to 4):

python3 "$TRANSCRIBE_CLI" \
  meeting.m4a \
  --model gpt-4o-transcribe-diarize \
  --known-speaker "Alice=refs/alice.wav" \
  --known-speaker "Bob=refs/bob.wav" \
  --response-format diarized_json \
  --out-dir output/transcribe/meeting

Plain text output (explicit):

python3 "$TRANSCRIBE_CLI" \
  interview.mp3 \
  --response-format text \
  --out interview.txt
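Once a diarized_json transcript exists, it can be post-processed into readable speaker-labeled lines. The .segments[].speaker and .text field names below are an assumption about the JSON shape; confirm the exact schema in references/api.md. A sketch using jq against a tiny sample file:

```shell
# Build a tiny sample in the assumed diarized_json shape, then format it.
# Field names (segments, speaker, text) are assumptions; check references/api.md.
cat > /tmp/sample_diarized.json <<'EOF'
{"segments":[{"speaker":"Alice","text":"Hello."},{"speaker":"Bob","text":"Hi."}]}
EOF

FORMATTED="$(jq -r '.segments[] | "\(.speaker): \(.text)"' /tmp/sample_diarized.json)"
echo "$FORMATTED"
# → Alice: Hello.
#   Bob: Hi.
```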

Reference map

  • references/api.md: supported formats, limits, response formats, and known-speaker notes.
