hyperframes-media


Asset preprocessing for HyperFrames compositions — multi-provider TTS (HeyGen / ElevenLabs / Kokoro local), multi-provider BGM (Google Lyria / local MusicGen), Whisper transcription, background removal, and caption authoring. Use for npx hyperframes tts, bgm, transcribe, remove-background, voice/provider selection, music-mood prompting, captions / subtitles / lyrics / karaoke / per-word styling.

Updated 6/20/2026
SKILL.md

HyperFrames Media

This skill covers the CLI commands that create assets (tts, bgm, transcribe, remove-background), plus everything needed to consume and animate transcript data in HTML. For placing assets into compositions, see hyperframes-core.

Provider chains (auto-detected from env)

TTS: npx hyperframes tts "..." picks the first available provider:

| Order | Provider | Detected when | Word timestamps |
| --- | --- | --- | --- |
| 1 | HeyGen (Starfish) | $HEYGEN_API_KEY set, or hyperframes auth login | Yes, native; pass --words narration.words.json to capture them |
| 2 | ElevenLabs | $ELEVENLABS_API_KEY set | No; chain transcribe after |
| 3 | Kokoro-82M (local, 54 voices) | Always (no key required) | No; chain transcribe after |

If the installed hyperframes tts is the local-only build (its --help output mentions "Kokoro-82M" and lists no --provider or --words flags), it silently falls back to Kokoro even when $HEYGEN_API_KEY is set. To force HeyGen regardless of CLI version, use the self-contained scripts/heygen-tts.mjs (see references/tts.md).
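Because the valid flags depend on the provider, it can help to build the tts argument list in one place and reject the combinations this skill warns about (a voice without a pinned provider, a HeyGen UUID sent to Kokoro, --words on a provider that returns no timestamps). The TypeScript below is a minimal sketch under assumptions: buildTtsArgs and TtsRequest are hypothetical names, and the provider strings "heygen", "elevenlabs", and "kokoro" are guesses, since this page does not list the exact --provider values. Only the --provider, --voice, and --words flags named in this skill are used.

```typescript
// Assumed --provider values; check `npx hyperframes tts --help` for the real ones.
type TtsProvider = "heygen" | "elevenlabs" | "kokoro";

// Hypothetical request shape for this sketch.
interface TtsRequest {
  text: string;
  provider?: TtsProvider;
  voice?: string;     // provider-specific voice ID, e.g. "am_michael" (Kokoro)
  wordsPath?: string; // e.g. "narration.words.json"; HeyGen only
}

// Loose UUID check (dashes optional), used to catch HeyGen IDs sent to Kokoro.
const UUID_RE = /^[0-9a-f]{8}-?[0-9a-f]{4}-?[0-9a-f]{4}-?[0-9a-f]{4}-?[0-9a-f]{12}$/i;

// Returns the argv to hand to `npx`; throws on combinations the rules forbid.
function buildTtsArgs(req: TtsRequest): string[] {
  if (req.voice !== undefined && req.provider === undefined) {
    throw new Error("Pin --provider when passing --voice: voice IDs are provider-specific.");
  }
  if (req.provider === "kokoro" && req.voice !== undefined && UUID_RE.test(req.voice)) {
    throw new Error("UUID-style (HeyGen) voice IDs do not work on Kokoro.");
  }
  if (req.wordsPath !== undefined && req.provider !== "heygen") {
    throw new Error("Only HeyGen returns word timestamps; run transcribe on the audio instead.");
  }
  const args = ["hyperframes", "tts", req.text];
  if (req.provider !== undefined) args.push("--provider", req.provider);
  if (req.voice !== undefined) args.push("--voice", req.voice);
  if (req.wordsPath !== undefined) args.push("--words", req.wordsPath);
  return args;
}
```

For example, buildTtsArgs({ text: "Welcome back", provider: "kokoro", voice: "am_michael" }) returns ["hyperframes", "tts", "Welcome back", "--provider", "kokoro", "--voice", "am_michael"]. On the local-only build described above, --provider and --words do not exist, so check --help before relying on this.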

BGM: npx hyperframes bgm --duration N:

| Order | Provider | Detected when |
| --- | --- | --- |
| 1 | Google Lyria (RealTime) | $GEMINI_API_KEY or $GOOGLE_API_KEY set |
| 2 | MusicGen (facebook/musicgen-small, local) | Python transformers, torch, and soundfile installed |

Override either chain with --provider <name> (not available on the local-only TTS build described above).
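The two tables describe a first-match chain with an explicit override on top. The TypeScript below sketches that logic as documented on this page; it is not the CLI's source. The provider labels ("heygen", "lyria", and so on), the heygenLoggedIn and musicgenDepsInstalled inputs, and the error thrown when no BGM provider is available are all assumptions for illustration.

```typescript
// Sketch of the documented provider chains; not the HyperFrames CLI source.
type Env = Record<string, string | undefined>;

// Treat missing, empty, or whitespace-only values as "not set".
const isSet = (env: Env, key: string): boolean => (env[key] ?? "").trim() !== "";

function resolveTtsProvider(env: Env, heygenLoggedIn: boolean, override?: string): string {
  if (override !== undefined) return override; // explicit --provider wins
  if (isSet(env, "HEYGEN_API_KEY") || heygenLoggedIn) return "heygen";
  if (isSet(env, "ELEVENLABS_API_KEY")) return "elevenlabs";
  return "kokoro"; // local fallback, no key required
}

function resolveBgmProvider(env: Env, musicgenDepsInstalled: boolean, override?: string): string {
  if (override !== undefined) return override;
  if (isSet(env, "GEMINI_API_KEY") || isSet(env, "GOOGLE_API_KEY")) return "lyria";
  if (musicgenDepsInstalled) return "musicgen"; // transformers + torch + soundfile
  // Assumption: this sketch fails loudly; the real CLI's behavior here is not documented above.
  throw new Error("No BGM provider: set GEMINI_API_KEY or GOOGLE_API_KEY, or install MusicGen's Python deps.");
}
```

In Node, resolveTtsProvider(process.env, false) predicts which provider a bare npx hyperframes tts call would pick. That prediction changes whenever the environment does, which is why the voice rule under "Non-negotiable rules" says to pin --provider.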

Routing

| Task | Read |
| --- | --- |
| npx hyperframes tts: provider chain, voice IDs, words.json | references/tts.md |
| HeyGen without the CLI: self-contained REST script (wav + words) | scripts/heygen-tts.mjs (see references/tts.md) |
| npx hyperframes bgm: Lyria vs MusicGen, mood prompts, tuning | references/bgm.md |
| npx hyperframes transcribe: Whisper, model rules, output shape | references/transcribe.md |
| npx hyperframes remove-background: transparent cutouts | references/remove-background.md |
| TTS → transcription → captions (no recorded voiceover) | references/tts-to-captions.md |
| Caption authoring: style detection, layout, word grouping, exit | references/captions/authoring.md |
| Transcript handling: input formats, quality gates, cleanup, APIs | references/captions/transcript-handling.md |
| Caption motion: karaoke, marker effects, audio-reactive | references/captions/motion.md |
| Model caches, system dependencies, troubleshooting | references/requirements.md |

Non-negotiable rules

  • Voice IDs are provider-specific. am_michael is Kokoro-only; HeyGen UUIDs don't work on Kokoro. If you pass --voice, also pin --provider to avoid silent provider drift when the user's env changes.
  • Always pass --model to transcribe. The CLI default, small.en, is an English-only Whisper model and silently translates non-English audio. See references/transcribe.md → "Language Rule".
  • HeyGen returns word timestamps; ElevenLabs / Kokoro do not. When you want captions, either pass --words to HeyGen and use that JSON directly, or run transcribe against the audio file. Don't assume word data is always there.
  • Captions consume the flat word-array format with { id, text, start, end }. See references/transcribe.md → "Output Shape".
  • remove-background --background-output is hole-cut, not inpainted. For "scene without the person", a different tool is needed. See references/remove-background.md → "When NOT the right tool".
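Once you have the flat word array, a few small functions cover the common caption needs: sanity-checking the data, finding the word being spoken for karaoke highlighting, and grouping words into caption lines. The TypeScript below is a sketch under stated assumptions: start and end are in seconds, id may be a string or number, and words arrive sorted and non-overlapping. Word, validateWords, activeWordIndex, and groupIntoLines are hypothetical names, not part of the HyperFrames API, and the grouping defaults (4 words per line, a 0.6 s gap) are illustrative rather than values from references/captions/authoring.md.

```typescript
// Flat word-array entry: { id, text, start, end }. Assumed: times in seconds.
interface Word {
  id: string | number;
  text: string;
  start: number;
  end: number;
}

// Returns a list of problems; an empty list means the data looks usable.
function validateWords(words: Word[]): string[] {
  const problems: string[] = [];
  words.forEach((w, i) => {
    // `!(a >= b)` also catches NaN timestamps.
    if (!(w.end >= w.start)) problems.push(`word ${i} (${w.text}): end before start`);
    if (i > 0 && w.start < words[i - 1].start) problems.push(`word ${i} (${w.text}): out of order`);
  });
  return problems;
}

// Karaoke lookup: index of the word being spoken at time t, or -1 in a gap.
// Binary search for the last word whose start <= t, then check its end.
function activeWordIndex(words: Word[], t: number): number {
  let lo = 0;
  let hi = words.length - 1;
  let found = -1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (words[mid].start <= t) {
      found = mid;
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return found !== -1 && t < words[found].end ? found : -1;
}

// Start a new caption line when the current one is full or when the
// silence before the next word exceeds maxGap seconds.
function groupIntoLines(words: Word[], maxWords = 4, maxGap = 0.6): Word[][] {
  const lines: Word[][] = [];
  let current: Word[] = [];
  for (const w of words) {
    const prev: Word | undefined = current[current.length - 1];
    if (current.length >= maxWords || (prev !== undefined && w.start - prev.end > maxGap)) {
      lines.push(current);
      current = [];
    }
    current.push(w);
  }
  if (current.length > 0) lines.push(current);
  return lines;
}
```

In an HTML composition you might call activeWordIndex(words, audio.currentTime) on each animation frame, since HTMLMediaElement.currentTime is also in seconds, and toggle a highlight class on the matching word element.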
