
hyperframes-media
Asset preprocessing for HyperFrames compositions — multi-provider TTS (HeyGen / ElevenLabs / Kokoro local), multi-provider BGM (Google Lyria / local MusicGen), Whisper transcription, background removal, and caption authoring. Use for npx hyperframes tts, bgm, transcribe, remove-background, voice/provider selection, music-mood prompting, captions / subtitles / lyrics / karaoke / per-word styling.
HyperFrames Media
CLI commands that create assets (tts, bgm, transcribe, remove-background), plus everything needed to consume and animate transcript data in HTML. For placing assets into compositions, see hyperframes-core.
Provider chains (auto-detected from env)
TTS — npx hyperframes tts "..." picks the first available provider:
| Order | Provider | Detected when | Word timestamps |
|---|---|---|---|
| 1 | HeyGen (Starfish) | $HEYGEN_API_KEY / hyperframes auth login | Yes, native; pass --words narration.words.json to capture |
| 2 | ElevenLabs | $ELEVENLABS_API_KEY set | No; chain transcribe after |
| 3 | Kokoro-82M (local, 54 voices) | always (no key required) | No; chain transcribe after |
If the installed hyperframes tts is the local-only build (its --help says "Kokoro-82M" and has no --provider / --words flags), it silently falls back to Kokoro even with $HEYGEN_API_KEY set. To force HeyGen regardless of CLI version, use the self-contained scripts/heygen-tts.mjs (see references/tts.md).
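The detection order in the TTS table can be sketched as a small function. This is a minimal illustration that mirrors the table, not the CLI's actual implementation: the function names, the `logged_in` parameter (standing in for a prior `hyperframes auth login`), and the lowercase provider labels are all assumptions.

```python
from typing import Mapping


def pick_tts_provider(env: Mapping[str, str], logged_in: bool = False) -> str:
    """Return the first TTS provider whose detection condition holds.

    The order follows the table above. The returned labels are
    illustrative and may not match the names the CLI accepts.
    """
    if env.get("HEYGEN_API_KEY") or logged_in:
        return "heygen"
    if env.get("ELEVENLABS_API_KEY"):
        return "elevenlabs"
    # Kokoro runs locally and needs no key, so it is always available.
    return "kokoro"


def has_native_word_timestamps(provider: str) -> bool:
    """Only HeyGen returns word timestamps; the others need transcribe."""
    return provider == "heygen"
```

Calling `pick_tts_provider(os.environ)` before a render shows which provider the current environment would resolve to, which helps spot the silent fallback described above.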
BGM — npx hyperframes bgm --duration N:
| Order | Provider | Detected when |
|---|---|---|
| 1 | Google Lyria (RealTime) | $GEMINI_API_KEY or $GOOGLE_API_KEY set |
| 2 | MusicGen (facebook/musicgen-small, local) | Python transformers + torch + soundfile installed |
Override either with --provider <name>.
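The BGM chain and the `--provider` override can be sketched the same way. This is a rough illustration under stated assumptions: the provider labels, the `has_module` injection point, and the `None` result when nothing is available are hypothetical, and the real CLI may behave differently in that last case.

```python
import importlib.util
from typing import Callable, Mapping, Optional

# Python packages the table lists as required for local MusicGen.
MUSICGEN_MODULES = ("transformers", "torch", "soundfile")


def module_installed(name: str) -> bool:
    """True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(name) is not None


def pick_bgm_provider(
    env: Mapping[str, str],
    override: Optional[str] = None,
    has_module: Callable[[str], bool] = module_installed,
) -> Optional[str]:
    """Resolve a BGM provider in the order given by the table above."""
    if override:
        # Mirrors --provider <name>: an explicit choice skips detection.
        return override
    if env.get("GEMINI_API_KEY") or env.get("GOOGLE_API_KEY"):
        return "lyria"
    if all(has_module(m) for m in MUSICGEN_MODULES):
        return "musicgen"
    # Assumption: nothing usable was detected.
    return None
```

Passing `has_module` as a parameter keeps the dependency check testable without installing torch.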
Routing
| Task | Read |
|---|---|
| npx hyperframes tts: provider chain, voice IDs, words.json | references/tts.md |
| HeyGen without the CLI: self-contained REST script (wav + words) | scripts/heygen-tts.mjs (see references/tts.md) |
| npx hyperframes bgm: Lyria vs MusicGen, mood prompts, tuning | references/bgm.md |
| npx hyperframes transcribe: Whisper, model rules, output shape | references/transcribe.md |
| npx hyperframes remove-background: transparent cutouts | references/remove-background.md |
| TTS → transcription → captions (no recorded voiceover) | references/tts-to-captions.md |
| Caption authoring: style detection, layout, word grouping, exit | references/captions/authoring.md |
| Transcript handling: input formats, quality gates, cleanup, APIs | references/captions/transcript-handling.md |
| Caption motion: karaoke, marker effects, audio-reactive | references/captions/motion.md |
| Model caches, system dependencies, troubleshooting | references/requirements.md |
Non-negotiable rules
- Voice IDs are provider-specific. am_michael is Kokoro-only; HeyGen UUIDs don't work on Kokoro. If you pass --voice, also pin --provider to avoid silent provider drift when the user's env changes.
- Always pass --model to transcribe. The CLI default small.en silently translates non-English audio. See references/transcribe.md → "Language Rule".
- HeyGen returns word timestamps; ElevenLabs / Kokoro do not. When you want captions, either pass --words to HeyGen and use that JSON directly, or run transcribe against the audio file. Don't assume word data is always there.
- Captions consume the flat word-array format with { id, text, start, end }. See references/transcribe.md → "Output Shape".
- remove-background --background-output is hole-cut, not inpainted. For "scene without the person", a different tool is needed. See references/remove-background.md → "When NOT the right tool".
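Two of the rules above lend themselves to a quick pre-flight check: pinning `--provider` whenever `--voice` is passed, and confirming that word data has the flat `{ id, text, start, end }` shape before captions consume it. The sketch below is illustrative only. The helper names and the provider value `"kokoro"` are assumptions, the validator checks key presence and timing order only, and the authoritative schema is the "Output Shape" section of references/transcribe.md.

```python
from numbers import Real
from typing import Any, List, Optional

REQUIRED_WORD_KEYS = ("id", "text", "start", "end")


def build_tts_command(
    text: str,
    voice: Optional[str] = None,
    provider: Optional[str] = None,
    words_path: Optional[str] = None,
) -> List[str]:
    """Build an argv list for `npx hyperframes tts`, refusing an unpinned voice."""
    if voice and not provider:
        raise ValueError("--voice requires --provider: voice IDs are provider-specific")
    cmd = ["npx", "hyperframes", "tts", text]
    if provider:
        cmd += ["--provider", provider]
    if voice:
        cmd += ["--voice", voice]
    if words_path:
        cmd += ["--words", words_path]
    return cmd


def validate_words(words: Any) -> List[str]:
    """Return a list of problems; an empty list means the shape looks usable."""
    if not isinstance(words, list):
        return ["top level must be a flat array of word objects"]
    problems = []
    for i, word in enumerate(words):
        if not isinstance(word, dict):
            problems.append(f"word {i}: not an object")
            continue
        missing = [k for k in REQUIRED_WORD_KEYS if k not in word]
        if missing:
            problems.append(f"word {i}: missing {', '.join(missing)}")
            continue
        start, end = word["start"], word["end"]
        numeric = all(
            isinstance(v, Real) and not isinstance(v, bool) for v in (start, end)
        )
        if not numeric:
            problems.append(f"word {i}: start/end must be numbers")
        elif end < start:
            problems.append(f"word {i}: end before start")
    return problems
```

Running `validate_words(json.load(f))` on either the HeyGen `--words` output or the `transcribe` output catches a missing or nested word array early, rather than at caption render time.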


