hyperframes-media

HyperFrames Media

CLI commands that create assets (tts, bgm, transcribe, remove-background), plus everything needed to consume and animate transcript data in HTML. For placing assets into compositions, see hyperframes-core.

Provider chains (auto-detected from env)

TTS — npx hyperframes tts "..." picks the first available provider:

| Order | Provider | Detected when | Word timestamps |
| --- | --- | --- | --- |
| 1 | HeyGen (Starfish) | `$HEYGEN_API_KEY` / `hyperframes auth login` | Yes, native: pass `--words narration.words.json` to capture |
| 2 | ElevenLabs | `$ELEVENLABS_API_KEY` set | No: chain `transcribe` after |
| 3 | Kokoro-82M (local, 54 voices) | always (no key required) | No: chain `transcribe` after |

If the installed hyperframes tts is the local-only build (its --help output mentions "Kokoro-82M" and lists neither --provider nor --words), it silently falls back to Kokoro even when $HEYGEN_API_KEY is set. To force HeyGen regardless of CLI version, use the self-contained scripts/heygen-tts.mjs (see references/tts.md).
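
The TTS detection order above can be sketched in a few lines of Python. This is an illustration of the documented chain, not the CLI's actual implementation: the function name, the `logged_in` flag (standing in for a prior `hyperframes auth login`), and the provider labels are all assumptions and may not match the values `--provider` accepts.

```python
from typing import Mapping, NamedTuple


class TtsChoice(NamedTuple):
    provider: str       # hypothetical label, not necessarily a valid --provider value
    native_words: bool  # True when the provider returns word timestamps itself


def pick_tts_provider(env: Mapping[str, str], logged_in: bool = False) -> TtsChoice:
    """Mirror the documented order: HeyGen, then ElevenLabs, then Kokoro."""
    # HeyGen is detected from its API key or from an existing login (assumed flag).
    if env.get("HEYGEN_API_KEY") or logged_in:
        return TtsChoice("heygen", True)
    if env.get("ELEVENLABS_API_KEY"):
        return TtsChoice("elevenlabs", False)
    # Kokoro-82M is local and needs no key, so it is always the last resort.
    return TtsChoice("kokoro", False)
```

When `native_words` is false, the table above says to chain `transcribe` after synthesis to obtain word timings.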

BGM — npx hyperframes bgm --duration N:

| Order | Provider | Detected when |
| --- | --- | --- |
| 1 | Google Lyria (RealTime) | `$GEMINI_API_KEY` or `$GOOGLE_API_KEY` set |
| 2 | MusicGen (`facebook/musicgen-small`, local) | Python `transformers + torch + soundfile` installed |

Override either chain with --provider <name>.
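
To check ahead of time which BGM provider a machine would likely resolve to, a rough Python sketch of the same detection rules follows. The function name, the provider labels, and the `None` result when nothing is detected are assumptions; what the CLI does when neither provider is available is not stated here.

```python
import importlib.util
from typing import Callable, Mapping, Optional


def _importable(name: str) -> bool:
    """Return True if a top-level Python module can be found without importing it."""
    return importlib.util.find_spec(name) is not None


def pick_bgm_provider(
    env: Mapping[str, str],
    module_available: Callable[[str], bool] = _importable,
) -> Optional[str]:
    """Mirror the documented order: Lyria first, then local MusicGen."""
    if env.get("GEMINI_API_KEY") or env.get("GOOGLE_API_KEY"):
        return "lyria"  # hypothetical label
    # MusicGen needs all three Python packages to be installed.
    if all(module_available(m) for m in ("transformers", "torch", "soundfile")):
        return "musicgen"  # hypothetical label
    return None  # assumption: no provider detected
```

Calling `pick_bgm_provider(os.environ)` (after `import os`) probes the current interpreter's environment; the `module_available` parameter exists only so the check can be stubbed in tests.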

Routing

| Task | Read |
| --- | --- |
| `npx hyperframes tts`: provider chain, voice IDs, words.json | `references/tts.md` |
| HeyGen without the CLI: self-contained REST script (wav + words) | `scripts/heygen-tts.mjs` (see `references/tts.md`) |
| `npx hyperframes bgm`: Lyria vs MusicGen, mood prompts, tuning | `references/bgm.md` |
| `npx hyperframes transcribe`: Whisper, model rules, output shape | `references/transcribe.md` |
| `npx hyperframes remove-background`: transparent cutouts | `references/remove-background.md` |
| TTS → transcription → captions (no recorded voiceover) | `references/tts-to-captions.md` |
| Caption authoring: style detection, layout, word grouping, exit | `references/captions/authoring.md` |
| Transcript handling: input formats, quality gates, cleanup, APIs | `references/captions/transcript-handling.md` |
| Caption motion: karaoke, marker effects, audio-reactive | `references/captions/motion.md` |
| Model caches, system dependencies, troubleshooting | `references/requirements.md` |

Non-negotiable rules

Voice IDs are provider-specific. am_michael is Kokoro-only; HeyGen UUIDs don't work on Kokoro. If you pass --voice, also pin --provider to avoid silent provider drift when the user's env changes.
Always pass --model to transcribe. The CLI default small.en silently translates non-English audio. See references/transcribe.md → "Language Rule".
HeyGen returns word timestamps; ElevenLabs and Kokoro do not. When you want captions, either pass --words to HeyGen and use that JSON directly, or run transcribe against the audio file. Never assume word data exists.
Captions consume a flat array of word objects, each shaped { id, text, start, end }. See references/transcribe.md → "Output Shape".
remove-background --background-output is hole-cut, not inpainted. For "scene without the person", a different tool is needed. See references/remove-background.md → "When NOT the right tool".
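
As an illustration of working with the flat word array named in the rules above, the Python sketch below validates the `{ id, text, start, end }` shape and groups words into caption lines. It assumes `start` and `end` are numeric times in seconds. The grouping thresholds and function names are hypothetical, and the real grouping rules live in references/captions/authoring.md.

```python
from typing import Any, Dict, List

Word = Dict[str, Any]
REQUIRED_KEYS = {"id", "text", "start", "end"}


def validate_words(words: List[Word]) -> None:
    """Raise ValueError if any entry lacks a required key or ends before it starts."""
    for index, word in enumerate(words):
        missing = REQUIRED_KEYS - word.keys()
        if missing:
            raise ValueError(f"word {index} is missing keys: {sorted(missing)}")
        if word["end"] < word["start"]:
            raise ValueError(f"word {index} ends before it starts")


def group_words(words: List[Word], max_words: int = 4, max_gap: float = 0.6) -> List[List[Word]]:
    """Start a new caption group when the group is full or the pause is too long."""
    groups: List[List[Word]] = []
    for word in words:
        current = groups[-1] if groups else None
        if (
            current is not None
            and len(current) < max_words
            and word["start"] - current[-1]["end"] <= max_gap
        ):
            current.append(word)
        else:
            groups.append([word])
    return groups


# Example input with made-up ids and timings (assumed to be seconds).
sample = [
    {"id": "w0", "text": "Hello", "start": 0.0, "end": 0.4},
    {"id": "w1", "text": "world", "start": 0.45, "end": 0.9},
    {"id": "w2", "text": "again", "start": 2.0, "end": 2.4},
]
validate_words(sample)
caption_groups = group_words(sample)
```

Because both sources of word data mentioned above (HeyGen's --words JSON and transcribe output) are meant to feed captions, validating the shape first is a cheap way to fail early when the word data is absent or malformed.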
