
agent-media
Agent-first media toolkit for image, video, and audio processing. Use when you need to resize, convert, generate images, remove backgrounds, extract audio, transcribe speech, or generate videos. All commands return deterministic JSON output.
Agent Media
Agent Media is an agent-first media toolkit that provides CLI-accessible commands for image, video, and audio processing. All commands produce deterministic, machine-readable JSON output.
Available Commands
Image Commands
- agent-media image resize - Resize an image
- agent-media image convert - Convert image format
- agent-media image remove-background - Remove image background
- agent-media image generate - Generate image from text
Audio Commands
- agent-media audio extract - Extract audio from video
- agent-media audio transcribe - Transcribe audio to text
Video Commands
- agent-media video generate - Generate video from text or image
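Every command above follows the pattern agent-media <media type> <action>. The TypeScript sketch below encodes that catalog so an agent can reject an undocumented command before it spawns a process. The helper name buildArgv is hypothetical. Any arguments after the action, such as input paths or options, are passed through unchanged because this page does not document their syntax.

```typescript
// Documented commands, grouped by media type (mirrors the command list above).
const COMMANDS = {
  image: ["resize", "convert", "remove-background", "generate"],
  audio: ["extract", "transcribe"],
  video: ["generate"],
} as const;

type MediaType = keyof typeof COMMANDS;

// Hypothetical helper: checks that the media type / action pair is documented
// and returns the arguments that follow the "agent-media" executable name.
function buildArgv(mediaType: string, action: string, extra: string[] = []): string[] {
  // hasOwnProperty avoids accepting inherited keys such as "toString".
  if (!Object.prototype.hasOwnProperty.call(COMMANDS, mediaType)) {
    throw new Error(`unknown media type: ${mediaType}`);
  }
  const actions: readonly string[] = COMMANDS[mediaType as MediaType];
  if (!actions.includes(action)) {
    throw new Error(`unknown ${mediaType} action: ${action}`);
  }
  // Anything after the action is passed through as-is (syntax not documented here).
  return [mediaType, action, ...extra];
}

// Usage: validate a request before handing the argv to a process spawner.
console.log(buildArgv("video", "generate")); // → ["video", "generate"]
```

Keeping the catalog as data rather than as branching logic means a new command only needs one array entry.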
Output Format
All commands return JSON to stdout:
{
  "ok": true,
  "media_type": "image",
  "action": "resize",
  "provider": "local",
  "output_path": "output_123.webp",
  "mime": "image/webp",
  "bytes": 12345
}
On error:
{
  "ok": false,
  "error": {
    "code": "INVALID_INPUT",
    "message": "input file not found"
  }
}
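Because both shapes share the boolean ok field, a caller can model them as a discriminated union and branch on ok. The sketch below makes two assumptions. First, the field set is taken only from the two examples above. Second, the parser name parseResult is hypothetical. It validates only the discriminant and the fields an agent most often reads, and it casts the rest.

```typescript
// Success envelope, mirroring the fields in the example output above.
interface MediaSuccess {
  ok: true;
  media_type: string;
  action: string;
  provider: string;
  output_path: string;
  mime: string;
  bytes: number;
}

// Error envelope: ok is false and the details live under "error".
interface MediaFailure {
  ok: false;
  error: { code: string; message: string };
}

type MediaResult = MediaSuccess | MediaFailure;

// Hypothetical parser for one command's stdout. It checks the "ok" discriminant,
// output_path/bytes on success, and code/message on failure.
function parseResult(stdout: string): MediaResult {
  const data: unknown = JSON.parse(stdout);
  if (typeof data !== "object" || data === null || !("ok" in data)) {
    throw new Error("not an agent-media result envelope");
  }
  const obj = data as Record<string, unknown>;
  if (obj.ok === true) {
    if (typeof obj.output_path !== "string" || typeof obj.bytes !== "number") {
      throw new Error("success envelope missing output_path or bytes");
    }
    return obj as unknown as MediaSuccess;
  }
  const err = obj.error as { code?: unknown; message?: unknown } | undefined;
  if (obj.ok !== false || typeof err?.code !== "string" || typeof err?.message !== "string") {
    throw new Error("malformed error envelope");
  }
  return obj as unknown as MediaFailure;
}

// Usage: narrowing on result.ok gives typed access to either branch.
const result = parseResult('{"ok":false,"error":{"code":"INVALID_INPUT","message":"input file not found"}}');
if (!result.ok) {
  console.error(`${result.error.code}: ${result.error.message}`);
}
```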
Providers
- local - Default provider using Sharp (resize, convert) and Transformers.js (remove-background, transcribe)
- fal - fal.ai provider (generate, edit, remove-background, transcribe, video)
- replicate - Replicate API (generate, edit, remove-background, transcribe, video)
- runpod - Runpod API (generate, edit)
- ai-gateway - Vercel AI Gateway (generate, edit)
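The provider list is effectively a capability table. The sketch below stores it as data and answers the question "which providers can do X?". The capability labels are copied verbatim from the list, including "video" and "edit", which the list names without further detail. The helper name providersFor is hypothetical.

```typescript
// Capabilities per provider, copied verbatim from the provider list above.
const CAPABILITIES: Record<string, readonly string[]> = {
  local: ["resize", "convert", "remove-background", "transcribe"],
  fal: ["generate", "edit", "remove-background", "transcribe", "video"],
  replicate: ["generate", "edit", "remove-background", "transcribe", "video"],
  runpod: ["generate", "edit"],
  "ai-gateway": ["generate", "edit"],
};

// Hypothetical helper: names of providers that list the given capability,
// in the order the providers appear above.
function providersFor(capability: string): string[] {
  return Object.keys(CAPABILITIES).filter((name) =>
    CAPABILITIES[name].includes(capability),
  );
}

console.log(providersFor("remove-background")); // → ["local", "fal", "replicate"]
```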
Provider Selection
- Explicit: --provider <name>
- Auto-detect from environment variables
- Fallback to local provider
Environment Variables
- AGENT_MEDIA_DIR - Custom output directory
- FAL_API_KEY - Enable fal provider
- REPLICATE_API_TOKEN - Enable replicate provider
- RUNPOD_API_KEY - Enable runpod provider
- AI_GATEWAY_API_KEY - Enable ai-gateway provider
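The selection rules and the environment variables combine into a three-step resolution. The sketch below is a minimal model of that logic. It assumes two things that this page does not document: that a set, non-empty key enables its provider, and that keys are checked in the order listed when several are set. The function name resolveProvider is hypothetical. It takes the environment as a parameter so it can be tested without touching the real process environment; in Node you would pass process.env.

```typescript
// Environment variable that enables each remote provider (from the list above).
// The check order is an assumption: it simply follows the listed order.
const PROVIDER_ENV: ReadonlyArray<readonly [string, string]> = [
  ["fal", "FAL_API_KEY"],
  ["replicate", "REPLICATE_API_TOKEN"],
  ["runpod", "RUNPOD_API_KEY"],
  ["ai-gateway", "AI_GATEWAY_API_KEY"],
];

type Env = Record<string, string | undefined>;

// Hypothetical resolver: explicit --provider value first, then the first
// provider whose key is set, then the local provider.
function resolveProvider(explicit: string | undefined, env: Env): string {
  if (explicit) {
    return explicit;
  }
  for (const [provider, envVar] of PROVIDER_ENV) {
    if (env[envVar]) {
      return provider;
    }
  }
  return "local";
}

console.log(resolveProvider(undefined, { REPLICATE_API_TOKEN: "token" })); // → "replicate"
```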