Image Edit — Pro Pack on RunComfy
runcomfy.com · Nano Banana Edit · GPT Image 2 Edit · Flux Kontext · Z-Image Inpaint · GitHub
Image edit, intent-routed. This skill doesn't lock you to one model — it picks the right edit model in the RunComfy catalog based on what the user actually wants: batch identity-preservation, multilingual text rewrite, single-shot precise edit, or mask-driven region replacement.
npx skills add agentspace-so/runcomfy-skills --skill image-edit -g
Pick the right model for the user's intent
| User intent | Model | Why |
|---|---|---|
| Batch edit 1–20 images consistently (SKU gallery, A/B variants) | Nano Banana Edit | Up to 20 input images per call; locked aspect/resolution for series |
| Swap background, preserve subject identity | Nano Banana Edit | Strong identity preservation under "keep X unchanged" prompts |
| Localized object removal / addition with spatial language ("the left object", "upper-right corner") | Nano Banana Edit | Honors directional spatial scope |
| Multilingual / non-Latin in-image text rewrite (Japanese kana, Cyrillic, Arabic) | GPT Image 2 Edit | Strongest in class for multilingual typography |
| Multi-reference composition (subject from img1, scene from img2, palette from img3) | GPT Image 2 Edit | Numbered refs route cues correctly |
| Layout-precise repositioning ("move headline from top-right to bottom-center") | GPT Image 2 Edit | Directional language honored at layout level |
| Identity preservation across translated headline variants | GPT Image 2 Edit | Same source asset → many language variants, identity stable |
| Single-shot precise local edit ("she's now holding an orange umbrella") | Flux Kontext Pro | Single-ref single-instruction, high-fidelity preservation |
| Mask-driven object removal (cables, watermarks, distractions) | Z-Image Turbo Inpaint | Mask-required, strength-tunable, edge-consistent |
| Mask-driven region replacement (full background swap with mask) | Z-Image Turbo Inpaint | High strength + clean mask = clean replacement |
| Default if unspecified | Nano Banana Edit | Most flexible, supports both single and batch |
The agent reads this table, classifies the user's intent, and picks the matching subsection below.
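The routing table can be read as a lookup from an intent label to a model ID. The sketch below is one possible way to express that lookup in shell. The intent labels (`batch`, `text-rewrite`, and so on) and the function name `pick_model` are hypothetical names made up for illustration; only the model IDs come from the route sections of this document.

```shell
#!/usr/bin/env bash
# Map a coarse intent label to a RunComfy model ID.
# The labels are hypothetical; only the model IDs come from this document.
pick_model() {
  case "$1" in
    batch|background-swap|spatial-edit)
      echo "google/nano-banana-2/edit" ;;
    text-rewrite|multi-ref|layout-move)
      echo "openai/gpt-image-2/edit" ;;
    single-precise)
      echo "blackforestlabs/flux-1-kontext/pro/edit" ;;
    mask-removal|mask-replace)
      echo "tongyi-mai/z-image/turbo/inpainting" ;;
    *)
      # Default route when the intent is unspecified or unrecognized.
      echo "google/nano-banana-2/edit" ;;
  esac
}

pick_model mask-removal   # prints tongyi-mai/z-image/turbo/inpainting
```

Falling through to Nano Banana Edit mirrors the "Default if unspecified" row of the table.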
Prerequisites
- RunComfy CLI: `npm i -g @runcomfy/cli`
- RunComfy account: `runcomfy login`
- CI / containers: set `RUNCOMFY_TOKEN=<token>`
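A preflight check can catch a missing CLI or missing credentials before any model call. This is a minimal sketch, not part of the CLI: the function names are hypothetical, and treating the presence of the token file (path as given under Security & Privacy) as "signed in" is an assumption about how the CLI behaves.

```shell
#!/usr/bin/env bash
# Succeeds if RUNCOMFY_TOKEN is set, or if the token file that
# `runcomfy login` is documented to write exists. Assumption: file
# presence is used here as a stand-in for a valid login.
has_credentials() {
  [ -n "${RUNCOMFY_TOKEN:-}" ] || [ -f "${HOME}/.config/runcomfy/token.json" ]
}

# Succeeds if the runcomfy CLI is on PATH.
has_cli() {
  command -v runcomfy >/dev/null 2>&1
}

# Print a hint to stderr and fail when a prerequisite is missing.
preflight() {
  has_cli || { echo "runcomfy CLI not found; run: npm i -g @runcomfy/cli" >&2; return 1; }
  has_credentials || { echo "no credentials; run: runcomfy login, or set RUNCOMFY_TOKEN" >&2; return 1; }
}
```

Usage: `preflight && runcomfy run <model_id> ...`. A rejected token would still surface later as exit code 77.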
Route 1: Nano Banana Edit — default for general edit + batch
Model: google/nano-banana-2/edit
Schema
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| `prompt` | string | yes | - | Lead with preservation goals, end with the change. |
| `image_urls` | array | yes | - | 1–20 publicly-fetchable HTTPS URLs. |
| `number_of_images` | int | no | 1 | 1–4 outputs per call. |
| `aspect_ratio` | enum | no | auto | `auto` follows input; lock for batch consistency. |
| `resolution` | enum | no | 1K | 0.5K / 1K / 2K / 4K. |
| `output_format` | enum | no | png | png / jpeg / webp. |
| `seed` | int | no | - | Reproducibility. |
| `enable_web_search` | bool | no | false | Web-grounded edits (extra latency). |
Invoke
runcomfy run google/nano-banana-2/edit \
--input '{
"prompt": "Keep the subject identity, pose, and clothing unchanged. Convert the background into a rainy neon cyberpunk street.",
"image_urls": ["https://.../portrait.jpg"]
}' \
--output-dir <absolute/path>
Batch (lock aspect + resolution):
runcomfy run google/nano-banana-2/edit \
--input '{
"prompt": "Replace the watermark in the bottom-right with the text \"AURA\" in clean white sans-serif. Keep everything else exactly as in the input.",
"image_urls": ["https://.../sku-1.jpg", "https://.../sku-2.jpg", "https://.../sku-3.jpg"],
"aspect_ratio": "1:1",
"resolution": "1K"
}' \
--output-dir <absolute/path>
Prompting tips
- Preservation first: "Keep [identity / pose / brand / framing] unchanged." Then state the change.
- Spatial scope: "background only", "the left object", "upper-right quadrant". Concrete locations are honored.
- Batch consistency: lock `aspect_ratio` and `resolution` across the batch.
- Iterate small: split compound edits into multiple shorter passes.
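The "preservation first" pattern can be made mechanical. The helper below is a hypothetical sketch (the name `build_prompt` is not part of the skill or the CLI) that puts the preservation clause first and the change last.

```shell
#!/usr/bin/env bash
# Compose an edit prompt that leads with preservation and ends with the change.
# Both arguments are free text; the function name is hypothetical.
build_prompt() {
  local keep="$1" change="$2"
  printf 'Keep %s unchanged. %s\n' "$keep" "$change"
}

build_prompt "the subject identity, pose, and clothing" \
  "Convert the background into a rainy neon cyberpunk street."
```

The call reproduces the prompt used in the single-image Invoke example for this route.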
Route 2: GPT Image 2 Edit — multilingual text + multi-ref composition
Model: openai/gpt-image-2/edit
Schema
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| `prompt` | string | yes | - | Edit instruction; lead with preservation. |
| `images` | string[] | yes | - | Up to 10 HTTPS URLs. First is primary; rest are auxiliary. |
| `size` | enum | no | auto | auto, 1024_1024, 1024_1536, 1536_1024. Only these. |
Invoke
Multilingual text rewrite:
runcomfy run openai/gpt-image-2/edit \
--input '{
"prompt": "Keep the photograph, layout, and brand mark exactly as in the input. Replace only the in-image headline. The new headline reads \"今日のおすすめ\" in bold Japanese kana, same position and font weight.",
"images": ["https://.../poster-en.jpg"]
}' \
--output-dir <absolute/path>
Multi-ref composition:
runcomfy run openai/gpt-image-2/edit \
--input '{
"prompt": "Compose subject from image 1 into the room from image 2. Match the lighting and color palette of image 2. Keep image 1 subject identity unchanged.",
"images": ["https://.../subject.jpg", "https://.../room.jpg"]
}' \
--output-dir <absolute/path>
Prompting tips
- Quote in-image text exactly. Name the script for non-Latin: "Japanese kana", "Cyrillic", "Arabic right-to-left".
- Number multi-refs: "subject from image 1, lighting from image 2".
- Directional layout language: "move the headline from top-right to bottom-center", "replace the watermark in the bottom-right".
- `size: "auto"` preserves input ratio; recommended unless the edit changes framing.
Route 3: Flux Kontext Pro — single-shot precise local edit
Model: blackforestlabs/flux-1-kontext/pro/edit
Schema (minimal)
| Field | Type | Required | Notes |
|---|---|---|---|
| `prompt` | string | yes | One declarative edit instruction. |
| `image` | string | yes | Single source image URL. |
| `aspect_ratio` | enum | no | Pick from supported W:H values. |
| `seed` | int | no | Reproducibility. |
Single image only — no array. For multi-image flows, use Route 1 (Nano Banana Edit).
Invoke
runcomfy run blackforestlabs/flux-1-kontext/pro/edit \
--input '{
"prompt": "Keep the person'\''s face, pose, and clothing unchanged. Add an orange umbrella in her left hand and a slight smile.",
"image": "https://.../portrait.jpg"
}' \
--output-dir <absolute/path>
Prompting tips
- One declarative instruction. "She is now holding an orange umbrella" states the result as a single change.
- Preservation first. Lead with "Keep [unchanged elements]", then state the change.
- Iterate small. Compound edits drift on a single pass; split into sequential passes.
Route 4: Z-Image Turbo Inpaint — mask-driven precise region edit
Model: tongyi-mai/z-image/turbo/inpainting
Schema
| Field | Type | Required | Notes |
|---|---|---|---|
| `prompt` | string | yes | What to fill / replace; preservation constraints for the unmasked surround. |
| `image` | string | yes | Source image URL. |
| `mask_image` | string | yes | Grayscale mask URL (white = inpaint, black = preserve). |
| `strength` | float | no | 0.3–0.6 retouching, 0.7–1.0 full replacement. |
| `control_scale` | float | no | 0.6–0.9 typical. |
| `aspect_ratio` | enum | no | W:H output ratio. |
| `seed` | int | no | Reproducibility. |
Invoke
Object removal (low strength):
runcomfy run tongyi-mai/z-image/turbo/inpainting \
--input '{
"prompt": "Remove overhead cables; preserve rooflines and sky gradient; thin clean sky.",
"image": "https://.../street.jpg",
"mask_image": "https://.../cables-mask.png",
"strength": 0.5,
"control_scale": 0.8
}' \
--output-dir <absolute/path>
Region replacement (high strength):
runcomfy run tongyi-mai/z-image/turbo/inpainting \
--input '{
"prompt": "Replace busy backdrop with smooth light gray studio paper; mask background only.",
"image": "https://.../product.jpg",
"mask_image": "https://.../bg-mask.png",
"strength": 0.9
}' \
--output-dir <absolute/path>
Prompting tips
- A mask URL is required: grayscale, white = inpaint region, black = preserve. Slight blur on mask edges (1–3px) blends better than sharp binary.
- Strength by intent: `0.3–0.5` for retouching / cleanup, `0.6–0.7` for object replacement with style match, `0.8–1.0` for full-region replacement.
- Name what stays outside the mask in the prompt: "preserve rooflines and sky gradient", "match brick pattern and mortar tone".
- Spatial labels still help even though the mask defines the region: "the left shelf", "upper-right quadrant".
Limitations
- Each route inherits its model's limits. Nano Banana: 1–20 inputs, 1–4 outputs. GPT Image 2 Edit: up to 10 refs, 4 size options (`auto` plus 3 fixed sizes). Flux Kontext: single ref. Z-Image Inpaint: mask required.
- No multi-route blending. This skill picks one model per call.
- Brand-specific overrides: if the user named a specific model, route to the corresponding brand skill (`gpt-image-edit`, `flux-kontext`, `nano-banana-edit`) for fuller treatment.
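The per-route input limits can be checked before a call is made, which avoids a round trip that would likely end in a schema mismatch (exit code 65). This is a hedged sketch: the route labels and the function name are hypothetical, and the limits are the ones stated in this document.

```shell
#!/usr/bin/env bash
# Check that the number of source images fits the chosen route.
# Route labels are hypothetical; limits are taken from this document.
check_image_count() {
  local route="$1" n="$2" max
  case "$route" in
    nano-banana) max=20 ;;                  # 1-20 input images
    gpt-image-2) max=10 ;;                  # up to 10 references
    flux-kontext|z-image-inpaint) max=1 ;;  # single source image
    *) echo "unknown route: $route" >&2; return 2 ;;
  esac
  [ "$n" -ge 1 ] && [ "$n" -le "$max" ]
}
```

Usage: `check_image_count gpt-image-2 "${#urls[@]}" || echo "too many references" >&2`, where `urls` is a hypothetical bash array of input URLs.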
Exit codes
| code | meaning |
|---|---|
| 0 | success |
| 64 | bad CLI args |
| 65 | bad input JSON / schema mismatch |
| 69 | upstream 5xx |
| 75 | retryable: timeout / 429 |
| 77 | not signed in or token rejected |
Full reference: docs.runcomfy.com/cli/troubleshooting.
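Since only exit code 75 is marked retryable in the table above, a wrapper can retry on that code and pass every other code straight through. This is a minimal sketch under stated assumptions: `run_with_retry`, `MAX_ATTEMPTS`, and `RETRY_DELAY` are hypothetical names, and the backoff schedule is an illustrative choice, not documented CLI behavior.

```shell
#!/usr/bin/env bash
# Run a command, retrying only when it exits with 75 (retryable: timeout / 429).
# MAX_ATTEMPTS and RETRY_DELAY (whole seconds) are hypothetical knobs.
run_with_retry() {
  local max="${MAX_ATTEMPTS:-3}" delay="${RETRY_DELAY:-5}" attempt=1 rc
  while :; do
    if "$@"; then rc=0; else rc=$?; fi
    # Stop on success, on any non-retryable code, or when attempts run out.
    if [ "$rc" -ne 75 ] || [ "$attempt" -ge "$max" ]; then
      return "$rc"
    fi
    sleep "$delay"
    delay=$((delay * 2))        # simple exponential backoff
    attempt=$((attempt + 1))
  done
}
```

Usage: `run_with_retry runcomfy run <model_id> --input "$payload" --output-dir <absolute/path>`, where `payload` is a hypothetical variable holding the JSON body. Codes such as 64, 65, and 77 return immediately because retrying would not change the outcome.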
How it works
The skill picks one of Nano Banana Edit / GPT Image 2 Edit / Flux Kontext Pro / Z-Image Turbo Inpaint based on user intent and invokes `runcomfy run <model_id>` with the matching JSON body. The CLI POSTs to the Model API, polls the request, fetches the result, and downloads any `*.runcomfy.net` / `*.runcomfy.com` URL into `--output-dir`. Ctrl-C cancels the remote request before exit.
Security & Privacy
- Token storage: `runcomfy login` writes the API token to `~/.config/runcomfy/token.json` with mode 0600 (owner-only read/write). Set the `RUNCOMFY_TOKEN` env var to bypass the file entirely in CI / containers.
- Input boundary: the user prompt is passed as a JSON string to the CLI via `--input`. The CLI does NOT shell-expand the prompt; it transmits the JSON body directly to the Model API over HTTPS. No shell injection surface from prompt content.
- Third-party content: image / mask / video URLs you pass are fetched by the RunComfy model server, not by the CLI on your machine. Treat external URLs as untrusted; image-based prompt injection is a known risk for any image-edit / video-edit model.
- Outbound endpoints: only `model-api.runcomfy.net` (request submission) and `*.runcomfy.net` / `*.runcomfy.com` (download whitelist for generated outputs). No telemetry, no callbacks.
- Generated-file size cap: the CLI aborts any single download > 2 GiB to prevent disk-fill from a malicious or runaway model output.
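The input boundary above covers what the CLI does with the JSON body; the caller still has to produce valid JSON when the prompt comes from a user. The sketch below is one hedged way to do that in plain bash. `json_escape` and `build_payload` are hypothetical helpers, and the escaper covers only the characters listed in its comment, so a real JSON encoder is the safer choice for untrusted input.

```shell
#!/usr/bin/env bash
# Escape a string for use inside a JSON string literal.
# Minimal sketch: handles backslash, double quote, newline, tab, and carriage
# return only. Other control characters are NOT covered.
json_escape() {
  local s="$1"
  s=${s//\\/\\\\}      # backslash first, so later escapes are not doubled
  s=${s//\"/\\\"}
  s=${s//$'\n'/\\n}
  s=${s//$'\t'/\\t}
  s=${s//$'\r'/\\r}
  printf '%s' "$s"
}

# Build a Route 1 request body from a prompt and a single image URL.
build_payload() {
  printf '{"prompt": "%s", "image_urls": ["%s"]}' \
    "$(json_escape "$1")" "$(json_escape "$2")"
}
```

Usage: `runcomfy run google/nano-banana-2/edit --input "$(build_payload "$user_prompt" "$image_url")" --output-dir <absolute/path>`, where `user_prompt` and `image_url` are hypothetical variables. Quoting the command substitution keeps the shell from word-splitting the body.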