nano-banana-2

nano-banana-2

>

2Star
0Fork
更新于 6/18/2026
SKILL.md
readonly只读
name
nano-banana-2
description

>

Nano Banana 2 — Pro Pack on RunComfy

runcomfy.com · Model page · GitHub

Google Nano Banana 2 — the flash-tier text-to-image model in the Gemini family — hosted on the RunComfy Model API. Optimized for ideation, social-thumbnail batches, and rapid drafts with strong in-image typography.

npx skills add agentspace-so/runcomfy-skills --skill nano-banana-2 -g

When to pick this model (vs siblings)

Nano Banana 2 is the flash-tier of the Google image-gen line. Pick it when iteration speed and predictable framing matter more than maximum detail.

You want Use
Rapid drafts, social thumbnails, batch variants Nano Banana 2
In-image typography with predictable rendering Nano Banana 2
Web-grounded image (current events / real entities) Nano Banana 2 + enable_web_search
Image edit (preserve subject, swap background) Nano Banana Edit (sibling skill)
Heavy stylization, painterly look Flux 2
Maximum prompt adherence + multilingual text GPT Image 2
2K–4K hero shots, max realism Seedream 5
Hyperrealistic portrait Nano Banana Pro

If the user said "Nano Banana" / "nano-banana-2" / "Gemini image" explicitly, route here regardless. If they said "Nano Banana" without specifying 2 vs Pro, default to Pro for portraits and 2 for everything else.

Prerequisites

  1. RunComfy CLInpm i -g @runcomfy/cli
  2. RunComfy accountruncomfy login opens a browser device-code flow.
  3. CI / containers — set RUNCOMFY_TOKEN=<token> instead of runcomfy login.

Endpoints + input schema

google/nano-banana-2/text-to-image

Field Type Required Default Notes
prompt string yes Subject-first description.
num_images int no 1 1–4. Use 4 for ideation rounds.
seed int no 0 Reuse for reproducibility.
aspect_ratio enum no auto auto, 21:9, 16:9, 3:2, 4:3, 5:4, 1:1, 4:5, 3:4, 2:3, 9:16.
resolution enum no 1K 0.5K (drafts), 1K (default), 2K (final), 4K (max).
output_format enum no png png, jpeg, webp.
safety_tolerance int no 4 1 (strict) – 6 (permissive).
limit_generations bool no true Limit each prompt round to one generation.
enable_web_search bool no false Adds web grounding (extra cost + latency).

For image edit (preserve subject + apply changes), see the sibling nano-banana-edit skill.

How to invoke

Default draft (1K, square, png):

runcomfy run google/nano-banana-2/text-to-image \
  --input '{"prompt": "<user prompt>"}' \
  --output-dir <absolute/path>

Vertical 4-up batch for ideation:

runcomfy run google/nano-banana-2/text-to-image \
  --input '{
    "prompt": "<user prompt>",
    "num_images": 4,
    "aspect_ratio": "9:16",
    "resolution": "0.5K"
  }' \
  --output-dir <absolute/path>

Final at 2K with seed lock:

runcomfy run google/nano-banana-2/text-to-image \
  --input '{
    "prompt": "<user prompt>",
    "resolution": "2K",
    "aspect_ratio": "16:9",
    "seed": 42
  }' \
  --output-dir <absolute/path>

Web-grounded (current event / real entity):

runcomfy run google/nano-banana-2/text-to-image \
  --input '{
    "prompt": "<prompt referencing a real-world event from this week>",
    "enable_web_search": true
  }' \
  --output-dir <absolute/path>

Prompting — what actually works

Subject-first declarative grammar. "A cinematic close-up portrait of an American woman standing under neon lights in rainy Tokyo, shallow depth of field, reflective wet streets, ultra-detailed, realistic skin texture" — primary subject, then action, environment, style, camera. Front-load subject; trail with directives.

Exact text quoting for in-image typography. "The label reads 'AURA' in clean bold sans-serif, centered, white on black" — quote the literal characters. Specify placement and font style. Don't say "with the brand name on it" and hope.

Consistent seeds for refinement. Lock seed when iterating a single prompt across small variants — keeps composition stable.

Web-grounding, sparingly. Turn on enable_web_search only when the prompt names current events / real entities. Adds latency + cost; off by default.

Don't conflict styles. "minimalist + ornate + retro + cyberpunk" cancels. Pick 1–2 anchors.

Anti-patterns:

  • Trying to verbally describe a stable subject identity — use the edit endpoint with image refs instead.
  • Asking for resolutions outside the 4 tiers → 422.
  • Aspect ratios outside the 11 supported values → 422.
  • Non-quoted in-image text → unpredictable rendering.

Where it shines

Use case Why Nano Banana 2
Marketing draft thumbnails (batch of 4) Fast iteration at 0.5K, then promote winner to 2K
Social-platform-native Wide aspect ratio support including 9:16, 4:5, 21:9
In-image typography for posters / cards Predictable text rendering when characters are quoted
Web-grounded current-event imagery enable_web_search integrates fresh info
Reproducible variant testing Strong seed + consistent framing

Sample prompts (verified to produce strong results)

Cinematic portrait (page example):

A cinematic close-up portrait of an American woman standing under neon
lights in rainy Tokyo, shallow depth of field, reflective wet streets,
ultra-detailed, realistic skin texture

Brand-asset card with quoted text:

A minimalist 16:9 product card: a matte black ceramic mug centered on a
soft warm-grey paper background, rim highlight from upper-left, the
headline "Brewed Quietly" in clean bold sans-serif top-right, balanced
negative space below, e-commerce ready, clean studio lighting

Vertical platform-native:

A 9:16 vertical hero for a wellness brand: a single ceramic teacup on a
linen runner, soft morning side-light, the words "Slow Down" in
hand-drawn serif large at the top, gentle steam rising, neutral color
palette, uncluttered

Limitations

  • Still images only. No video on this endpoint.
  • Max 4 outputs per request.
  • Web search adds latency + cost — only enable on demand.
  • 2K / 4K cost more — default to 1K unless user asked for higher.
  • For image edit, use the /edit endpoint — not this one.

Exit codes

code meaning
0 success
64 bad CLI args
65 bad input JSON / schema mismatch
69 upstream 5xx
75 retryable: timeout / 429
77 not signed in or token rejected

Full reference: docs.runcomfy.com/cli/troubleshooting.

How it works

The skill invokes runcomfy run google/nano-banana-2/text-to-image with a JSON body matching the schema. The CLI POSTs to https://model-api.runcomfy.net/v1/models/google/nano-banana-2/text-to-image, polls the request, fetches the result, and downloads any .runcomfy.net/.runcomfy.com URL into --output-dir. Ctrl-C cancels the remote request before exit.

Security & Privacy

  • Token storage: runcomfy login writes the API token to ~/.config/runcomfy/token.json with mode 0600 (owner-only read/write). Set RUNCOMFY_TOKEN env var to bypass the file entirely in CI / containers.
  • Input boundary: the user prompt is passed as a JSON string to the CLI via --input. The CLI does NOT shell-expand the prompt; it transmits the JSON body directly to the Model API over HTTPS. No shell injection surface from prompt content.
  • Third-party content: image / mask / video URLs you pass are fetched by the RunComfy model server, not by the CLI on your machine. Treat external URLs as untrusted; image-based prompt injection is a known risk for any image-edit / video-edit model.
  • Outbound endpoints: only model-api.runcomfy.net (request submission) and *.runcomfy.net / *.runcomfy.com (download whitelist for generated outputs). No telemetry, no callbacks.
  • Generated-file size cap: the CLI aborts any single download > 2 GiB to prevent disk-fill from a malicious or runaway model output.

You Might Also Like

Related Skills

writing-skills

writing-skills

233Kresearch-knowledge

Use when creating new skills, editing existing skills, or verifying skills work before deployment

obra avatarobra
获取
doc-coauthoring

doc-coauthoring

153Kresearch-knowledge

Guide users through a structured workflow for co-authoring documentation. Use when user wants to write documentation, proposals, technical specs, decision docs, or similar structured content. This workflow helps users efficiently transfer context, refine content through iteration, and verify the doc works for readers. Trigger when user mentions writing docs, creating proposals, drafting specs, or similar documentation tasks.

anthropics avataranthropics
获取
claude-api

claude-api

153Kresearch-knowledge

|-

anthropics avataranthropics
获取
mcp-builder

mcp-builder

153Kresearch-knowledge

Guide for creating high-quality MCP (Model Context Protocol) servers that enable LLMs to interact with external services through well-designed tools. Use when building MCP servers to integrate external APIs or services, whether in Python (FastMCP) or Node/TypeScript (MCP SDK).

anthropics avataranthropics
获取
xlsx

xlsx

152Kresearch-knowledge

Use this skill any time a spreadsheet file is the primary input or output. This means any task where the user wants to: open, read, edit, or fix an existing .xlsx, .xlsm, .csv, or .tsv file (e.g., adding columns, computing formulas, formatting, charting, cleaning messy data); create a new spreadsheet from scratch or from other data sources; or convert between tabular file formats. Trigger especially when the user references a spreadsheet file by name or path — even casually (like \"the xlsx in my downloads\") — and wants something done to it or produced from it. Also trigger for cleaning or restructuring messy tabular data files (malformed rows, misplaced headers, junk data) into proper spreadsheets. The deliverable must be a spreadsheet file. Do NOT trigger when the primary deliverable is a Word document, HTML report, standalone Python script, database pipeline, or Google Sheets API integration, even if tabular data is involved.

anthropics avataranthropics
获取
docx

docx

151Kresearch-knowledge

Use this skill whenever the user wants to create, read, edit, or manipulate Word documents (.docx files). Triggers include: any mention of 'Word doc', 'word document', '.docx', or requests to produce professional documents with formatting like tables of contents, headings, page numbers, or letterheads. Also use when extracting or reorganizing content from .docx files, inserting or replacing images in documents, performing find-and-replace in Word files, working with tracked changes or comments, or converting content into a polished Word document. If the user asks for a 'report', 'memo', 'letter', 'template', or similar deliverable as a Word or .docx file, use this skill. Do NOT use for PDFs, spreadsheets, Google Docs, or general coding tasks unrelated to document generation.

anthropics avataranthropics
获取