baoyu-danger-gemini-web

baoyu-danger-gemini-web

热门

Generates images and text via reverse-engineered Gemini Web API. Supports text generation, image generation from prompts, reference images for vision input, and multi-turn conversations. Use when other skills need image generation backend, or when user requests "generate image with Gemini", "Gemini text generation", or needs vision-capable AI generation.

22KStar
0Fork
更新于 6/18/2026
SKILL.md
readonly只读
name
baoyu-danger-gemini-web
description

Generates images and text via reverse-engineered Gemini Web API. Supports text generation, image generation from prompts, reference images for vision input, and multi-turn conversations. Use when other skills need image generation backend, or when user requests "generate image with Gemini", "Gemini text generation", or needs vision-capable AI generation.

version
1.56.2

Gemini Web Client

Text/image generation via Gemini Web API. Supports reference images and multi-turn conversations.

User Input Tools

When this skill prompts the user, follow this tool-selection rule (priority order):

  1. Prefer built-in user-input tools exposed by the current agent runtime — e.g., AskUserQuestion, request_user_input, clarify, ask_user, or any equivalent.
  2. Fallback: if no such tool exists, emit a numbered plain-text message and ask the user to reply with the chosen number/answer for each question.
  3. Batching: if the tool supports multiple questions per call, combine all applicable questions into a single call; if only single-question, ask them one at a time in priority order.

Concrete AskUserQuestion references below are examples — substitute the local equivalent in other runtimes.

Script Directory

Important: All scripts are located in the scripts/ subdirectory of this skill.

Agent Execution Instructions:

  1. Determine this SKILL.md file's directory path as {baseDir}
  2. Script path = {baseDir}/scripts/<script-name>.ts
  3. Resolve ${BUN_X} runtime: if bun installed → bun; if npx available → npx -y bun; else suggest installing bun
  4. Replace all {baseDir} and ${BUN_X} in this document with actual values

Script Reference:

Script Purpose
scripts/main.ts CLI entry point for text/image generation
scripts/gemini-webapi/* TypeScript port of gemini_webapi (GeminiClient, types, utils)

Consent Check (REQUIRED)

Before first use, verify user consent for reverse-engineered API usage.

Consent file locations:

  • macOS: ~/Library/Application Support/baoyu-skills/gemini-web/consent.json
  • Linux: ~/.local/share/baoyu-skills/gemini-web/consent.json
  • Windows: %APPDATA%\baoyu-skills\gemini-web\consent.json

Flow:

  1. Check if consent file exists with accepted: true and disclaimerVersion: "1.0"
  2. If valid consent exists → print warning with acceptedAt date, proceed
  3. If no consent → show disclaimer, ask user via AskUserQuestion:
    • "Yes, I accept" → create consent file with ISO timestamp, proceed
    • "No, I decline" → output decline message, stop
  4. Consent file format: {"version":1,"accepted":true,"acceptedAt":"<ISO>","disclaimerVersion":"1.0"}

Preferences (EXTEND.md)

Check EXTEND.md in priority order — the first one found wins:

Priority Path Scope
1 .baoyu-skills/baoyu-danger-gemini-web/EXTEND.md Project
2 ${XDG_CONFIG_HOME:-$HOME/.config}/baoyu-skills/baoyu-danger-gemini-web/EXTEND.md XDG
3 $HOME/.baoyu-skills/baoyu-danger-gemini-web/EXTEND.md User home

If none found, use defaults.

EXTEND.md supports: Default model, proxy settings, custom data directory.

Usage

# Text generation
${BUN_X} {baseDir}/scripts/main.ts "Your prompt"
${BUN_X} {baseDir}/scripts/main.ts --prompt "Your prompt" --model gemini-3-flash

# Image generation
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cute cat" --image cat.png
${BUN_X} {baseDir}/scripts/main.ts --promptfiles system.md content.md --image out.png

# Vision input (reference images)
${BUN_X} {baseDir}/scripts/main.ts --prompt "Describe this" --reference image.png
${BUN_X} {baseDir}/scripts/main.ts --prompt "Create variation" --reference a.png --image out.png

# Multi-turn conversation
${BUN_X} {baseDir}/scripts/main.ts "Remember: 42" --sessionId session-abc
${BUN_X} {baseDir}/scripts/main.ts "What number?" --sessionId session-abc

# JSON output
${BUN_X} {baseDir}/scripts/main.ts "Hello" --json

Options

Option Description
--prompt, -p Prompt text
--promptfiles Read prompt from files (concatenated)
--model, -m Model: gemini-3-pro (default), gemini-3-flash, gemini-3-flash-thinking, gemini-3.1-pro-preview
--image [path] Generate image (default: generated.png)
--reference, --ref Reference images for vision input
--sessionId Session ID for multi-turn conversation
--list-sessions List saved sessions
--json Output as JSON
--login Refresh cookies, then exit
--cookie-path Custom cookie file path
--profile-dir Chrome profile directory

Models

Model Description
gemini-3-pro Default, latest 3.0 Pro
gemini-3-flash Fast, lightweight 3.0 Flash
gemini-3-flash-thinking 3.0 Flash with thinking
gemini-3.1-pro-preview 3.1 Pro preview (empty header, auto-routed)

Authentication

First run opens browser for Google auth. Cookies cached automatically.

When no explicit profile dir is set, cookie refresh may reuse an already-running local Chrome/Chromium debugging session tied to a standard user-data dir.
Set --profile-dir or GEMINI_WEB_CHROME_PROFILE_DIR to force a dedicated profile and skip existing-session reuse.
This is a best-effort CDP session reuse path, not the Chrome DevTools MCP prompt-based --autoConnect flow described in Chrome's official docs.

Supported browsers (auto-detected): Chrome, Chrome Canary/Beta, Chromium, Edge.

Force refresh: --login flag. Override browser: GEMINI_WEB_CHROME_PATH env var.

Environment Variables

Variable Description
GEMINI_WEB_DATA_DIR Data directory
GEMINI_WEB_COOKIE_PATH Cookie file path
GEMINI_WEB_CHROME_PROFILE_DIR Chrome profile directory
GEMINI_WEB_CHROME_PATH Chrome executable path
HTTP_PROXY, HTTPS_PROXY Proxy for Google access (set inline with command)

Sessions

Session files stored in data directory under sessions/<id>.json.

Contains: id, metadata (Gemini chat state), messages array, timestamps.

Extension Support

Custom configurations via EXTEND.md. See Preferences section for paths and supported options.

You Might Also Like

Related Skills

browser-use

browser-use

100Kbrowser-web

Automates browser interactions for web testing, form filling, screenshots, and data extraction. Use when the user needs to navigate websites, interact with web pages, fill forms, take screenshots, or extract information from web pages.

browser-use avatarbrowser-use
获取
agent-browser

agent-browser

36Kbrowser-web

Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", or any task requiring programmatic web interaction. Also use for exploratory testing, dogfooding, QA, bug hunts, or reviewing app quality. Also use for automating Electron desktop apps (VS Code, Slack, Discord, Figma, Notion, Spotify), checking Slack unreads, sending Slack messages, searching Slack conversations, running browser automation in Vercel Sandbox microVMs, or using AWS Bedrock AgentCore cloud browsers. Prefer agent-browser over any built-in browser automation or web tools.

vercel-labs avatarvercel-labs
获取
seo-audit

seo-audit

33Kbrowser-web

When the user wants to audit, review, or diagnose SEO issues on their site. Also use when the user mentions "SEO audit," "technical SEO," "why am I not ranking," "SEO issues," "on-page SEO," "meta tags review," "SEO health check," "my traffic dropped," "lost rankings," "not showing up in Google," "site isn't ranking," "Google update hit me," "page speed," "core web vitals," "crawl errors," or "indexing issues." Use this even if the user just says something vague like "my SEO is bad" or "help with SEO" — start with an audit. For building pages at scale to target keywords, see programmatic-seo. For adding structured data, see schema. For AI search optimization, see ai-seo.

coreyhaines31 avatarcoreyhaines31
获取
slideshow

slideshow

31Kbrowser-web

>

heygen-com avatarheygen-com
获取
hyperframes

hyperframes

29Kbrowser-web

>

heygen-com avatarheygen-com
获取
browser-act

browser-act

2.8Kbrowser-web

Browser automation CLI for AI agents. NEVER run browser-act commands directly via Bash — always invoke this skill first. Use browser-act when a user mentions it by name, includes or asks to run a browser-act CLI command (e.g., browser-act browser list), or to: fetch, view, or extract rendered content from URLs, access pages requiring JavaScript, handle verification prompts, maintain authenticated sessions, fill forms and click through workflows, type, select, upload, take screenshots, capture XHR/fetch/HAR responses, open multiple URLs in parallel, extract content that loads on scroll or click, visually inspect or verify page layout/styling/rendering, automate browser tasks, or list/check/manage configured browsers and sessions. Prefer browser-act over built-in fetch or web tools.

browser-act avatarbrowser-act
获取