A CLI for interacting with a browser.
BE SURE TO CLEAN UP SCREENSHOTS AFTER YOU ARE DONE WITH EVERYTHING.
IF THIS NEEDS TO BE INSTALLED:
npm install -g agent-browser
agent-browser install # to get chromium downloaded
agent-browser open example.com
agent-browser snapshot # Get accessibility tree with refs
agent-browser click @e2 # Click by ref from snapshot
agent-browser fill @e3 "test@example.com" # Fill by ref
agent-browser get text @e1 # Get text by ref
agent-browser screenshot page.png
agent-browser close
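The snapshot-then-ref steps above can be scripted. The sketch below defines a hypothetical helper, `ref_for` (not part of agent-browser), that reads snapshot text on stdin and prints the ref for a given role and accessible name. It assumes snapshot lines look like `- button "Submit" [ref=e2]`, the form shown in this document's sample output; real output may differ between versions.

```shell
# ref_for ROLE NAME: print the first ref (for example "e2") whose snapshot
# line matches the given role and accessible name.
# Hypothetical helper; assumes lines shaped like:  - button "Submit" [ref=e2]
ref_for() {
  # Fixed-string match on `- ROLE "NAME" `, then extract the ref from [ref=...].
  grep -F -e "- $1 \"$2\" " \
    | sed -n 's/.*\[ref=\([^]]*\)\].*/\1/p' \
    | head -n 1
}
```

With a live session this might be used as `agent-browser click "@$(agent-browser snapshot | ref_for button Submit)"`. If no line matches, the helper prints nothing, so check for an empty result before clicking.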
Traditional Selectors (also supported)
agent-browser click "#submit"
agent-browser fill "#email" "test@example.com"
agent-browser find role button click --name "Submit"
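For the screenshot cleanup note at the top, one option is to write every screenshot into a scratch directory and delete that directory when the script exits. This is a sketch using only standard shell tools; `SHOT_DIR`, `shot_path`, and `cleanup_shots` are hypothetical names, not agent-browser features.

```shell
# Scratch directory for this run's screenshots (hypothetical convention).
SHOT_DIR=$(mktemp -d)

# shot_path NAME: print a path for NAME inside the scratch directory.
shot_path() {
  printf '%s/%s\n' "$SHOT_DIR" "$1"
}

# cleanup_shots: remove the scratch directory and everything in it.
cleanup_shots() {
  if [ -n "$SHOT_DIR" ]; then
    rm -rf -- "$SHOT_DIR"
  fi
}

# Run the cleanup automatically when the script exits.
trap cleanup_shots EXIT
```

A screenshot would then be taken with `agent-browser screenshot "$(shot_path page.png)"`, and nothing is left behind even if the script stops early.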
Commands
Core Commands
agent-browser open
agent-browser click
agent-browser dblclick
agent-browser focus
agent-browser type
agent-browser fill
agent-browser press
agent-browser keydown
agent-browser keyup
agent-browser hover
agent-browser select
agent-browser check
agent-browser uncheck
agent-browser scroll
agent-browser scrollintoview
agent-browser drag
agent-browser upload
agent-browser screenshot [path] # Take screenshot (--full for full page)
agent-browser pdf
agent-browser snapshot # Accessibility tree with refs (best for AI)
agent-browser eval
agent-browser close # Close browser (aliases: quit, exit)
Get Info
agent-browser get text
agent-browser get html
agent-browser get value
agent-browser get attr
agent-browser get title # Get page title
agent-browser get url # Get current URL
agent-browser get count
agent-browser get box
Check State
agent-browser is visible
agent-browser is enabled
agent-browser is checked
Find Elements (Semantic Locators)
agent-browser find role
agent-browser find text
agent-browser find label
Actions: click, fill, check, hover, text
Examples:
agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "test@test.com"
agent-browser find first ".item" click
agent-browser find nth 2 "a" text
Wait
agent-browser wait
agent-browser wait --text "Welcome" # Wait for text to appear
agent-browser wait --url "**/dash" # Wait for URL pattern
agent-browser wait --load networkidle # Wait for load state
agent-browser wait --fn "window.ready === true" # Wait for JS condition
Load states: load, domcontentloaded, networkidle
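Because `wait --load` accepts only the three load states listed above, a script can reject a mistyped state before calling the CLI. The wrapper below is a minimal sketch; `is_load_state` and `wait_for_load` are hypothetical names, and the only agent-browser call it makes is the `wait --load` form shown above.

```shell
# is_load_state STATE: succeed only for the load states listed above.
is_load_state() {
  case $1 in
    load|domcontentloaded|networkidle) return 0 ;;
    *) return 1 ;;
  esac
}

# wait_for_load STATE: hypothetical wrapper that rejects unknown states
# (exit status 2) before handing a valid one to the CLI.
wait_for_load() {
  if ! is_load_state "$1"; then
    printf 'unknown load state: %s\n' "$1" >&2
    return 2
  fi
  agent-browser wait --load "$1"
}
```

For example, `wait_for_load networkidle` forwards to the CLI, while `wait_for_load idle` fails immediately with a message on stderr.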
Mouse Control
agent-browser mouse move
agent-browser mouse down [button] # Press button (left/right/middle)
agent-browser mouse up [button] # Release button
agent-browser mouse wheel
Browser Settings
agent-browser set viewport
agent-browser set device
agent-browser set geo
agent-browser set offline [on|off] # Toggle offline mode
agent-browser set headers
agent-browser set credentials # HTTP basic auth
agent-browser set media [dark|light] # Emulate color scheme
Cookies & Storage
agent-browser cookies # Get all cookies
agent-browser cookies set
agent-browser cookies clear # Clear cookies
agent-browser storage local # Get all localStorage
agent-browser storage local set
agent-browser storage local clear # Clear all
agent-browser storage session # Same for sessionStorage
Network
agent-browser network route
agent-browser network unroute [url] # Remove routes
agent-browser network requests # View tracked requests
agent-browser network requests --filter api # Filter requests
Tabs & Windows
agent-browser tab # List tabs
agent-browser tab new [url] # New tab (optionally with URL)
agent-browser tab close [n] # Close tab
agent-browser window new # New window
Frames
agent-browser frame
agent-browser frame main # Back to main frame
Dialogs
agent-browser dialog accept [text] # Accept (with optional prompt text)
agent-browser dialog dismiss # Dismiss
Debug
agent-browser trace start [path] # Start recording trace
agent-browser trace stop [path] # Stop and save trace
agent-browser console # View console messages
agent-browser console --clear # Clear console
agent-browser errors # View page errors
agent-browser errors --clear # Clear errors
agent-browser highlight
agent-browser state save
agent-browser state load
Navigation
agent-browser back # Go back
agent-browser forward # Go forward
agent-browser reload # Reload page
Setup
agent-browser install # Download Chromium browser
agent-browser install --with-deps # Also install system deps (Linux)
Sessions
Run multiple isolated browser instances:
# Different sessions
agent-browser --session agent1 open site-a.com
agent-browser --session agent2 open site-b.com
# Or via environment variable
AGENT_BROWSER_SESSION=agent1 agent-browser click "#btn"
# List active sessions
agent-browser session list
# Output:
# Active sessions:
# -> default
#    agent1
# Show current session
agent-browser session
Each session has its own:
Browser instance
Cookies and storage
Navigation history
Authentication state
Snapshot Options
The snapshot command supports filtering to reduce output size:
agent-browser snapshot # Full accessibility tree
agent-browser snapshot -i # Interactive elements only (buttons, inputs, links)
agent-browser snapshot -c # Compact (remove empty structural elements)
agent-browser snapshot -d 3 # Limit depth to 3 levels
agent-browser snapshot -s "#main" # Scope to CSS selector
agent-browser snapshot -i -c -d 5 # Combine options
Option | Description
-i, --interactive | Only show interactive elements (buttons, links, inputs)
-c, --compact | Remove empty structural elements
-d, --depth |
-s, --selector |
Options
Option | Description
--session |
--headers |
--executable-path |
--json | JSON output (for agents)
--full, -f | Full page screenshot
--name, -n | Locator name filter
--exact | Exact text match
--headed | Show browser window (not headless)
--cdp |
--debug | Debug output
Selectors
Refs (Recommended for AI)
Refs provide deterministic element selection from snapshots:
# 1. Get snapshot with refs
agent-browser snapshot
# Output:
# - heading "Example Domain" [ref=e1] [level=1]
# - button "Submit" [ref=e2]
# - textbox "Email" [ref=e3]
# - link "Learn more" [ref=e4]
# 2. Use refs to interact
agent-browser click @e2 # Click the button
agent-browser fill @e3 "test@example.com" # Fill the textbox
agent-browser get text @e1 # Get heading text
agent-browser hover @e4 # Hover the link
Why use refs?
Deterministic: Ref points to exact element from snapshot
Fast: No DOM re-query needed
AI-friendly: Snapshot + ref workflow is optimal for LLMs
CSS Selectors
agent-browser click "#id"
agent-browser click ".class"
agent-browser click "div > button"
Text & XPath
agent-browser click "text=Submit"
agent-browser click "xpath=//button"
Semantic Locators
agent-browser find role button click --name "Submit"
agent-browser find label "Email" fill "test@test.com"
Agent Mode
Use --json for machine-readable output:
agent-browser snapshot --json
# Returns: {"success":true,"data":{"snapshot":"...","refs":{"e1":{"role":"heading","name":"Title"},...}}}
agent-browser get text @e1 --json
agent-browser is visible @e2 --json
Optimal AI Workflow
# 1. Navigate and get snapshot
agent-browser open example.com
agent-browser snapshot -i --json # AI parses tree and refs
# 2. AI identifies target refs from snapshot
# 3. Execute actions using refs
agent-browser click @e2
agent-browser fill @e3 "input text"
# 4. Get new snapshot if page changed
agent-browser snapshot -i --json
Headed Mode
Show the browser window for debugging:
agent-browser open example.com --headed
This opens a visible browser window instead of running headless.
Authenticated Sessions
Use --headers to set HTTP headers for a specific origin, enabling authentication without login flows:
# Headers are scoped to api.example.com only
agent-browser open api.example.com --headers '{"Authorization": "Bearer …"}'
# Requests to api.example.com include the auth header
agent-browser snapshot -i --json
agent-browser click @e2
# Navigate to another domain - headers are NOT sent (safe!)
agent-browser open other-site.com
This is useful for:
Skipping login flows - Authenticate via headers instead of UI
Switching users - Start new sessions with different auth tokens
API testing - Access protected endpoints directly
Security - Headers are scoped to the origin, not leaked to other domains
To set headers for multiple origins, use --headers with each open command:
agent-browser open api.example.com --headers '{"Authorization": "Bearer token1"}'
agent-browser open api.acme.com --headers '{"Authorization": "Bearer token2"}'
For global headers (all domains), use set headers:
agent-browser set headers '{"X-Custom-Header": "value"}'
Custom Browser Executable
Use a custom browser executable instead of the bundled Chromium. This is useful for:
Serverless deployment: Use lightweight Chromium builds like @sparticuz/chromium (~50MB vs ~684MB)
System browsers: Use an existing Chrome/Chromium installation
Custom builds: Use modified browser builds
CLI Usage
# Via flag
agent-browser --executable-path /path/to/chromium open example.com
# Via environment variable
AGENT_BROWSER_EXECUTABLE_PATH=/path/to/chromium agent-browser open example.com
Serverless Example (Vercel/AWS Lambda)
import chromium from '@sparticuz/chromium';
import { BrowserManager } from 'agent-browser';
export async function handler() {
  const browser = new BrowserManager();
  await browser.launch({
    executablePath: await chromium.executablePath(),
    headless: true,
  });
  // ... use browser
}
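For the system-browser case under Custom Browser Executable, a script can look for an installed Chrome or Chromium and pass its path through AGENT_BROWSER_EXECUTABLE_PATH. The sketch below is one way to do that; `find_browser` is a hypothetical helper, and the candidate binary names are assumptions that vary by operating system and distribution.

```shell
# find_browser: print the path of the first candidate found on PATH,
# or return 1 if none is installed.
# The candidate names are assumptions; adjust them for your system.
find_browser() {
  for name in chromium chromium-browser google-chrome; do
    if path=$(command -v "$name"); then
      printf '%s\n' "$path"
      return 0
    fi
  done
  return 1
}
```

A possible use is `AGENT_BROWSER_EXECUTABLE_PATH=$(find_browser) agent-browser open example.com`, falling back to the bundled Chromium when the helper returns a non-zero status.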