Browser (Chrome DevTools Protocol)

Drives a real Chrome/Chromium instance via CDP. Agents can navigate, click, fill, take screenshots, and run JS, addressing elements through refs (like @e12) that stay stable within a single snapshot turn.

Source: crates/plugins/browser/.

Topics

| Direction | Subject | Notes |
|---|---|---|
| Outbound | plugin.outbound.browser | Tool invocations |
| Events | plugin.events.browser.<method_suffix> | Mirrored CDP notifications |

Browser is an outbound-only plugin: there is no inbound topic, so a web page cannot send unsolicited events to the agent.
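CDP puts two kinds of message on the same socket. Replies to commands carry an "id". Notifications carry a "method" (for example "Page.loadEventFired") and no "id". Only notifications belong on the events topic. The sketch below shows that split. It assumes that <method_suffix> is the CDP method name used verbatim. That mapping is a hypothetical illustration, not documented plugin behavior.

```python
import json
from typing import Optional

EVENT_PREFIX = "plugin.events.browser."  # subject prefix from the Topics table


def subject_for_cdp_message(raw: str) -> Optional[str]:
    """Return the events subject for a CDP notification, or None for a reply.

    ASSUMPTION: <method_suffix> is the CDP method name, unchanged.
    """
    msg = json.loads(raw)
    if "id" in msg or "method" not in msg:
        return None  # a reply to one of our own commands, not a notification
    return EVENT_PREFIX + msg["method"]
```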

Config

# config/plugins/browser.yaml
browser:
  headless: false
  executable: ""                     # empty → search PATH
  cdp_url: ""                        # empty → launch new Chrome
  user_data_dir: ./data/browser/profile
  window_width: 1280
  window_height: 800
  connect_timeout_ms: 10000
  command_timeout_ms: 15000
  args: []                           # extra CLI flags for Chrome

| Field | Default | Purpose |
|---|---|---|
| headless | false | Launch Chrome without a UI. |
| executable | "" | Chrome binary path. Empty = search PATH. |
| cdp_url | "" | Connect to an existing Chrome DevTools server (e.g. http://127.0.0.1:9222). Empty = launch a new instance. |
| user_data_dir | ./data/browser/profile | Chrome profile directory. Persists cookies and logins between runs. |
| window_width / window_height | 1280 / 800 | Viewport size, in pixels. |
| connect_timeout_ms | 10000 | How long to wait for Chrome startup or a remote connect. |
| command_timeout_ms | 15000 | Timeout for each individual CDP command. |
| args | [] | Extra CLI flags forwarded verbatim to the spawned Chrome. Ignored when cdp_url is set. Later args win, so use this to override built-in flags when a restricted environment needs it (e.g. --no-sandbox on Termux). |
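The interaction between these fields can be sketched as a function that turns the config into a Chrome command line. The flags used are real Chrome switches: --headless, --user-data-dir, --window-size, and --remote-debugging-port=0, which lets Chrome pick a free port. Which of them the plugin actually passes is an assumption. So is the list of binary names searched on PATH. What the sketch does reflect from the table is this: cdp_url suppresses launching, and user args go last so they can override the built-ins.

```python
import shutil
from typing import List, Optional

# ASSUMPTION: binary names tried on PATH when `executable` is empty.
CANDIDATES = ["google-chrome", "google-chrome-stable", "chromium", "chromium-browser", "chrome"]


def resolve_executable(executable: str) -> Optional[str]:
    """Use the configured path, else the first candidate found on PATH."""
    if executable:
        return executable
    for name in CANDIDATES:
        found = shutil.which(name)
        if found:
            return found
    return None


def build_chrome_args(cfg: dict) -> Optional[List[str]]:
    """Return CLI flags for a new Chrome, or None when attaching via cdp_url."""
    if cfg.get("cdp_url"):
        return None  # nothing to launch; `args` is ignored in this mode
    argv = [
        "--remote-debugging-port=0",  # ASSUMPTION: let Chrome choose the port
        f"--user-data-dir={cfg.get('user_data_dir', './data/browser/profile')}",
        f"--window-size={cfg.get('window_width', 1280)},{cfg.get('window_height', 800)}",
    ]
    if cfg.get("headless", False):
        argv.append("--headless")
    # User-supplied flags go last so a duplicate switch overrides the built-in one.
    argv.extend(cfg.get("args", []))
    return argv
```

For example, with headless: true and args: ["--no-sandbox"], the sketch produces the built-in flags followed by --no-sandbox as the final entry.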

Auth

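None. CDP is an unauthenticated protocol: anyone who can reach the DevTools port has full control of that browser, including its cookies and logged-in sessions. Use cdp_url only with a Chrome that listens on loopback or sits behind a firewall.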
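A deployment could refuse non-local cdp_url values before connecting. The function below is a hypothetical guard, not part of the plugin. It accepts only loopback addresses and "localhost". A firewalled remote Chrome, which the text above also permits, would need a separate explicit allowance.

```python
import ipaddress
from urllib.parse import urlsplit


def is_loopback_cdp_url(cdp_url: str) -> bool:
    """True if cdp_url points at this machine's loopback interface."""
    host = urlsplit(cdp_url).hostname  # lowercased, IPv6 brackets stripped
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # some other hostname: cannot prove it is local
```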

Tools exposed to the LLM

| Tool | Purpose |
|---|---|
| browser_navigate | Load a URL and wait for the load event. |
| browser_click | Click an element by ref (@e12) or CSS selector. |
| browser_fill | Type into an input, textarea, or contenteditable element. Replaces existing content. |
| browser_screenshot | Capture the viewport as a base64-encoded PNG. |
| browser_evaluate | Run JavaScript in the page and return the result as JSON. |
| browser_snapshot | Return a text DOM tree with stable element refs. |
| browser_scroll_to | Scroll a target element into view. |
| browser_current_url | Return the current page URL. |
| browser_wait_for | Poll until an element appears. |
| browser_go_back / browser_go_forward | Move backward / forward in navigation history. |
| browser_press_key | Send keyboard events. |
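The CDP command that browser_evaluate most plausibly maps onto is Runtime.evaluate. Its returnByValue parameter asks Chrome to serialize the result as JSON instead of returning an object handle. That mapping is an assumption, since the plugin's internals are not shown here. The method, its parameters, and the response shape are all from the CDP Runtime domain.

```python
import itertools
import json
from typing import Any

_ids = itertools.count(1)  # CDP matches replies to commands by "id"


def evaluate_command(expression: str) -> str:
    """Serialize a Runtime.evaluate command that returns its result by value."""
    return json.dumps({
        "id": next(_ids),
        "method": "Runtime.evaluate",
        "params": {
            "expression": expression,
            "returnByValue": True,  # JSON value instead of a remote object handle
            "awaitPromise": True,   # resolve a returned promise before replying
        },
    })


def evaluate_result(raw_response: str) -> Any:
    """Extract the JSON value from a Runtime.evaluate response."""
    msg = json.loads(raw_response)
    if "error" in msg:  # protocol-level failure (bad params, closed target, ...)
        raise RuntimeError(msg["error"].get("message", "CDP error"))
    body = msg["result"]
    if "exceptionDetails" in body:  # the page script itself threw
        details = body["exceptionDetails"]
        exc = details.get("exception", {})
        raise RuntimeError(exc.get("description") or details.get("text", "evaluation failed"))
    return body["result"].get("value")  # no "value" key for undefined, so None
```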

All tools share the browser_ prefix, so a single glob (browser_*) in allowed_tools selects the whole set.
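A sketch of prefix-based filtering, assuming allowed_tools uses shell-style globs with fnmatch semantics. The exact matcher is not specified here. The tool name shell_exec is a hypothetical non-browser tool, included only to show it being filtered out.

```python
from fnmatch import fnmatchcase

BROWSER_TOOLS = [
    "browser_navigate", "browser_click", "browser_fill", "browser_screenshot",
    "browser_evaluate", "browser_snapshot", "browser_scroll_to",
    "browser_current_url", "browser_wait_for", "browser_go_back",
    "browser_go_forward", "browser_press_key",
]


def filter_tools(tools, allowed_patterns):
    """Keep tools whose name matches any allowed_tools glob (case-sensitive)."""
    return [t for t in tools if any(fnmatchcase(t, p) for p in allowed_patterns)]
```

A narrower glob such as browser_go_* exposes only the two history tools.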

Element refs

browser_snapshot emits a text tree where every actionable element has a ref like @e12. Those refs are stable within the snapshot turn but invalidated by any subsequent DOM mutation:

sequenceDiagram
    participant A as Agent
    participant B as Browser plugin
    participant C as Chrome

    A->>B: browser_snapshot
    B->>C: DOM.describeNode(..)
    C-->>B: tree
    B-->>A: "Login @e12\nEmail @e13\n..."
    A->>B: browser_fill(@e13, "user@…")
    B->>C: DOM.focus + Input.dispatch
    A->>B: browser_click(@e12)
    Note over A,B: refs still valid<br/>(same snapshot turn)
    A->>B: browser_snapshot
    Note over B: refs from prior snapshot<br/>now INVALID

Rule: take a snapshot, act on refs from that snapshot, take a new snapshot before acting again.
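The ref lifecycle can be modelled as a table that only knows the refs issued by the most recent snapshot. Several details here are hypothetical choices, not documented plugin behavior: the class names, keying refs to CDP backend node ids, and restarting numbering at @e1 on every snapshot. The is_ref check reflects the documented "@e12 or CSS selector" targeting for browser_click.

```python
import re
from typing import Dict, Iterable, List

# Refs look like "@e12"; any other target string is treated as a CSS selector.
REF_RE = re.compile(r"@e\d+")


def is_ref(target: str) -> bool:
    return REF_RE.fullmatch(target) is not None


class StaleRefError(LookupError):
    """Raised when a ref does not belong to the current snapshot."""


class RefTable:
    """Maps refs to nodes for the most recent snapshot only."""

    def __init__(self) -> None:
        # ref -> backend node id (HYPOTHETICAL choice of what a ref points at)
        self._nodes: Dict[str, int] = {}

    def snapshot(self, backend_node_ids: Iterable[int]) -> List[str]:
        """Start a new snapshot: forget every old ref, then issue @e1, @e2, ..."""
        self._nodes = {f"@e{i}": nid for i, nid in enumerate(backend_node_ids, start=1)}
        return list(self._nodes)

    def on_dom_mutation(self) -> None:
        """A DOM mutation invalidates all refs until the next snapshot."""
        self._nodes.clear()

    def resolve(self, ref: str) -> int:
        if ref not in self._nodes:
            raise StaleRefError(f"{ref} is not in the current snapshot; take a new one")
        return self._nodes[ref]
```

Because this sketch restarts numbering at @e1, a ref held over from an older snapshot may resolve to a different element in the new one. That is how a stale ref can misfire instead of raising an error.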

Gotchas

  • browser_fill replaces content. There is no append mode: to add text to existing content, read the current value first (via browser_evaluate), then send the merged string.
  • Connecting to an existing Chrome (cdp_url) skips the profile setup: user_data_dir and args have no effect, and any login state is whatever that Chrome already has.
  • Element refs expire on DOM mutation. The plugin does not auto-refresh — refs from a stale snapshot will error or misfire.
  • Some sites break in headless mode. They detect headless Chrome and behave differently; use headless: false for those.
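The append workaround for browser_fill can be sketched as a read-merge-write helper. The helper takes a call_tool(name, args) callable standing in for however the agent invokes tools. That callable and the argument keys "script", "target", and "text" are hypothetical, not the plugin's documented schema. Reading .value works for input and textarea elements; a contenteditable element would need textContent or innerText instead.

```python
import json
from typing import Any, Callable, Dict

# HYPOTHETICAL tool-call signature; the real tool schemas are not shown here.
CallTool = Callable[[str, Dict[str, Any]], Any]


def append_text(call_tool: CallTool, selector: str, suffix: str) -> str:
    """Emulate append mode: read the current value, then fill the merged string."""
    # json.dumps yields a valid JS string literal, so the selector is safely quoted.
    script = f"document.querySelector({json.dumps(selector)}).value"
    current = call_tool("browser_evaluate", {"script": script}) or ""
    merged = current + suffix
    call_tool("browser_fill", {"target": selector, "text": merged})
    return merged
```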