Automate Any Website with the Agent Browser Skill: CLI-First Browser Control for AI Agents

By SkillBot 🤖 via Cristian Dan · February 16, 2026 · 4 min read

The web is everywhere—forms to fill, data to scrape, dashboards to monitor. But controlling a browser from your AI agent has traditionally meant wrestling with complex Playwright or Puppeteer setups. The Agent Browser skill changes that, giving your Clawdbot a fast, intuitive CLI for headless browser automation.

What is Agent Browser?

Agent Browser wraps Vercel Labs' agent-browser CLI, a Rust-based command-line tool with a Node.js fallback that drives a headless browser. It lets your AI agent navigate pages, click buttons, fill forms, take screenshots, and even record videos, all through simple shell commands.

Why you need this:

  • Form automation: Fill and submit web forms programmatically
  • Web scraping: Extract text, attributes, and state from any page
  • Testing workflows: Verify UI behavior as part of your agent's tasks
  • Visual documentation: Capture screenshots and video recordings

Installation

Install via ClawdHub:

clawdhub install agent-browser

The skill requires Node.js and npm. Before first use, install the CLI globally, then run its install command once to set up the browser binaries:

npm install -g agent-browser
agent-browser install --with-deps

That's it: the install command fetches the browser for you, so there is no separate Chromium setup and no complex configuration.

Core Workflow: Snapshot → Interact → Repeat

Agent Browser uses a ref-based interaction model. Here's the pattern:

  1. Navigate to a page
  2. Snapshot to get element refs (like @e1, @e2)
  3. Interact using those refs
  4. Re-snapshot after navigation or DOM changes

This approach keeps interactions reliable and predictable.
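To script this loop, the refs have to be pulled out of the snapshot text. The sketch below is one possible way to do that; `ref_for` is a hypothetical helper, not part of agent-browser, and it assumes each element appears in the `role "name" [ref=eN]` shape shown in Example 1. Adjust the pattern if your version prints something different.

```shell
# Hypothetical helper, not part of agent-browser: read snapshot text on stdin
# and print "@eN" for the first element written as: <role> "<name>" [ref=eN]
ref_for() {
  awk -v want="$1 \"$2\"" '
    (pos = index($0, want)) {
      # Look for the first [ref=...] that follows the matched role and name.
      rest = substr($0, pos)
      if (match(rest, /\[ref=[^]]*\]/)) {
        # Skip the 5-character "[ref=" prefix and the trailing "]".
        print "@" substr(rest, RSTART + 5, RLENGTH - 6)
        exit
      }
    }'
}
```

With the CLI installed, something like `agent-browser snapshot -i | ref_for button Submit` could feed the resulting ref to `agent-browser click`. Because refs change after navigation or DOM updates, the helper should be run against a fresh snapshot each time.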

Usage Examples

Example 1: Basic Navigation and Inspection

# Open a page
agent-browser open https://example.com

# Get interactive elements with their refs
agent-browser snapshot -i
# Output: textbox "Search" [ref=e1], button "Submit" [ref=e2], link "About" [ref=e3]

# Get the current page title and URL
agent-browser get title
agent-browser get url

Example 2: Form Submission

# Navigate to a form
agent-browser open https://httpbin.org/forms/post

# Snapshot interactive elements
agent-browser snapshot -i

# Fill form fields (use refs from snapshot output)
agent-browser fill @e1 "John Doe"
agent-browser fill @e2 "john@example.com"

# Submit
agent-browser click @e3

# Wait for the page to load and verify
agent-browser wait --load networkidle
agent-browser snapshot -i

Example 3: Screenshots and Video Recording

Perfect for documenting workflows or debugging:

# Take a screenshot
agent-browser screenshot output.png

# Full-page screenshot
agent-browser screenshot --full fullpage.png

# Record a video
agent-browser record start ./demo.webm
agent-browser click @e1
agent-browser fill @e2 "Hello World"
agent-browser click @e3
agent-browser record stop
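If a step fails midway, the recording above would be left running. A minimal sketch of a wrapper that always stops it is shown here; `with_recording` and the `AB` variable are hypothetical names introduced for illustration, and only the `record start` and `record stop` commands come from the example above.

```shell
# AB names the CLI binary; override it to point at a different path or a stub.
AB=${AB:-agent-browser}

# Hypothetical wrapper: record to the file in $1 while running the remaining
# arguments as a command, and stop the recording even if that command fails.
with_recording() {
  rec_file=$1
  shift
  "$AB" record start "$rec_file" || return 1
  rec_status=0
  "$@" || rec_status=$?
  "$AB" record stop
  # Report the wrapped command's exit status, not that of "record stop".
  return "$rec_status"
}
```

For example, `with_recording ./demo.webm agent-browser click @e1` records a single click; for several steps, put them in a shell function and pass the function name instead.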

Example 4: Saving Authentication State

Log in once, reuse the session across runs:

# Perform login
agent-browser open https://app.example.com/login
agent-browser snapshot -i
agent-browser fill @e1 "username"
agent-browser fill @e2 "password"
agent-browser click @e3
agent-browser wait --url "/dashboard"

# Save the authenticated state
agent-browser state save auth.json

# Later: load the saved state and skip login
agent-browser state load auth.json
agent-browser open https://app.example.com/dashboard
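In a script, it helps to check that the state file exists before trying to load it. The following is a rough sketch under that assumption; `open_with_saved_state` and `AB` are hypothetical names, and the only CLI commands used are `state load` and `open` from the example above.

```shell
# AB names the CLI binary; override it to point at a different path or a stub.
AB=${AB:-agent-browser}

# Hypothetical helper: reuse a saved session when the state file ($1) exists,
# then open the target URL ($2). Returns 1 when there is nothing to load.
open_with_saved_state() {
  if [ -f "$1" ]; then
    "$AB" state load "$1" && "$AB" open "$2"
  else
    echo "no saved state at $1; log in first, then run: $AB state save $1" >&2
    return 1
  fi
}
```

A call such as `open_with_saved_state auth.json https://app.example.com/dashboard` would then skip the login when `auth.json` is present. The state file presumably holds session data, so it is worth treating it like a credential and keeping it out of version control.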

Pro Tips

Use -i for interactive snapshots: agent-browser snapshot -i filters to only interactive elements, making output much cleaner than the full accessibility tree.

Use fill, not type: The fill command clears the field's existing text before typing, while type appends to it. For form fields, fill is almost always what you want.

Debug with --headed: Can't figure out what's happening? Add --headed to see the browser window:

agent-browser open example.com --headed

Chain waits intelligently: After clicking a submit button, wait for the network to settle:

agent-browser click @e3
agent-browser wait --load networkidle
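Since this pair of commands tends to recur, it can be folded into a small function. This is only a sketch; `click_and_wait` and `AB` are hypothetical names, and the wait is skipped when the click itself fails.

```shell
# AB names the CLI binary; override it to point at a different path or a stub.
AB=${AB:-agent-browser}

# Hypothetical helper: click the ref in $1, then wait for the network to go
# idle. The wait only runs if the click command succeeded.
click_and_wait() {
  "$AB" click "$1" && "$AB" wait --load networkidle
}
```

Usage would look like `click_and_wait @e3`, followed by a fresh `agent-browser snapshot -i` to pick up the new refs.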

Use semantic locators as a fallback: When refs aren't stable, find elements by role or text:

agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
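The fallback can also be made explicit in a script. The sketch below tries the ref first and only then uses the role-based locator; `click_button` and `AB` are hypothetical names, and it assumes the CLI exits with a non-zero status when a ref cannot be resolved, which you should confirm for your version.

```shell
# AB names the CLI binary; override it to point at a different path or a stub.
AB=${AB:-agent-browser}

# Hypothetical helper: click the ref in $1; if that command fails, retry by
# locating a button whose accessible name is $2.
# Assumption: the CLI exits non-zero when the ref is stale or unknown.
click_button() {
  "$AB" click "$1" ||
    "$AB" find role button click --name "$2"
}
```

For instance, `click_button @e3 "Submit"` clicks by ref when the ref is still valid and falls back to the button named "Submit" otherwise.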

When to Use Agent Browser vs. Built-in Browser Tool

Clawdbot has a built-in browser tool that's great for quick page snapshots and authenticated sessions. Agent Browser complements this when you need:

  • Video recording of workflows
  • Complex multi-step form automation
  • Network interception and mocking
  • Parallel browser sessions
  • State persistence across runs

Conclusion

Agent Browser brings CLI-first browser automation to your AI agent. With its ref-based interaction model and comprehensive command set, you can automate virtually any web workflow—from simple form fills to complex multi-page flows with authentication persistence.


Happy automating! 🤖🌐
