Automate Any Website with the Agent Browser Skill: CLI-First Browser Control for AI Agents
The web is everywhere—forms to fill, data to scrape, dashboards to monitor. But controlling a browser from your AI agent has traditionally meant wrestling with complex Playwright or Puppeteer setups. The Agent Browser skill changes that, giving your Clawdbot a fast, intuitive CLI for headless browser automation.
What is Agent Browser?
Agent Browser wraps Vercel Labs' agent-browser CLI, a fast Rust-based command-line tool (with a Node.js fallback) that drives a headless browser. It lets your AI agent navigate pages, click buttons, fill forms, take screenshots, and even record videos, all through simple shell commands.
Why you need this:
- Form automation: Fill and submit web forms programmatically
- Web scraping: Extract text, attributes, and state from any page
- Testing workflows: Verify UI behavior as part of your agent's tasks
- Visual documentation: Capture screenshots and video recordings
Installation
Install via ClawdHub:
clawdhub install agent-browser
The skill requires Node.js and npm. On first use, run these commands to install the CLI and set up the browser binaries:
npm install -g agent-browser
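agent-browser install --with-deps
That's it: agent-browser install downloads Chromium for you, and --with-deps also installs the system libraries it needs on Linux, so there is no manual browser setup or configuration.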
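Before running the install commands on a fresh machine, a small preflight check can confirm that Node.js and npm are actually on PATH. The sketch below is a hypothetical bash helper; missing_cmds is a name made up for this example and is not part of the skill or ClawdHub. It prints each missing command and returns a non-zero status if any are absent.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helper (not part of agent-browser or ClawdHub).
# Prints each command from the argument list that is not on PATH,
# one per line, and returns 1 if at least one is missing.
missing_cmds() {
  local cmd status=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      printf '%s\n' "$cmd"
      status=1
    fi
  done
  return "$status"
}
```

For example, missing_cmds node npm && npm install -g agent-browser only proceeds to the npm install when both tools are found.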
Core Workflow: Snapshot → Interact → Repeat
Agent Browser uses a ref-based interaction model. Here's the pattern:
- Navigate to a page
- Snapshot to get element refs (like @e1, @e2)
- Interact using those refs
- Re-snapshot after navigation or DOM changes
Refs come from the snapshot that produced them, so re-snapshotting whenever the page changes keeps interactions reliable and predictable.
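The loop can also be scripted without hard-coding ref numbers. The sketch below is a hypothetical bash helper, not part of agent-browser. It takes a fresh snapshot -i, looks up the ref for an element by role and exact accessible name, and then runs one action on it. It assumes the snapshot prints entries in the form role "name" [ref=eN], as in the sample output in Example 1. The AB variable and the function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers sketching snapshot -> interact -> repeat.
# Assumption: snapshot entries look like:  button "Submit" [ref=e2]
AB="${AB:-agent-browser}"   # command to invoke; overridable for testing

# Read snapshot text on stdin; print "@eN" for the first element whose
# role and accessible name both match exactly. Returns 1 if none match.
ref_for() {
  grep -oE '[a-z]+ "[^"]*" \[ref=e[0-9]+\]' |
    ROLE="$1" NAME="$2" awk -F'"' '
      {
        role = $1; sub(/ +$/, "", role)   # text before the first quote, trimmed
        if (role == ENVIRON["ROLE"] && $2 == ENVIRON["NAME"]) {
          match($3, /e[0-9]+/)            # the eN inside [ref=eN]
          print "@" substr($3, RSTART, RLENGTH)
          found = 1
          exit
        }
      }
      END { exit !found }'
}

# Re-snapshot, resolve ROLE/NAME to a ref, then run ACTION on it, e.g.
#   act_on textbox "Search" fill "headless browsers"
#   act_on button "Submit" click
act_on() {
  local role=$1 name=$2 action=$3 ref
  shift 3
  ref=$("$AB" snapshot -i | ref_for "$role" "$name") || {
    printf 'no %s named "%s" in snapshot\n' "$role" "$name" >&2
    return 1
  }
  "$AB" "$action" "$ref" "$@"
}
```

Calling act_on for every step keeps the loop's rule built in: each action works from a fresh snapshot rather than from refs captured before the page changed.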
Usage Examples
Example 1: Basic Navigation and Inspection
# Open a page
agent-browser open https://example.com
# Get interactive elements with their refs
agent-browser snapshot -i
# Output: textbox "Search" [ref=e1], button "Submit" [ref=e2], link "About" [ref=e3]
# Get the current page title and URL
agent-browser get title
agent-browser get url
Example 2: Form Submission
# Navigate to a form
agent-browser open https://httpbin.org/forms/post
# Snapshot interactive elements
agent-browser snapshot -i
# Fill form fields (refs here are illustrative; use the ones your snapshot prints)
agent-browser fill @e1 "John Doe"
agent-browser fill @e2 "john@example.com"
# Submit
agent-browser click @e3
# Wait for the page to load and verify
agent-browser wait --load networkidle
agent-browser snapshot -i
Example 3: Screenshots and Video Recording
Perfect for documenting workflows or debugging:
# Take a screenshot
agent-browser screenshot output.png
# Full-page screenshot
agent-browser screenshot --full fullpage.png
# Record a video
agent-browser record start ./demo.webm
agent-browser click @e1
agent-browser fill @e2 "Hello World"
agent-browser click @e3
agent-browser record stop
Example 4: Saving Authentication State
Log in once and reuse the session across runs:
# Perform login
agent-browser open https://app.example.com/login
agent-browser snapshot -i
agent-browser fill @e1 "username"
agent-browser fill @e2 "password"
agent-browser click @e3
agent-browser wait --url "/dashboard"
# Save the authenticated state
agent-browser state save auth.json
# Later: load the saved state and skip login
agent-browser state load auth.json
agent-browser open https://app.example.com/dashboard
Pro Tips
Use -i for interactive snapshots: agent-browser snapshot -i filters to only interactive elements, making output much cleaner than the full accessibility tree.
Use fill, not type: the fill command clears existing text before typing, while type appends to it. For form fields, fill is almost always what you want.
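Because fill replaces whatever a field already contains, repeating a fill (for example, when a step is retried) leaves the same text in place. The hypothetical bash helper below fills several fields from REF=VALUE pairs by calling agent-browser fill once per pair, using refs taken from a prior snapshot -i. The names fill_fields and AB are illustrative only and are not part of the CLI.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fill several fields from REF=VALUE arguments, e.g.
#   fill_fields @e1="John Doe" @e2="john@example.com"
# Each pair becomes one "agent-browser fill REF VALUE" call; because fill
# clears the field first, re-running the same call leaves the same text.
AB="${AB:-agent-browser}"

fill_fields() {
  local pair ref value
  for pair in "$@"; do
    ref=${pair%%=*}     # text before the first "="
    value=${pair#*=}    # everything after it (may itself contain "=")
    if [ "$ref" = "$pair" ]; then
      printf 'expected REF=VALUE, got: %s\n' "$pair" >&2
      return 1
    fi
    "$AB" fill "$ref" "$value" || return 1
  done
}
```

Using fill here rather than type means that re-running the helper after a partial failure never doubles up text that was already entered.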
Debug with --headed: Can't figure out what's happening? Add --headed to see the browser window:
agent-browser open example.com --headed
Chain waits intelligently: After clicking a submit button, wait for the network to settle:
agent-browser click @e3
agent-browser wait --load networkidle
Use semantic locators as a fallback: When refs aren't stable, find elements by role or text:
agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
When to Use Agent Browser vs. Built-in Browser Tool
Clawdbot has a built-in browser tool that's great for quick page snapshots and authenticated sessions. Agent Browser complements this when you need:
- Video recording of workflows
- Complex multi-step form automation
- Network interception and mocking
- Parallel browser sessions
- State persistence across runs
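For state persistence, the state save and state load commands from Example 4 can be combined so the login steps only run when no saved state exists yet. This is a hypothetical bash sketch. The function name, the AB variable, and passing the login steps in as a function are all assumptions for illustration. The saved file likely contains live session cookies, so keep it out of version control.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: reuse a saved session when present, otherwise log in
# once and save it. LOGIN_FN is the name of a caller-supplied function that
# performs the login steps (open, snapshot -i, fill, click, wait --url).
AB="${AB:-agent-browser}"

with_saved_state() {
  local state_file=$1 login_fn=$2
  if [ -f "$state_file" ]; then
    "$AB" state load "$state_file"
  else
    # Only save if the login steps succeeded.
    "$login_fn" && "$AB" state save "$state_file"
  fi
}
```

A caller could then run with_saved_state auth.json my_login followed by agent-browser open https://app.example.com/dashboard, where my_login wraps the login commands from Example 4. The sketch does not detect an expired session. Deleting the state file forces a fresh login.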
Conclusion
Agent Browser brings CLI-first browser automation to your AI agent. With its ref-based interaction model and comprehensive command set, you can automate virtually any web workflow—from simple form fills to complex multi-page flows with authentication persistence.
Happy automating! 🤖🌐