gstack Browser Automation Testing: Give Claude Code Eyes

A persistent headless Chromium browser that lets Claude Code see, click, fill forms, and test your web application -- all from the terminal.

What Is gstack Browser Automation Testing?

gstack browser automation testing is powered by the /browse skill, a compiled Bun binary that manages a persistent headless Chromium browser daemon. It gives Claude Code the ability to visually inspect web pages, interact with form elements, verify UI states, and run end-to-end tests -- all without leaving your terminal workflow. Instead of writing test scripts by hand, you describe what you want to test in natural language and let the AI navigate your application like a real user would.

Under the hood, /browse sends each command over localhost HTTP to a daemon that controls a Chromium instance. The browser is driven by Microsoft's Playwright, a widely used browser automation framework. The first invocation starts the browser daemon in roughly 3 seconds, and subsequent commands execute in approximately 100-200 milliseconds, which keeps the feedback loop fast enough for iterative development.

Unlike disposable browser sessions in typical testing frameworks, gstack browser automation testing maintains persistent state. Cookies, open tabs, localStorage, and session data all carry over between commands. This means you can log in once and then run dozens of test scenarios without re-authenticating. If you need to test authenticated flows with imported cookies, see the dedicated cookie import for authenticated testing guide.

Why Persistent State Matters

Many browser automation setups spin up a fresh browser context for every test. gstack's persistent Chromium daemon keeps your session alive, which means faster test runs, realistic multi-step user journeys, and the ability to debug issues interactively without losing your place.

How the /browse Architecture Works

The /browse skill is architecturally distinct from typical browser automation tools. Here is how the pieces fit together:

  1. Compiled Bun binary -- The skill ships as a single compiled binary, eliminating Node.js dependency management headaches. It launches fast and runs lean.
  2. Persistent Chromium daemon -- On first invocation, the binary starts a headless Chromium instance that listens on localhost. This daemon stays alive across commands.
  3. Localhost HTTP communication -- Every command (navigate, click, screenshot) is sent as an HTTP request to the local daemon. The response comes back as structured data that Claude Code can reason about.
  4. Playwright engine -- Microsoft's Playwright handles the actual browser automation, so /browse inherits Playwright's reliability when driving Chromium.
  5. Auto-shutdown -- The browser daemon automatically shuts down after 30 minutes of idle time, freeing up system resources without requiring manual cleanup.
  6. Workspace isolation -- Each Conductor workspace gets its own isolated browser instance, so parallel workstreams never interfere with each other. Learn more about this in the parallel AI coding guide.
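
To make the daemon-plus-HTTP pattern above concrete, here is a minimal sketch in Python. It is not gstack's implementation: the /command path, the JSON command shape, and the SESSION dictionary are all assumptions made for illustration, and the real daemon holds its state inside Chromium rather than in a Python dict. The idle auto-shutdown is not modeled. The point is only that a long-lived local server can keep state between short-lived command invocations.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in for browser state (cookies, tabs, current URL).
SESSION = {"url": None, "history": []}


class CommandHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Assumed wire format: a small JSON body such as {"cmd": "goto", "arg": "..."}.
        length = int(self.headers.get("Content-Length", 0))
        command = json.loads(self.rfile.read(length))
        if command["cmd"] == "goto":
            SESSION["url"] = command["arg"]
        SESSION["history"].append(command["cmd"])
        # Reply with the current state as structured data.
        body = json.dumps(SESSION).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def start_daemon():
    # Port 0 asks the OS for any free port on the loopback interface.
    server = HTTPServer(("127.0.0.1", 0), CommandHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def send(server, cmd, arg=None):
    # Each command is one short HTTP request to the already-running daemon.
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
    try:
        conn.request(
            "POST",
            "/command",
            body=json.dumps({"cmd": cmd, "arg": arg}),
            headers={"Content-Type": "application/json"},
        )
        return json.loads(conn.getresponse().read())
    finally:
        conn.close()
```

Calling send(server, "goto", "http://localhost:3000") and then send(server, "snapshot") returns a state that still contains the URL from the first call, which is the property the persistent daemon provides.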

The @ref System: Why CSS Selectors Are Obsolete

One of the most distinctive features of gstack browser automation testing is the @ref system. Instead of relying on brittle CSS selectors or XPath expressions, the /browse skill reads the page's accessibility tree and assigns short reference identifiers like @e1, @e2, @e3 to every interactive element.

This approach has several advantages over traditional selector-based automation:

  • Resilience to UI changes -- CSS selectors break when class names, IDs, or DOM structure change. The @ref system targets elements by their accessibility role and position, which are far more stable.
  • Natural language friendly -- Claude Code can refer to "the submit button at @e5" instead of constructing document.querySelector('.form-container > button.submit-btn').
  • Accessibility-first -- Because the system reads the accessibility tree, it inherently verifies that your UI elements are accessible. If an element is not in the accessibility tree, it likely has accessibility issues.
  • Context-aware -- References are regenerated with each snapshot, so they always reflect the current state of the page. Take a fresh snapshot after the page changes before reusing a ref.

# Take a snapshot and see all interactive elements with @refs
/browse snapshot -i

# Click a specific element by its ref
/browse click @e3

# Fill a form field identified by ref
/browse fill @e7 "test@example.com"
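
The idea of numbering interactive elements from an accessibility tree can be sketched in a few lines of Python. This is an illustration only: the dictionary tree shape, the INTERACTIVE_ROLES set, and the depth-first numbering order are assumptions, not a description of how /browse actually builds its refs.

```python
# Assumed set of roles that count as "interactive" for this sketch.
INTERACTIVE_ROLES = {"button", "link", "textbox", "checkbox", "combobox"}


def assign_refs(node, refs=None):
    """Walk the tree depth-first, giving each interactive node the next @eN ref."""
    if refs is None:
        refs = {}
    if node.get("role") in INTERACTIVE_ROLES:
        refs[f"@e{len(refs) + 1}"] = node
    for child in node.get("children", []):
        assign_refs(child, refs)
    return refs


# Hypothetical accessibility tree for a small signup page.
page = {
    "role": "main",
    "name": "Sign up",
    "children": [
        {"role": "heading", "name": "Create account"},
        {"role": "textbox", "name": "Email"},
        {"role": "textbox", "name": "Password"},
        {"role": "button", "name": "Submit"},
    ],
}

refs = assign_refs(page)
for ref, node in refs.items():
    print(ref, node["role"], repr(node["name"]))
# @e1 textbox 'Email'
# @e2 textbox 'Password'
# @e3 button 'Submit'
```

Because the numbering depends on what is currently in the tree, adding or removing an element shifts later refs, which is why a ref should be read from the latest snapshot rather than remembered across page changes.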

Complete Command Reference

The /browse skill exposes a focused set of commands that cover the full range of browser interactions. Here is every command available for gstack browser automation testing:

Command     Purpose                                              Example
goto        Navigate to a URL                                    /browse goto https://myapp.com
snapshot    Capture the current page state (accessibility tree)  /browse snapshot -i
click       Click an element by @ref                             /browse click @e4
fill        Type text into a form field                          /browse fill @e2 "hello"
screenshot  Capture a visual screenshot of the page              /browse screenshot
text        Extract all visible text from the page               /browse text
console     Read browser console logs                            /browse console
network     Inspect network requests and responses               /browse network

Snapshot Flags Explained

The snapshot command is the most frequently used command in gstack browser automation testing. It captures the page state so that Claude Code can understand what is on screen. Several flags modify its behavior:

-i (Interactive)

Filters the snapshot to show only interactive elements like buttons, links, and form fields. Perfect for understanding what actions are available.

-D (Diff)

Shows only what changed since the last snapshot. Essential for verifying that a click or form submission produced the expected result.

-a (Annotated Screenshot)

Captures a visual screenshot with @ref labels overlaid on each element. Helps Claude Code map visual layout to interactive references.

-C (Cursor-Interactive)

Enables cursor-based interaction mode for elements that require hover states, drag-and-drop, or other pointer-specific behaviors.
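
The -D flag described above reports only what changed between two snapshots. The sketch below shows the general technique using Python's standard difflib module. The snapshot line format is invented for the example, and nothing here reflects how /browse computes its diff internally.

```python
import difflib


def snapshot_diff(before, after):
    """Return only the lines added ('+ ') or removed ('- ') between two snapshots."""
    return [
        line
        for line in difflib.ndiff(before, after)
        if line.startswith(("+ ", "- "))
    ]


# Hypothetical snapshot lines before and after submitting an empty form.
before = ['@e1 textbox "Email"', '@e2 button "Submit"']
after = ['@e1 textbox "Email"', '@e2 button "Submit"', 'alert "Email is required"']

print(snapshot_diff(before, after))
# ['+ alert "Email is required"']
```

An empty result means the action produced no visible change, which is often a useful signal in itself, for example when a click handler failed to fire.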

Real-World Testing Workflows

gstack browser automation testing shines in workflows where you need to verify actual browser behavior rather than mocking it. Here are the most common use cases:

End-to-End Form Validation

Testing form validation by actually filling and submitting forms catches issues that unit tests miss -- like client-side validation that only fires on blur, or server-side errors that render incorrectly. With /browse, Claude Code can fill every field, submit, and verify both success and error states in seconds.

# Navigate to the signup page
/browse goto http://localhost:3000/signup

# Take interactive snapshot to see form fields
/browse snapshot -i

# Fill in the registration form
/browse fill @e3 "Jane Doe"
/browse fill @e5 "jane@example.com"
/browse fill @e7 "securePass123"

# Submit and verify
/browse click @e9
/browse snapshot -D

Visual Regression Checking

By combining screenshot with Claude Code's vision capabilities, you can have the AI visually inspect your pages for layout issues, broken styles, or missing elements. This is especially valuable after CSS refactors or component library upgrades.

Multi-Page User Journeys

Because the browser maintains persistent state, you can script complex user journeys that span multiple pages: add an item to cart, proceed to checkout, fill shipping details, confirm the order, and verify the confirmation page. The session state persists throughout, just like a real user's browser.

API Response Verification

The network command lets you inspect actual API calls made by the frontend. You can verify that the correct endpoints are called, that request payloads match expectations, and that the UI correctly reflects API responses. Combined with the console command for catching JavaScript errors, this provides full observability into client-side behavior.
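
As a rough illustration of this kind of check, the Python sketch below scans a list of captured requests for failed responses and for expected endpoints that were never called. The entry fields (method, path, status) are assumptions made for the example. The actual output format of /browse network is not specified here and may differ.

```python
def find_problem_requests(entries, expected_paths):
    """Return (failed, missing): responses with status >= 400, and expected paths never requested."""
    failed = [entry for entry in entries if entry["status"] >= 400]
    seen = {entry["path"] for entry in entries}
    missing = [path for path in expected_paths if path not in seen]
    return failed, missing


# Hypothetical capture from a checkout flow.
captured = [
    {"method": "GET", "path": "/api/cart", "status": 200},
    {"method": "POST", "path": "/api/checkout", "status": 500},
]

failed, missing = find_problem_requests(
    captured, ["/api/cart", "/api/checkout", "/api/orders"]
)
print(failed)   # [{'method': 'POST', 'path': '/api/checkout', 'status': 500}]
print(missing)  # ['/api/orders']
```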

For structured QA workflows that build on these capabilities, see the automated QA testing guide, which covers how to orchestrate comprehensive test suites using the /browse skill as a foundation.

Performance Characteristics

Speed matters in development workflows. Here is what to expect from gstack browser automation testing:

Startup and Response Times

  • Cold start: ~3 seconds to launch the Chromium daemon on first invocation
  • Subsequent commands: ~100-200ms per command, including network round-trip to the local daemon
  • Screenshot capture: Slightly longer due to rendering and image encoding, but still under 500ms for most pages
  • Auto-shutdown: Browser daemon closes after 30 minutes of idle time

This performance profile makes /browse practical for interactive use. You can ask Claude Code to "check if the login page works" and, once the daemon is running, each browser step returns in a fraction of a second, rather than waiting for a full test suite to spin up, run, and tear down.
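
The figures above translate into a simple back-of-the-envelope estimate. The Python sketch below treats the cold start as a fixed cost added on top of the per-command time, which is a simplifying assumption, and it ignores page load time and the time Claude Code spends reasoning between commands.

```python
COLD_START_MS = 3000         # ~3 s to launch the daemon on first invocation
PER_COMMAND_MS = (100, 200)  # ~100-200 ms per subsequent command


def estimate_ms(commands, cold=False):
    """Return a (low, high) estimate in milliseconds for a run of commands."""
    startup = COLD_START_MS if cold else 0
    low, high = PER_COMMAND_MS
    return startup + commands * low, startup + commands * high


# A 7-command flow (goto, snapshot, three fills, click, diff snapshot):
print(estimate_ms(7))             # (700, 1400) with the daemon already running
print(estimate_ms(7, cold=True))  # (3700, 4400) on the first run of a session
```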

Workspace Isolation and Parallel Testing

Each Conductor workspace gets its own isolated browser instance. This means you can run gstack browser automation testing across multiple workstreams simultaneously without session conflicts, cookie collisions, or tab interference.

Consider a scenario where you are developing two features in parallel. Feature A requires testing a checkout flow while Feature B involves redesigning the dashboard. Each workspace's /browse instance maintains completely separate state, so there is no risk of one test run corrupting another. This isolation is a natural extension of gstack's parallel AI coding architecture.
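
The isolation property can be pictured as one state container per workspace. The Python sketch below is purely illustrative: the BrowserState and WorkspaceBrowsers names and the idea of keying by workspace path are hypothetical, and the real mechanism (separate Chromium processes per Conductor workspace) is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class BrowserState:
    """Hypothetical stand-in for one browser instance's session data."""
    cookies: dict = field(default_factory=dict)
    tabs: list = field(default_factory=list)


class WorkspaceBrowsers:
    """Hands out one BrowserState per workspace path, created on first use."""

    def __init__(self):
        self._by_workspace = {}

    def for_workspace(self, path):
        if path not in self._by_workspace:
            self._by_workspace[path] = BrowserState()
        return self._by_workspace[path]


browsers = WorkspaceBrowsers()
checkout = browsers.for_workspace("/work/feature-a")
dashboard = browsers.for_workspace("/work/feature-b")

checkout.cookies["session"] = "abc123"
print(dashboard.cookies)  # {}
```

Asking for the same workspace path again returns the same state object, so a login performed in one workspace stays available there while remaining invisible to the other.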

Integration with Other gstack Skills

gstack browser automation testing becomes even more powerful when combined with other skills in the gstack ecosystem:

  • Automated QA Testing -- Use /browse as the execution engine for structured QA test plans. The QA skill orchestrates what to test, while /browse handles the actual browser interaction.
  • Cookie Import -- Import session cookies from your real browser to test authenticated flows without storing passwords. The persistent browser state in /browse keeps those cookies alive across all subsequent commands.
  • AI Code Review -- After making changes, use /browse to visually verify the impact before submitting code for review.
  • Shipping Workflow -- Include browser-based smoke tests as part of your pre-ship checklist to catch issues before they reach production.

Getting Started with /browse

If you have gstack installed and configured, the /browse skill is available immediately. There is no separate installation step -- the compiled Bun binary and Chromium dependency are bundled with the gstack distribution.

To verify that everything is working, run a simple navigation test:

# Start by navigating to any URL
/browse goto https://example.com

# Take a snapshot to see the page content
/browse snapshot

# Verify interactive elements are detected
/browse snapshot -i

If the first command returns page content within a few seconds, your gstack browser automation testing setup is ready. From here, you can begin testing your own application by pointing goto at your local development server.

Tip: Local Development Testing

The most common use case is testing against http://localhost:3000 (or whichever port your dev server uses). Because the browser daemon runs on the same machine, network latency between the browser and your application is negligible. This typically makes gstack browser automation testing faster than cloud-based testing services for development-time verification.

When to Use Browser Automation vs. Unit Tests

gstack browser automation testing is not a replacement for unit tests. It fills a different niche. Use /browse when you need to verify:

  • Visual layout and styling render correctly in an actual browser
  • Client-side JavaScript executes without errors
  • Form submissions work end-to-end, including validation
  • Multi-page flows maintain state correctly
  • Network requests are sent with the correct payloads
  • Responsive design works at different viewport sizes
  • Third-party integrations (analytics, chat widgets, auth providers) load correctly

Continue using unit tests for pure logic, data transformations, and component behavior in isolation. The combination of fast unit tests and targeted browser automation gives you high confidence with minimal overhead.