Gstack Automated QA Testing with /qa and /qa-only

Run systematic QA testing inside Claude Code. Find bugs, get health scores, and ship with confidence using gstack's automated QA skills.

Manual QA is slow, inconsistent, and the first thing teams skip under deadline pressure. Gstack automated QA testing replaces that entire process with two Claude Code skills: /qa and /qa-only. They use browser automation to systematically crawl your application, detect bugs across severity tiers, and deliver a quantified health score — all from the terminal.

Whether you need a quick pre-push sanity check or a full regression suite, gstack's QA skills adapt to the context. On feature branches, they automatically read your diff to test only what changed. On main, they do a thorough sweep. And the best part: /qa doesn't just find problems — it fixes them with atomic commits and re-verifies that the fix actually works.

Two Skills, Two Philosophies

Gstack gives you a choice between active QA and passive reporting, depending on where you are in your workflow.

The /qa Skill: Find, Fix, and Verify

The /qa skill is your full-cycle automated QA engineer. It tests your application, identifies bugs across three severity tiers, then fixes each issue with an atomic commit and re-runs verification to confirm the fix works. When it finishes, you get a before-and-after health score and a ship-readiness summary that tells you exactly where things stand.

This is the skill you reach for when you want to hand off QA entirely. It uses the /browse skill under the hood for all browser automation — launching headless browsers, navigating pages, taking screenshots, and interacting with forms and UI elements.

Example Health Score Output

Before: 62
After: 94

Ship-readiness: Ready to merge. 3 critical bugs fixed, 1 medium issue documented.

The /qa-only Skill: Pure Bug Reports

/qa-only uses the exact same testing methodology and severity classification as /qa, but it never modifies your code. It produces a comprehensive bug report and nothing else. Use this when you want visibility into the current state of your application without any automated intervention — for instance, before a code review, or when auditing a branch someone else owns.

Reports from both skills are saved to .gstack/qa-reports/ in your project directory, giving you a historical record you can reference during engineering retrospectives or plan reviews.

Four Testing Modes

Gstack automated QA testing supports four distinct modes, each designed for a specific stage of your development workflow. You don't need to pick manually in most cases — the skill infers the right mode from your git context.

Diff-Aware Mode (auto-detects)

Activates automatically on feature branches. Reads git diff main, identifies affected pages and components, then targets testing specifically at what changed.

Full Mode (5–15 min)

Systematic exploration of your entire application. Crawls every reachable page, tests forms, checks responsive layouts, and validates core user flows end to end.

Quick Mode (~30 seconds)

A 30-second smoke test that hits your critical paths. Perfect for pre-commit checks or verifying that a deploy didn't break anything obvious.

Regression Mode (baseline)

Diffs current test results against a saved baseline. Flags new issues introduced since the last known-good state. Ideal for CI integration and release gates.
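
As a rough illustration of mode inference, here is a minimal sketch. The function name, its parameters, and the rule that quick and regression modes are requested explicitly are assumptions for this example. The source only states that feature branches get diff-aware mode and main gets a thorough sweep.

```python
def infer_mode(branch, default_branch="main", quick=False, baseline=None):
    """Pick a testing mode from git context (hypothetical logic, not gstack's code)."""
    # Assumption: quick and regression runs are opted into explicitly.
    if quick:
        return "quick"
    if baseline is not None:
        return "regression"
    # Stated in the text: feature branches are tested diff-aware, main gets a full sweep.
    if branch != default_branch:
        return "diff-aware"
    return "full"


print(infer_mode("feature/checkout-form"))
print(infer_mode("main"))
```

In a real setup the branch name could come from `git rev-parse --abbrev-ref HEAD`.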

Diff-Aware Mode: Intelligent Scoping

Diff-aware mode is what makes gstack automated QA testing practical for daily feature work. When you run /qa on a feature branch, it automatically executes git diff main and parses the output. It identifies which files changed, maps those changes to routes and components, and builds a targeted test plan.

If your diff touched a checkout form component, it tests the checkout flow. If you modified an API endpoint, it exercises the UI that depends on it. This means your QA cycles are proportional to your change size rather than your app size — a diff that touches two files doesn't need a 15-minute full crawl.
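
The mapping from changed files to routes can be pictured with a short sketch. The ROUTE_MAP table, the file paths, and the routes_for_diff helper are all hypothetical. How gstack actually resolves files to pages is not described here.

```python
from fnmatch import fnmatch

# Hypothetical mapping from source-file patterns to the routes they affect.
# A real project would derive this from its router or component graph.
ROUTE_MAP = {
    "src/components/checkout/*": ["/checkout"],
    "src/api/orders.py": ["/orders", "/checkout"],
    "src/pages/admin/*": ["/admin"],
}


def routes_for_diff(changed_files):
    """Return the sorted, de-duplicated routes touched by the changed files."""
    routes = set()
    for path in changed_files:
        for pattern, affected in ROUTE_MAP.items():
            if fnmatch(path, pattern):
                routes.update(affected)
    return sorted(routes)


# Example input, shaped like the output of `git diff --name-only main`.
changed = ["src/components/checkout/Form.tsx", "README.md"]
print(routes_for_diff(changed))
```

Files that match no pattern, such as README.md above, add nothing to the test plan, which is why a small diff produces a small QA run.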

Tip: Diff-aware mode pairs well with the /ship workflow. Run /qa on your feature branch before shipping to catch regressions against main without scanning pages your changes didn't affect.

Health Score System

Every QA run produces a health score between 0 and 100. This isn't a subjective rating — it's computed from the number and severity of detected bugs, weighted against the scope of pages tested. The score gives you a single number to communicate app quality to stakeholders, track quality trends over time, and make data-driven ship/no-ship decisions.

When using /qa (the fix-and-verify skill), you get two scores: a before score reflecting the initial state and an after score reflecting the state post-fixes. The delta between these two numbers tells you exactly how much the automated fixes improved things.
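
To make the idea of a severity-weighted score concrete, here is a minimal sketch. The penalty values and the formula are assumptions chosen for illustration. They are not gstack's actual weights and will not reproduce its numbers.

```python
# Assumed penalty per bug; gstack's real weights are not documented here.
SEVERITY_PENALTY = {"critical": 25, "high": 10, "medium": 3}


def health_score(bug_severities, pages_tested):
    """Score from 0 to 100: total severity penalty averaged over the pages tested."""
    if pages_tested <= 0:
        raise ValueError("pages_tested must be positive")
    penalty = sum(SEVERITY_PENALTY[s] for s in bug_severities)
    score = 100 - penalty / pages_tested
    # Clamp so that a very buggy run never goes below zero.
    return max(0, min(100, round(score)))


before = health_score(["critical", "critical", "critical", "medium"], pages_tested=3)
after = health_score(["medium"], pages_tested=3)
print(before, after, after - before)
```

Dividing by the number of pages tested is one simple way to weight bugs against scope: the same bug count hurts a small run more than a large one.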

Bug Severity Tiers

Gstack classifies every detected bug into one of three severity tiers. This classification drives the health score calculation and determines the order in which /qa applies its automated fixes.

Severity | Criteria | Examples
Critical | Blocks core user flows or causes data loss | Broken checkout, login failure, JS crash on load, form submission silently failing
High | Major functionality degraded but workarounds exist | Search returning wrong results, pagination broken, mobile layout unusable
Medium | Visual or minor functional issues | Misaligned elements, console warnings, slow transitions, missing hover states

The /qa skill prioritizes critical bugs first, ensuring the most impactful fixes ship before it moves on to lower-severity items. Each fix is a standalone atomic commit, so you can cherry-pick or revert individual fixes if needed.
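
The critical-first ordering can be sketched as a stable sort. The bug dictionaries and the fix_queue helper are hypothetical, and only the three tier names come from the text above.

```python
# Lower rank means the bug is fixed earlier.
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2}


def fix_queue(bugs):
    """Order bugs critical-first; sorted() is stable, so discovery order is kept within a tier."""
    return sorted(bugs, key=lambda bug: SEVERITY_ORDER[bug["severity"]])


found = [
    {"title": "Missing hover state", "severity": "medium"},
    {"title": "Checkout form fails silently", "severity": "critical"},
    {"title": "Pagination broken", "severity": "high"},
]
for bug in fix_queue(found):
    print(bug["severity"], "-", bug["title"])
```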

Testing Authenticated Pages

Real applications have login-protected routes, and testing only public pages gives you partial coverage at best. Gstack solves this with the /setup-browser-cookies skill, which imports session cookies from your real browser into the headless testing environment.

Once cookies are configured, /qa and /qa-only can access dashboards, admin panels, account settings, and any other authenticated page exactly as your logged-in user would see them. No mock auth. No test accounts. Real session state.

# Import cookies from your browser, then run QA
/setup-browser-cookies
/qa

This is particularly powerful for diff-aware testing on feature branches. If your diff touched an admin-only component, the QA skill can navigate to that page, verify the change works, and test for regressions — all with valid authentication.

Working with Dev Servers and Remote URLs

Gstack automated QA testing works with both localhost dev servers and remote URLs. For local development, point it at your running dev server. For staging or production audits, give it the remote URL. The testing methodology adapts to either environment.

# Test against local dev server
/qa http://localhost:3000

# Test a staging deployment
/qa https://staging.yourapp.com

# Diff-aware testing (auto-detects URL from project config)
/qa

When no URL is provided, the skill checks your project configuration and running processes to find the right target. If you're running a dev server on a common port, it will find it automatically.
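
One way to picture the port check is a short probe loop. The port list and the find_dev_server helper are assumptions for this sketch. The source does not say which ports or which mechanism the skill uses.

```python
import socket

# Assumed list of common dev-server ports; adjust for your stack.
COMMON_DEV_PORTS = (3000, 5173, 8000, 8080)


def find_dev_server(ports=COMMON_DEV_PORTS, host="127.0.0.1", timeout=0.2):
    """Return the URL of the first port that accepts a TCP connection, else None."""
    for port in ports:
        try:
            # A successful connect means something is listening on this port.
            with socket.create_connection((host, port), timeout=timeout):
                return f"http://{host}:{port}"
        except OSError:
            continue  # Refused or timed out: try the next port.
    return None


if __name__ == "__main__":
    print(find_dev_server())
```

A probe like this only shows that a port is open, not that your app is behind it, so passing the URL explicitly remains the unambiguous option.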

QA Reports and Output

Every test run generates a structured report saved to .gstack/qa-reports/. These reports include the health score, a categorized bug list with severity tags, screenshots of detected issues, the testing mode used, and timestamps. For /qa runs, the report also includes a log of every fix applied with its corresponding commit hash.
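
As a way to picture the report contents listed above, here is a minimal sketch of a report structure. The class names, field names, and JSON serialization are hypothetical. The actual on-disk format in .gstack/qa-reports/ is not specified here.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Bug:
    title: str
    severity: str  # "critical", "high", or "medium"
    page: str
    screenshot: Optional[str] = None  # path to a screenshot, if one was taken


@dataclass
class Fix:
    bug_title: str
    commit: str  # hash of the atomic commit that fixed the bug


@dataclass
class QAReport:
    mode: str
    timestamp: str
    health_score: int
    bugs: List[Bug] = field(default_factory=list)
    fixes: List[Fix] = field(default_factory=list)  # stays empty for /qa-only runs

    def to_json(self):
        # asdict() recurses into the nested Bug and Fix dataclasses.
        return json.dumps(asdict(self), indent=2)


report = QAReport(
    mode="diff-aware",
    timestamp="2025-01-01T12:00:00Z",
    health_score=94,
    bugs=[Bug("Pagination broken", "high", "/orders")],
)
print(report.to_json())
```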

This report archive serves multiple purposes. During code reviews, reviewers can check whether QA was run and what it found. During retrospectives, teams can track quality trends across sprints. And for compliance-sensitive projects, it provides an audit trail of what was tested and when.

Integrating QA Into Your Workflow

The most effective pattern is to incorporate gstack automated QA testing at two points in your development cycle:

  1. During development — Run /qa on your feature branch before requesting review. Diff-aware mode keeps it fast, and atomic fixes mean you get clean commits that are easy to review.
  2. Before shipping — Run /qa-only as a final gate before merging. This gives you a clean report without any last-minute code changes, ensuring what you reviewed is what gets merged.

For teams using the full gstack skills suite, a typical feature cycle looks like this: /plan-ceo-review to scope, build the feature, /qa to test and fix, /review for code review, then /ship to merge and deploy. QA becomes a natural checkpoint rather than a separate phase.

Quick vs. Full: Choosing the Right Mode

Quick mode (~30 seconds) is designed for high-frequency use. Run it after every meaningful code change to catch obvious regressions. It focuses on your app's critical paths — the pages and flows that matter most to users.

Full mode (5 to 15 minutes) is your comprehensive sweep. It systematically crawls your application, tests edge cases, checks responsive behavior, and exercises less-trafficked pages. Use it before major releases, after large refactors, or as a weekly quality audit.

Regression mode sits between the two in terms of thoroughness. It compares the current results against a saved baseline and flags only the issues that are new since that baseline, which makes it efficient for CI pipelines where you need to validate that nothing broke without reviewing every known issue again.
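
The baseline comparison can be sketched as a set difference. Identifying an issue by its page and title is an assumption made for this example. The skill's real matching rules are not described here.

```python
def new_issues(baseline, current):
    """Return issues in the current run that were not present in the baseline."""
    # Assumption: an issue is identified by the page it occurs on plus its title.
    known = {(issue["page"], issue["title"]) for issue in baseline}
    return [i for i in current if (i["page"], i["title"]) not in known]


baseline = [{"page": "/orders", "title": "Pagination broken"}]
current = [
    {"page": "/orders", "title": "Pagination broken"},
    {"page": "/checkout", "title": "Submit button unresponsive"},
]
for issue in new_issues(baseline, current):
    print(issue["page"], "-", issue["title"])
```

A release gate built on this would fail only when the returned list is non-empty, so issues already in the baseline do not block the pipeline.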

Getting started: Install gstack with the setup guide, start your dev server, and run /qa. Diff-aware mode activates automatically if you're on a feature branch. For a no-commitment first look, try /qa-only — it won't touch your code.