/review skill

GStack AI Code Review: A Paranoid Staff Engineer on Every PR

Structural audit that catches the bugs your test suite misses. N+1 queries, race conditions, stale reads, broken trust boundaries, and more.

Why CI Passing Doesn't Mean Production-Safe

Your CI pipeline is green. Every test passes. The linter is happy. You merge, deploy, and fifteen minutes later an on-call engineer is watching the database buckle under an N+1 query that your test suite never exercised at production scale. This is the class of problem GStack's AI code review is designed to catch.

The /review skill in GStack acts as a paranoid staff engineer. It does not care about code style, variable naming, or whether you used single quotes or double quotes. It performs a structural audit of your diff against main, looking exclusively for bugs that will pass CI and break in production.

This is not a linter. It is not a style checker. It is the reviewer who asks "what happens when two requests hit this endpoint at the same time?" and "does this LLM output get sanitized before it reaches the database?"

What /review Looks For

The review runs in two passes. The first pass covers critical issues that block shipping. The second covers informational findings that belong in the PR description but do not block the merge.

Pass 1: Critical (Blocks /ship)

SQL and Data Safety

String interpolation in queries, TOCTOU races in check-then-set patterns, update_column bypassing validations, and N+1 queries with missing .includes() on associations used in loops.

Race Conditions

Read-check-write without uniqueness constraints, find_or_create_by on columns without unique DB indexes, non-atomic status transitions, and html_safe on user-controlled data, which is an XSS risk rather than a race but is checked in the same pass.

LLM Trust Boundaries

LLM-generated values written to DB without format validation. Structured tool output accepted without type or shape checks before persistence.

Enum and Value Completeness

New enum values, status strings, or tier names traced through every consumer. Case statements, allowlists, and filter arrays checked for coverage of the new value.

Pass 2: Informational (Non-blocking)

Conditional Side Effects

Code paths that branch on a condition but forget to apply a side effect on one branch, creating inconsistent records.
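To make the pattern concrete, here is a small hypothetical Python sketch (the Item record and both promote_* functions are invented for illustration, not taken from GStack or any real codebase). The buggy version promotes on both branches but attaches the URL on only one; the fixed version hoists the shared side effect out of the branch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    # Hypothetical record: a status plus two side-effect fields.
    status: str = "pending"
    url: Optional[str] = None
    source: str = ""


def promote_buggy(item: Item, url: str, from_crawler: bool) -> Item:
    if from_crawler:
        item.status = "verified"
        item.url = url
        item.source = "crawler"
    else:
        # BUG: this branch promotes the item but never attaches the URL,
        # leaving a "verified" record with url=None.
        item.status = "verified"
        item.source = "manual"
    return item


def promote_fixed(item: Item, url: str, from_crawler: bool) -> Item:
    # Side effects shared by every path are applied once, outside the branch.
    item.status = "verified"
    item.url = url
    item.source = "crawler" if from_crawler else "manual"
    return item
```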

Test Gaps

Negative-path tests that assert status but not side effects. Security enforcement without integration tests covering the enforcement path end-to-end.

Indexes, Secret Comparison, and Hashing

Missing indexes on queried columns, non-constant-time comparisons on secrets, and truncation of data instead of hashing for deduplication.
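Two of these checks have direct standard-library remedies. The Python sketch below uses hmac.compare_digest for secret comparison and a SHA-256 digest instead of truncation for deduplication keys; the function names are hypothetical. In a Rails codebase, ActiveSupport::SecurityUtils.secure_compare and Digest::SHA256 fill the same roles.

```python
import hashlib
import hmac


def token_matches(supplied: str, expected: str) -> bool:
    # `==` may stop at the first differing character, leaking timing
    # information; compare_digest does not short-circuit that way.
    return hmac.compare_digest(supplied.encode(), expected.encode())


def dedup_key_truncated(text: str) -> str:
    # Weak: any two inputs sharing their last 8 characters collide.
    return text[-8:]


def dedup_key_hashed(text: str) -> str:
    # Stronger: a digest over the whole input.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```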

Escaping and Type Coercion

Values crossing Ruby-to-JSON-to-JS boundaries where type could change. Hash or digest inputs that skip .to_s before serialization.
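The hashing hazard is easy to reproduce. In this Python sketch (helper names hypothetical), hashing a JSON payload yields different digests for 8 and "8"; normalizing values to strings first, the analogue of calling .to_s, makes them agree.

```python
import hashlib
import json


def digest(payload: dict) -> str:
    # Serialize deterministically (sorted keys), then hash the bytes.
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()


def normalized_digest(payload: dict) -> str:
    # Coerce every value to str before hashing so 8 and "8" agree.
    return digest({key: str(value) for key, value in payload.items()})
```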

Structural audit, not style nitpicking

GStack's review checklist includes an explicit suppressions list. It will never flag harmless redundancy, suggest adding comments to magic numbers during tuning, or recommend consistency-only changes. The intent is that anything it does flag is a real structural concern, not a matter of taste.

How GStack AI Code Review Works

Running a review is a single command. Type /review in any Claude Code session while on a feature branch. The skill handles everything from there.

1. Branch Detection

Checks if you are on a feature branch with changes against main. If you are on main or there is no diff, it stops immediately with a clear message.

2. Diff Extraction

Fetches the latest origin/main and runs git diff origin/main to get the full diff, including both committed and uncommitted changes.
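Steps 1 and 2 come down to a handful of git commands. The Python sketch below is an illustration, not GStack's implementation: it assumes git is on PATH, the remote is named origin, and the base branch is main, and all helper names are hypothetical.

```python
import subprocess
from typing import Optional


def git(*args: str) -> str:
    # Run a git command and return stdout; check=True raises on failure.
    result = subprocess.run(
        ["git", *args], capture_output=True, text=True, check=True
    )
    return result.stdout


def current_branch() -> str:
    return git("rev-parse", "--abbrev-ref", "HEAD").strip()


def collect_diff() -> str:
    # Refresh origin/main, then diff the working tree against it. This
    # covers committed and uncommitted changes to tracked files; untracked
    # files do not appear in `git diff`.
    git("fetch", "--quiet", "origin", "main")
    return git("diff", "origin/main")


def stop_reason(branch: str, diff: str) -> Optional[str]:
    # Return a message when there is nothing to review, else None.
    if branch == "main":
        return "On main: check out a feature branch before running /review."
    if not diff.strip():
        return "No changes against origin/main: nothing to review."
    return None
```

Inside a repository, stop_reason(current_branch(), collect_diff()) returns None when there is something to review. Keeping the check a pure function of the branch name and diff text makes it testable without a repository.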

3. Two-Pass Checklist Review

Applies the full checklist against the diff. Pass 1 covers critical categories. Pass 2 covers informational categories. For enum completeness, it reads code outside the diff to trace all consumers.

4. Greptile Triage (if applicable)

Reads any Greptile PR comments, classifies each one, and integrates valid findings into the critical issues list.

5. Interactive Resolution

For each critical finding, presents the issue and gives you three options: fix it now, acknowledge and proceed, or mark as false positive.

Greptile-Aware Triage

If you use Greptile for automated code review, the /review skill integrates with it directly. When a PR has Greptile comments, GStack fetches both line-level and top-level review comments, then classifies each one into four categories.

Classification System

  • Valid and Actionable -- A real bug or structural issue that exists in the current code. Gets added to the critical findings and follows the same fix/acknowledge/false-positive flow.
  • Valid but Already Fixed -- A real issue that was addressed in a subsequent commit on the branch. GStack auto-replies to the Greptile comment with the SHA of the fixing commit and an explanation.
  • False Positive -- The comment misunderstands the code, flags something handled elsewhere, or is stylistic noise. GStack pushes back with concrete evidence and a suggested severity re-rank.
  • Suppressed -- A known false positive pattern from previous triage runs. Skipped silently.
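The four outcomes map to three behaviors: join the critical list, get a reply on the thread, or be ignored. A hypothetical Python sketch of that routing (the Triage enum and route function are illustrative names, not GStack's):

```python
from enum import Enum, auto
from typing import Iterable, List, Tuple


class Triage(Enum):
    VALID_ACTIONABLE = auto()
    ALREADY_FIXED = auto()
    FALSE_POSITIVE = auto()
    SUPPRESSED = auto()


def route(comments: Iterable[Tuple[int, Triage]]) -> Tuple[List[int], List[int]]:
    """Split classified comment IDs into (critical findings, comments needing a reply)."""
    critical: List[int] = []
    replies: List[int] = []
    for comment_id, kind in comments:
        if kind is Triage.VALID_ACTIONABLE:
            critical.append(comment_id)  # joins the interactive resolution flow
        elif kind in (Triage.ALREADY_FIXED, Triage.FALSE_POSITIVE):
            replies.append(comment_id)  # answered on the thread with evidence
        # SUPPRESSED: skipped silently, no finding and no reply
    return critical, replies
```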

Tiered Reply System

Every reply to Greptile includes concrete evidence. No vague responses. GStack uses a tiered approach:

  • Tier 1 (First response) -- Friendly, evidence-included. For fixes, includes an inline diff and explanation. For false positives, includes specific code references and a suggested re-rank.
  • Tier 2 (Re-flagged after prior reply) -- Firm, with overwhelming evidence. Includes the full relevant diff, an evidence chain with file permalinks and commit SHAs, and a clear request to recalibrate severity.

The escalation detection works automatically. Before composing any reply, GStack checks if a prior reply already exists on the comment thread. If Greptile re-flags the same issue after a previous GStack response, the reply template escalates to Tier 2.
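The escalation rule reduces to one check on the thread history. In this Python sketch, identifying GStack's own replies by the commenter's login is an assumption, as are the Comment type and the example logins.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Comment:
    author: str
    body: str


def reply_tier(thread: List[Comment], our_login: str) -> int:
    # An earlier reply from us means Greptile has re-flagged an issue we
    # already answered, so escalate to the firmer Tier 2 template.
    already_replied = any(c.author == our_login for c in thread)
    return 2 if already_replied else 1
```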

Learning from False Positives

Every time GStack triages a Greptile comment, whether the outcome is a false positive, a fix, or an already-fixed issue, the result gets saved to ~/.gstack/greptile-history.md in a structured format that includes the date, repository, classification type, file pattern, and issue category.

# Example entries in greptile-history.md
2026-03-13 | garrytan/myapp | fp | app/services/auth_service.rb | race-condition
2026-03-13 | garrytan/myapp | fix | app/models/user.rb | null-check
2026-03-14 | garrytan/myapp | already-fixed | lib/payments.rb | error-handling

GStack maintains both a per-project history and a global aggregate. The per-project history drives suppression: on future runs, any Greptile comment that matches a known false-positive pattern (same repo, same file pattern, same issue category) is suppressed automatically. A false positive you have already triaged therefore does not resurface, and the noise shrinks the more you use the tool.
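A minimal Python sketch of the suppression match, assuming history entries are already parsed into their five fields and treating the stored file pattern as a shell-style glob (an assumption; an exact path matches itself either way). Entry and is_suppressed are hypothetical names.

```python
from fnmatch import fnmatchcase
from typing import Iterable, NamedTuple


class Entry(NamedTuple):
    date: str
    repo: str
    kind: str          # classification, e.g. "fp", "fix", "already-fixed"
    file_pattern: str
    category: str


def is_suppressed(repo: str, path: str, category: str, history: Iterable[Entry]) -> bool:
    # Suppress only when a recorded false positive matches on all three
    # keys: same repo, a file pattern matching this path, same category.
    return any(
        e.kind == "fp"
        and e.repo == repo
        and fnmatchcase(path, e.file_pattern)
        and e.category == category
        for e in history
    )
```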

The global history feeds into the /retro skill for tracking patterns across all your projects.

History files are append-only and fault-tolerant

Malformed lines in the history file are skipped silently. The triage never fails because of a corrupted or hand-edited history file. You can safely inspect and edit these files yourself.
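Fault tolerance of this kind usually means validating each line independently and dropping the ones that do not fit. A hypothetical Python parser for the pipe-separated format shown above, assuming five non-empty fields per entry and # for comment lines:

```python
from typing import List, Tuple

Row = Tuple[str, ...]


def parse_history(text: str) -> List[Row]:
    rows: List[Row] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        fields = tuple(part.strip() for part in line.split("|"))
        if len(fields) != 5 or not all(fields):
            continue  # malformed: skip it rather than fail the whole triage
        rows.append(fields)
    return rows
```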

What Makes This Different from Other AI Code Review Tools

Most AI code review tools operate at the surface level. They flag unused variables, suggest better naming, and point out formatting inconsistencies. These are problems your linter already solves. GStack's /review skill targets a fundamentally different class of bugs.

The "Tests That Miss Real Failure Modes" Problem

Consider a test that asserts a record is created with the correct status. The test passes. But the test never checks whether the associated URL was attached, whether the callback fired, or whether the side effect on the other branch of the conditional was applied. GStack's review checklist specifically looks for these gaps: tests that verify type and status but skip the side effects that matter in production.
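A hypothetical Python illustration of the gap (register, Mailer, and the two check functions are invented names). The code under test forgets the welcome email on one branch; a check that looks only at status passes anyway, while one that also inspects the side effect fails and exposes the bug. Each check returns whether its assertions would hold.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Mailer:
    sent: List[str] = field(default_factory=list)


def register(email: str, invited: bool, mailer: Mailer) -> Dict[str, str]:
    # Hypothetical code under test: a welcome email should go out on both paths.
    user = {"email": email, "status": "pending"}
    if invited:
        user["status"] = "active"
        mailer.sent.append(email)
    else:
        user["status"] = "active"  # BUG: no welcome email on this branch
    return user


def status_only_check() -> bool:
    # Mirrors a test that asserts only the status: it passes despite the bug.
    user = register("a@example.com", invited=False, mailer=Mailer())
    return user["status"] == "active"


def side_effect_check() -> bool:
    # Also asserts the side effect, so it catches the missing email.
    mailer = Mailer()
    user = register("a@example.com", invited=False, mailer=mailer)
    return user["status"] == "active" and mailer.sent == ["a@example.com"]
```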

Beyond the Diff

When your diff introduces a new enum value, checking only the diff is insufficient. GStack uses Grep to find every file that references sibling values, then reads each one to verify the new value is handled. It traces through case statements, allowlists, filter arrays, and display logic outside the diff. This is the kind of review that requires reading the codebase, not just the PR.
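The first, mechanical half of that trace can be approximated with plain text search. The Python sketch below is hypothetical and much cruder than what the skill does: it lists files that mention an existing sibling value but never the new one. A match only means a file deserves reading; deciding whether the new value is actually handled still requires reading the code, which is the step GStack performs.

```python
import re
from typing import Dict, Iterable, List


def unhandled_consumers(files: Dict[str, str], siblings: Iterable[str], new_value: str) -> List[str]:
    """Paths that mention an existing sibling value but never the new one."""
    names = [re.escape(s) for s in siblings]
    if not names:
        return []
    sibling_re = re.compile(r"\b(?:" + "|".join(names) + r")\b")
    new_re = re.compile(r"\b" + re.escape(new_value) + r"\b")
    return sorted(
        path
        for path, text in files.items()
        if sibling_re.search(text) and not new_re.search(text)
    )
```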

Integration with the Shipping Workflow

The /review skill is designed to pair with GStack's /ship workflow. Critical findings from /review block the /ship command, ensuring that structural issues are resolved before code reaches production. Informational findings are included in the PR body for context but do not gate the merge. This creates a natural checkpoint in the shipping workflow without slowing down development on non-critical items.
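The gating rule fits in a few lines. In this hypothetical Python sketch, the Finding type, the resolved flag (standing in for whatever the user chose during interactive resolution), and the note format are assumptions modeled loosely on /review's report lines.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Finding:
    location: str
    message: str
    critical: bool
    resolved: bool = False  # hypothetical: set once the user has dealt with it


def ship_gate(findings: List[Finding]) -> Tuple[bool, str]:
    """Return (may_ship, pr_body_notes)."""
    blocking = [f for f in findings if f.critical and not f.resolved]
    notes = "\n".join(
        f"- [{f.location}] {f.message}" for f in findings if not f.critical
    )
    return (len(blocking) == 0, notes)
```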

The Review Checklist in Detail

The checklist is maintained as a structured markdown file within GStack. Each category has specific patterns it looks for, and each pattern is tied to a real-world failure mode. Here is a deeper look at what the critical categories cover.

SQL and Data Safety

Interpolating values into SQL strings is flagged even when they have already been converted with .to_i or .to_f; the checklist calls for sanitize_sql_array or Arel instead. TOCTOU (time-of-check to time-of-use) races in check-then-set patterns are flagged with a recommendation to use an atomic WHERE clause plus update_all. And update_column/update_columns calls that bypass validations on constrained fields are flagged as potential data corruption vectors.
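The same failure modes show up outside Rails. This Python sketch uses the standard-library sqlite3 module in place of ActiveRecord (all table and function names are hypothetical): a ? placeholder instead of interpolation, and an N+1 loop next to a single JOIN, with a trace callback counting the SELECTs each approach issues. Rails' .includes() typically preloads with one extra query rather than a JOIN; either way, the query count stops growing with the number of rows.

```python
import sqlite3
from typing import Callable, Dict, List, Optional, Tuple


def setup() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT);
        INSERT INTO users VALUES (1, 'ann'), (2, 'bob');
        INSERT INTO posts VALUES (1, 1, 'a'), (2, 1, 'b'), (3, 2, 'c');
        """
    )
    return conn


def find_user(conn: sqlite3.Connection, user_id: object) -> Optional[Tuple[str]]:
    # Unsafe: f"... WHERE id = {user_id}" splices input into the SQL text.
    # Safe: the ? placeholder sends the value separately from the statement.
    return conn.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()


def titles_n_plus_one(conn: sqlite3.Connection) -> Dict[str, List[str]]:
    out: Dict[str, List[str]] = {}
    for uid, name in conn.execute("SELECT id, name FROM users ORDER BY id").fetchall():
        # One extra query per user: 1 + N queries in total.
        rows = conn.execute(
            "SELECT title FROM posts WHERE user_id = ? ORDER BY id", (uid,)
        ).fetchall()
        out[name] = [title for (title,) in rows]
    return out


def titles_batched(conn: sqlite3.Connection) -> Dict[str, List[str]]:
    # A single JOIN loads every user's posts at once.
    out: Dict[str, List[str]] = {}
    rows = conn.execute(
        "SELECT u.name, p.title FROM users u "
        "LEFT JOIN posts p ON p.user_id = u.id ORDER BY u.id, p.id"
    )
    for name, title in rows:
        titles = out.setdefault(name, [])
        if title is not None:
            titles.append(title)
    return out


def count_selects(conn: sqlite3.Connection, fn: Callable[[sqlite3.Connection], object]) -> int:
    # Count the SELECT statements SQLite actually runs while fn executes.
    seen: List[str] = []

    def trace(sql: str) -> None:
        if sql.lstrip().upper().startswith("SELECT"):
            seen.append(sql)

    conn.set_trace_callback(trace)
    try:
        fn(conn)
    finally:
        conn.set_trace_callback(None)
    return len(seen)
```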

Race Conditions and Concurrency

The review looks for read-check-write patterns that have neither a uniqueness constraint nor rescue RecordNotUnique / retry handling. It catches find_or_create_by on columns without a unique database index, where concurrent calls can create duplicates. Non-atomic status transitions, where concurrent updates can skip or double-apply a transition, are flagged with a recommendation to make the transition a single atomic UPDATE ... SET status = new_status WHERE status = old_status statement.
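Both fixes let the database, not application code, arbitrate between concurrent writers. The Python/sqlite3 sketch below stands in for the Rails versions (table, index, and function names are hypothetical): a unique index plus an IntegrityError fallback plays the role of a unique index plus rescue RecordNotUnique, and a conditional UPDATE's rowcount reports whether a transition actually happened. A single-threaded example cannot reproduce the race itself; the comments mark where it would bite.

```python
import sqlite3


def setup() -> sqlite3.Connection:
    # isolation_level=None puts the connection in autocommit mode.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
    # Without this unique index, two racing find-or-create calls can both insert.
    conn.execute("CREATE UNIQUE INDEX index_users_on_email ON users (email)")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
    conn.execute("INSERT INTO orders (id, status) VALUES (1, 'pending')")
    return conn


def find_or_create_user(conn: sqlite3.Connection, email: str) -> int:
    row = conn.execute("SELECT id FROM users WHERE email = ?", (email,)).fetchone()
    if row is not None:
        return row[0]
    try:
        cur = conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        return cur.lastrowid
    except sqlite3.IntegrityError:
        # Another writer inserted the same email between our SELECT and INSERT.
        # The index rejected the duplicate, so read back the winner's row.
        return conn.execute("SELECT id FROM users WHERE email = ?", (email,)).fetchone()[0]


def transition(conn: sqlite3.Connection, order_id: int, old: str, new: str) -> bool:
    # Check and write in one statement: the UPDATE matches only while the
    # status is still `old`, so two concurrent callers cannot both succeed.
    cur = conn.execute(
        "UPDATE orders SET status = ? WHERE id = ? AND status = ?", (new, order_id, old)
    )
    return cur.rowcount == 1
```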

LLM Output Trust Boundary

As more applications integrate LLM outputs, the trust boundary between generated content and your database becomes critical. GStack flags LLM-generated emails, URLs, and names that are written to the database without format validation. It looks for missing guards like EMAIL_REGEXP, URI.parse, or .strip before persistence. Structured tool output (arrays, hashes) that is accepted without type or shape checks before database writes is flagged as well.
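A lightweight guard layer might look like the following Python sketch. The loose email pattern, the http(s)-only URL rule, and the expected contact shape are assumptions for illustration, not GStack's EMAIL_REGEXP or any real schema. Each function returns None for input that should not be persisted.

```python
import re
from typing import Any, Dict, List, Optional
from urllib.parse import urlparse

# Deliberately loose: one "@", no whitespace, and a dot in the domain.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def clean_email(value: Any) -> Optional[str]:
    if not isinstance(value, str):
        return None
    email = value.strip()
    return email if EMAIL_RE.fullmatch(email) else None


def clean_url(value: Any) -> Optional[str]:
    if not isinstance(value, str):
        return None
    url = value.strip()
    parsed = urlparse(url)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return url
    return None


def clean_contacts(tool_output: Any) -> Optional[List[Dict[str, str]]]:
    # Shape check: expect a list of {"name": str, "email": str} objects.
    if not isinstance(tool_output, list):
        return None
    rows: List[Dict[str, str]] = []
    for item in tool_output:
        if not isinstance(item, dict) or not isinstance(item.get("name"), str):
            return None
        email = clean_email(item.get("email"))
        if email is None:
            return None
        rows.append({"name": item["name"].strip(), "email": email})
    return rows
```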

Running /review in Practice

The typical workflow integrates /review into your plan-to-ship pipeline. After implementing a feature on a branch, you run /review before calling /ship. The skill reads the full diff, applies both passes of the checklist, triages any Greptile comments, and presents findings interactively.

# In any Claude Code session on a feature branch
/review

# Output example:
Pre-Landing Review: 3 issues (1 critical, 2 informational) + 4 Greptile comments (1 valid, 2 fixed, 1 FP)

CRITICAL (blocking /ship):
- [app/models/user.rb:47] find_or_create_by on email without unique DB index
  Fix: add unique index on users.email, add rescue RecordNotUnique

Issues (non-blocking):
- [app/views/dashboard.html.erb:23] O(n*m) lookup: Array#find in loop
  Fix: use index_by hash before the loop
- [test/models/user_test.rb:91] Asserts status but not side effects
  Fix: add assertion for confirmation_email_sent callback

For each critical issue, you choose: fix it now, acknowledge and move on, or mark it as a false positive. If you choose to fix, GStack applies the recommended change directly. Otherwise the review is read-only: it modifies files only when you explicitly choose a fix, and it never commits, pushes, or creates PRs.

Getting Started with GStack AI Code Review

The /review skill is included in every GStack installation. Follow the setup and install guide to get GStack running, then type /review on any feature branch. No configuration required. Greptile integration activates automatically when Greptile comments are detected on the PR.

For the full picture of how /review fits into the broader workflow, see the complete skills overview and the shipping workflow guide.