Run and read the report

Once the test exists, run it to see whether your app behaves the way you described. The /run-suite skill handles this step. Its core job is to re-group the results into flow language instead of handing you a tangle of logs.

1) Run the tests

Type this into the Claude Code prompt.

run the tests

Or type /run-suite. If you use more than one track, you can target a single one, for example /run-suite web.

The skill first reads e2e-suite-config.json to decide which track to run, then runs that track’s tests. On web, Playwright launches a real browser; on mobile, Maestro launches a simulator. Either way, the tool steps through the flow one action at a time.
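To make the track decision concrete, here is a minimal sketch of how a track could be chosen from the config. The `SuiteConfig` shape, the `defaultTrack` field, and the `selectTrack` helper are assumptions made for illustration; the real e2e-suite-config.json schema and the skill's actual logic may differ.

```typescript
// Hypothetical config shape: the real e2e-suite-config.json may look different.
interface SuiteConfig {
  tracks: string[];       // e.g. ["web", "mobile"]
  defaultTrack?: string;  // assumed optional field, not confirmed by the docs
}

// Pick the track to run: an explicit argument wins, then the default,
// then the only configured track. Anything else is ambiguous.
function selectTrack(config: SuiteConfig, requested?: string): string {
  if (requested !== undefined) {
    if (config.tracks.indexOf(requested) === -1) {
      throw new Error(`Unknown track "${requested}"`);
    }
    return requested;
  }
  if (config.defaultTrack !== undefined) {
    return config.defaultTrack;
  }
  const only = config.tracks[0];
  if (config.tracks.length === 1 && only !== undefined) {
    return only;
  }
  throw new Error("Several tracks configured; name one, e.g. /run-suite web");
}

const config: SuiteConfig = JSON.parse('{"tracks": ["web", "mobile"]}');
console.log(selectTrack(config, "web")); // prints "web"
```

In this sketch, calling `selectTrack(config)` with two tracks and no default throws, which mirrors why naming the track (as in /run-suite web) is useful when more than one exists.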

If it reports nothing to run: a prompt such as “run /4-design-suite first” or “/5-implement-suite” means the design or implementation step is not finished. Head back to Your first test and complete that step first.

2) The report comes in flow language

A test tool’s raw output is long and hard to read. /run-suite does not pass it along as is. Instead, it re-groups the output by the flows you wrote in Choose your target.

Flows that passed get one short line; only failed flows get detail. The shape is close to this.

## Behavior check — web

✓ Logged-out visitor clicks Pro checkout on the pricing page → enters dashboard
✓ Logged-in user edits profile in settings
✗ Logged-in user opens billing management (Customer Portal)
   - Location: tests/web/specs/billing.spec.ts:42
   - Message: customer-portal call returned 401
   - Likely cause: missing Authorization header, or expired session
   - Next: re-run implement-web-suite — review the session-extraction part

A line with a check mark means that flow behaved the way you intended. A line with an X mark is a failed flow, and four things follow underneath it.

Location: which test file and which line it stopped at
Message: what went wrong
Likely cause: the AI’s guess at why
Next: what to call next

Thanks to those four lines, the next step is visible without you having to interpret a log. When a failure shows up, you usually call the skill named on the “Next” line.
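The report shape described above can be sketched as a small data model plus a renderer. This is only an illustration of "one line per passed flow, four detail lines per failed flow"; the `FlowResult` and `FailureDetail` types and the `renderFlows` function are hypothetical names, not the skill's actual internals.

```typescript
// Illustrative types only; the skill's real data model is not documented here.
interface FailureDetail {
  location: string;    // test file and line where the run stopped
  message: string;     // what went wrong
  likelyCause: string; // the AI's guess at why
  next: string;        // which skill to call next
}

interface FlowResult {
  flow: string;            // the flow sentence you wrote
  failure?: FailureDetail; // absent when the flow passed
}

// Passed flows get one short line; failed flows get four detail lines.
function renderFlows(results: FlowResult[]): string {
  const lines: string[] = [];
  for (const r of results) {
    if (r.failure === undefined) {
      lines.push(`✓ ${r.flow}`);
      continue;
    }
    const f = r.failure;
    lines.push(
      `✗ ${r.flow}`,
      `   - Location: ${f.location}`,
      `   - Message: ${f.message}`,
      `   - Likely cause: ${f.likelyCause}`,
      `   - Next: ${f.next}`,
    );
  }
  return lines.join("\n");
}

console.log(
  renderFlows([
    { flow: "Logged-in user edits profile in settings" },
    {
      flow: "Logged-in user opens billing management (Customer Portal)",
      failure: {
        location: "tests/web/specs/billing.spec.ts:42",
        message: "customer-portal call returned 401",
        likelyCause: "missing Authorization header, or expired session",
        next: "re-run implement-web-suite",
      },
    },
  ]),
);
```

Keeping the failure detail optional is what lets a fully passing run collapse to one line per flow.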

3) Pass, fail, and flaky

It pays to keep three outcomes apart.

All pass: your app behaves the way the flows intended. Once you reach this state, your first test is working as a safety net.

Fail: the app diverges from intent somewhere in the flow. It is one of two things: either there is a real bug in your app, or the test is pointing at the wrong thing on screen. The report’s “Likely cause” suggests which one it is. If the test is mis-pointing, call the implement-* skill named on the “Next” line again to fix it; if it is a real bug, you have to change your app.

Flaky: the same test passes on one run and fails on the next. The report flags this separately. Flakiness usually comes from the test acting before the screen has fully rendered. This is a stability problem, so it goes to /review-e2e for inspection. Fuller symptoms are in the flaky entry of Troubleshooting.
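One way to keep the three outcomes apart is to look at several runs of the same test. The sketch below is an assumption about how such a classification could work; how /run-suite actually detects flakiness is not described here, and `classify` is a hypothetical helper.

```typescript
type Outcome = "pass" | "fail" | "flaky";

// Each entry is one run of the same test: true = passed, false = failed.
function classify(runs: boolean[]): Outcome {
  if (runs.length === 0) {
    throw new Error("need at least one run");
  }
  const passes = runs.filter((ok) => ok).length;
  if (passes === runs.length) return "pass"; // every run passed
  if (passes === 0) return "fail";           // every run failed
  return "flaky";                            // mixed results across runs
}

console.log(classify([true, false, true])); // prints "flaky"
```

A consistent failure points at a real bug or a mis-pointed test, while mixed results point at timing, which is why the two are worth reporting separately.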

4) When you need the detailed log

If the on-screen summary is not enough and you want the raw output, the full output of the last run is saved to .claude/state/last-run.log. You usually have no reason to open it, but it comes in handy when you tell the AI “read last-run.log and look closer at this failure’s cause.”

A copy of the full report also stays in .claude/state/run-{date}.md. A new file is created on every run, so you can compare against past runs.

5) If it is production, the run is refused outright

If the address or device the tests try to reach is production, /run-suite refuses to run. That is the safety boundary against accidentally running tests against a live environment. If this refusal shows up, head back to Choose your target and switch the environment to local or staging.
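The refusal can be pictured as a guard that runs before any test starts. The sketch below covers only the web-address case and assumes production is recognized by an exact hostname match against a known list; the `PRODUCTION_HOSTS` list, the example hostnames, and both function names are hypothetical, and the skill's real check may work differently.

```typescript
// Hypothetical list: how /run-suite actually recognizes production is not documented here.
const PRODUCTION_HOSTS: string[] = ["app.example.com"];

// True when the target URL's hostname exactly matches a known production host.
function isProductionTarget(target: string, productionHosts: string[]): boolean {
  const host = new URL(target).hostname.toLowerCase();
  return productionHosts.some((p) => p.toLowerCase() === host);
}

// Refuse outright instead of warning, so a live environment is never touched.
function assertSafeTarget(target: string): void {
  if (isProductionTarget(target, PRODUCTION_HOSTS)) {
    throw new Error(`Refusing to run tests against production: ${target}`);
  }
}

assertSafeTarget("http://localhost:3000"); // local target, no error
```

Throwing before the run starts, rather than logging a warning, matches the "refused outright" behavior: no test step executes against the live environment.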

Once you have watched the first flow pass, it pays to learn what safety net the base runs behind the scenes. Head to The automated safety net.