Analyze and Design

Before writing tests, two questions need answers: what will you verify, and how will you verify it? The first goes to /3-analyze-target, the second to /4-design-suite. The output of both phases lands in docs/design/e2e-spec.md and .claude/state/e2e-suite-config.json, and /5-implement-suite builds on that.

/3-analyze-target — what to verify

It identifies the system under test (SUT) and draws out the critical flows, pinning them into §1 and §2 of e2e-spec.md. It asks four things.

Name and a one-line description. What does the app do?
Target type. One or more of web, electron, mobile.
Approach. A deployed URL, a local dev server command (npm run dev → :5173), a built binary path, or an Expo/RN/native build.
Test environment. One of local, staging, or dedicated test. Production is refused and steered to staging (S6).

If the pace (user-pace.json) is novice, it asks one at a time; if experienced, it groups them.
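
As a rough illustration of the four answers and the production refusal, here is a minimal sketch. The type names, field names, the "dedicated-test" spelling, and the resolveEnvironment helper are all hypothetical; the actual skill may store and validate these answers differently.

```typescript
// Hypothetical types for the four answers; illustrative only, not the skill's schema.
type TargetType = "web" | "electron" | "mobile";
type TestEnvironment = "local" | "staging" | "dedicated-test";

interface AnalyzeAnswers {
  name: string;
  description: string;
  targets: TargetType[];
  approach: string; // e.g. a deployed URL or a dev server command
  environment: TestEnvironment;
}

const ALLOWED_ENVIRONMENTS: TestEnvironment[] = ["local", "staging", "dedicated-test"];

// Accepts one of the allowed environments; production is refused and steered to staging.
function resolveEnvironment(input: string): { environment: TestEnvironment; steered: boolean } {
  const normalized = input.trim().toLowerCase();
  if (normalized === "production" || normalized === "prod") {
    return { environment: "staging", steered: true };
  }
  for (const env of ALLOWED_ENVIRONMENTS) {
    if (env === normalized) {
      return { environment: env, steered: false };
    }
  }
  throw new Error(`Unknown test environment: ${input}`);
}

// Example values are made up for illustration.
const answers: AnalyzeAnswers = {
  name: "Example Shop",
  description: "Sells things online",
  targets: ["web"],
  approach: "npm run dev",
  environment: resolveEnvironment("production").environment, // steered to "staging"
};
```

Keeping the refusal inside one function means every later phase only ever sees an environment that has already passed the S6 check.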

A critical flow is “a user flow that, if it breaks, the business stops.” Draw out three to five, rank them, and mark the one or two to automate first as smoke candidates. Each flow is written as one line: “who, doing what, gets what result.” That one line later becomes one test. Keeping the list short matters: catch the fatal flows first and leave the rest in spec §9 as undecided.
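
The ranking and smoke marking can be pictured with a small sketch. The CriticalFlow shape, the helper names, and the example flows are hypothetical; the spec itself only requires the one-line form.

```typescript
// Hypothetical record for one critical flow; rank 1 is the most critical.
interface CriticalFlow {
  rank: number;
  who: string;
  action: string;
  result: string;
}

// Renders the "who, doing what, gets what result" line.
function flowLine(flow: CriticalFlow): string {
  return `${flow.who}, ${flow.action}, ${flow.result}`;
}

// Picks the top-ranked flows as smoke candidates (the text suggests one or two).
function smokeCandidates(flows: CriticalFlow[], count: number = 2): CriticalFlow[] {
  return flows
    .slice() // copy so the caller's array keeps its order
    .sort((a, b) => a.rank - b.rank)
    .slice(0, count);
}

// Made-up flows for illustration.
const flows: CriticalFlow[] = [
  { rank: 2, who: "A signed-in customer", action: "paying for a cart", result: "sees an order confirmation" },
  { rank: 1, who: "A visitor", action: "signing up with email", result: "lands on the dashboard" },
  { rank: 3, who: "An admin", action: "refunding an order", result: "sees the refund recorded" },
];

const smoke = smokeCandidates(flows).map(flowLine);
```

Here smoke holds the lines for the rank 1 and rank 2 flows; the rank 3 flow stays in the spec without being automated yet.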

Credential values are not collected here (S2). The user puts real keys into .env.test.local directly.

/4-design-suite — how to verify

It settles three things: per-target tools, the auth and environment strategy, and the CI strategy.

Per-target tools

It picks a tool for each target type from §1. The recommended defaults are web=Playwright, electron=Playwright _electron, mobile=Maestro. Mobile splits further by app type: Expo uses Maestro, bare RN uses Detox or Maestro, native uses XCUITest or Espresso, and Capacitor uses Ionic E2E. The per-tool detail lives in tracks.
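
The defaults above amount to a lookup by target and, for mobile, by app type. A minimal sketch follows; the function name and the identifiers "bare-rn", "detox", "xcuitest", "espresso", and "ionic-e2e" are assumptions, while "playwright", "playwright-electron", and "maestro" follow the values used in e2e-suite-config.json.

```typescript
type TargetType = "web" | "electron" | "mobile";
type MobileAppType = "expo" | "bare-rn" | "native" | "capacitor"; // hypothetical labels

// Returns the recommended tool options for a target, in the order the text lists them.
function recommendedTools(target: TargetType, mobileAppType?: MobileAppType): string[] {
  if (target === "web") return ["playwright"];
  if (target === "electron") return ["playwright-electron"];
  switch (mobileAppType) {
    case "bare-rn":
      return ["detox", "maestro"];
    case "native":
      return ["xcuitest", "espresso"];
    case "capacitor":
      return ["ionic-e2e"];
    default:
      // Expo, or no app type given: fall back to the mobile default.
      return ["maestro"];
  }
}

const webTools = recommendedTools("web");
const bareRnTools = recommendedTools("mobile", "bare-rn");
```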

Auth and environment strategy

It settles the auth-reuse mechanism: storageState for web, a pre-login subflow for mobile, and session injection for electron. Logging in through the UI on every test is slow and flaky, so log-in-once-then-reuse is the default.

The test-data strategy is one of local-reset, test-db (dedicated staging, unique IDs), or none. Either way, the production DB is banned (S6). For mobile, the device target is one of ios-sim, android-emu, or cloud-farm.

The stack-specific traps get flagged here too. For Supabase, the session lives in localStorage, so the recommendation is the REST-login-then-globalSetup-injection pattern; for Stripe payments, Hosted Checkout is cross-origin, so the recommendation is verifying through the webhook and Test Clocks.
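
For the Supabase case, the injection step can be sketched as building a Playwright storageState object that carries the session in localStorage. This is a sketch under assumptions: the session is taken as already obtained through a REST login, the storage key is passed in by the caller (supabase-js v2 appears to default to the form sb-<project-ref>-auth-token, but check this against your client version), and buildStorageState is a hypothetical helper. Writing the result to a file in globalSetup and pointing the storageState option at it are left out.

```typescript
// Subset of the Playwright storageState file shape used here.
interface StorageState {
  cookies: object[];
  origins: { origin: string; localStorage: { name: string; value: string }[] }[];
}

// Hypothetical helper: wraps an already-obtained session into a storageState object.
function buildStorageState(appOrigin: string, storageKey: string, session: object): StorageState {
  return {
    cookies: [], // the session lives in localStorage, so no cookies are needed
    origins: [
      {
        origin: appOrigin,
        localStorage: [{ name: storageKey, value: JSON.stringify(session) }],
      },
    ],
  };
}

// Placeholder values; a real session would come from the REST login response.
const state = buildStorageState(
  "http://localhost:5173",
  "sb-exampleref-auth-token", // assumed key format, verify for your setup
  { access_token: "test-access-token", refresh_token: "test-refresh-token" }
);
```

Because the login happens once over REST, no test has to drive the login UI, which is the point of the log-in-once-then-reuse default.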

CI strategy

It settles whether to generate a GitHub Actions workflow. Web installs browsers on every run and uses sharding with a blob-report merge. Electron uses an OS matrix, with xvfb on the Linux job only. Mobile uses android-emulator-runner, simctl, or Maestro Cloud. If you only run locally, nothing gets generated. The implementation detail lives in ci.
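
To make the web sharding and the Linux-only xvfb rule concrete, here is a sketch of the commands each CI job might run. The helper names and the shard count are assumptions; the generated workflow may lay the jobs out differently.

```typescript
// One command per shard; each shard writes a blob report to be merged afterwards.
function webShardCommands(totalShards: number): string[] {
  const commands: string[] = [];
  for (let index = 1; index <= totalShards; index++) {
    commands.push(`npx playwright test --shard=${index}/${totalShards} --reporter=blob`);
  }
  return commands;
}

// Electron needs a virtual display on Linux only; macOS and Windows run the tests directly.
function electronTestCommand(os: "linux" | "macos" | "windows"): string {
  const base = "npx playwright test";
  return os === "linux" ? `xvfb-run --auto-servernum ${base}` : base;
}

const shards = webShardCommands(3); // 3 is an arbitrary example count
const linuxCommand = electronTestCommand("linux");
const macCommand = electronTestCommand("macos");
```

After all shards finish, a final job would combine the blob reports, for example with npx playwright merge-reports.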

e2e-suite-config.json

The decisions persist immediately. Each answer is recorded straight into e2e-suite-config.json, and the implement-* skills and /2-setup-base read these field names verbatim.

{
  "targets": ["web"],
  "tools": {
    "web": "playwright",
    "electron": "playwright-electron",
    "mobile": "maestro"
  },
  "auth_strategy": "storageState",
  "database_strategy": "local-reset",
  "devices": { "mobile": ["android-emu"] },
  "ci": {
    "provider": "github-actions",
    "web_sharding": true,
    "electron_xvfb": true,
    "mobile_runner": "emulator"
  },
  "decided_at": "<ISO>"
}

The tools keys for tracks you did not pick are omitted or ignored. If you call the same phase again and the decision file already exists, it first asks you to confirm “keep / change,” so rerunning is safe.
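
A consumer of this file might type it and skip the tools entries for unpicked tracks roughly as follows. The interface mirrors the field names shown in the JSON; the activeTools helper and the decided_at value are hypothetical, and the real skills may read the file differently.

```typescript
type TargetType = "web" | "electron" | "mobile";

// Mirrors the field names of e2e-suite-config.json as shown in this section.
interface SuiteConfig {
  targets: TargetType[];
  tools: { web?: string; electron?: string; mobile?: string };
  auth_strategy: string;
  database_strategy: "local-reset" | "test-db" | "none";
  devices: { mobile?: string[] };
  ci: { provider: string; web_sharding: boolean; electron_xvfb: boolean; mobile_runner: string };
  decided_at: string;
}

// Returns tools only for the targets that were actually picked.
function activeTools(config: SuiteConfig): { target: TargetType; tool: string }[] {
  const result: { target: TargetType; tool: string }[] = [];
  for (const target of config.targets) {
    const tool = config.tools[target];
    if (tool !== undefined) {
      result.push({ target, tool });
    }
  }
  return result;
}

const config: SuiteConfig = {
  targets: ["web"],
  tools: { web: "playwright", electron: "playwright-electron", mobile: "maestro" },
  auth_strategy: "storageState",
  database_strategy: "local-reset",
  devices: { mobile: ["android-emu"] },
  ci: { provider: "github-actions", web_sharding: true, electron_xvfb: true, mobile_runner: "emulator" },
  decided_at: "2025-01-01T00:00:00Z", // placeholder timestamp
};

const picked = activeTools(config);
```

With targets set to web only, picked contains just the Playwright entry, and the electron and mobile keys are ignored.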

ADRs only for decisions that are expensive to undo

If you accept the recommended default as is, only the spec gets updated. An ADR attaches to an off-recommendation choice, such as picking Cypress for web, or Detox or Appium instead of Maestro for mobile. It leaves one page at docs/adr/<NNNN>-tool-<track>-<choice>.md explaining why that tool was chosen. If you also write ADRs for trivial decisions, the genuinely expensive ones get buried.
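
The rule can be sketched as a check against the recommended defaults plus a path builder. Treating <NNNN> as a zero-padded four-digit sequence number is an assumption, as are the helper names and the lowercase choice identifiers.

```typescript
type Track = "web" | "electron" | "mobile";

// Recommended defaults, using the identifiers from e2e-suite-config.json.
const RECOMMENDED: { [track in Track]: string } = {
  web: "playwright",
  electron: "playwright-electron",
  mobile: "maestro",
};

// An ADR is only needed when the choice departs from the recommended default.
function needsAdr(track: Track, choice: string): boolean {
  return choice !== RECOMMENDED[track];
}

// Builds docs/adr/<NNNN>-tool-<track>-<choice>.md, assuming NNNN is zero-padded to four digits.
function adrPath(sequence: number, track: Track, choice: string): string {
  const padded = ("0000" + sequence).slice(-4);
  return `docs/adr/${padded}-tool-${track}-${choice}.md`;
}

const cypressNeedsAdr = needsAdr("web", "cypress"); // true
const cypressAdr = adrPath(7, "web", "cypress"); // "docs/adr/0007-tool-web-cypress.md"
```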

Next comes verification: the rubric that reads an implemented suite across six dimensions, the forbidden patterns, the S1 through S8 security baseline, and where to go after getting blocked twice.