ci: Stand up multi-package mutation health — nightly passes + mutant-* skills (no-changelog) (#31356)

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 09:47:00 +02:00 · 2026-06-01 09:57:25 +01:00 · 2026-06-01 09:57:25 +01:00 · ecfc39b3e5
commit ecfc39b3e5
parent 1b8235ef76
14 changed files with 413 additions and 239 deletions
--- a/.claude/plugins/n8n/skills/mutant-diff/SKILL.md
+++ b/.claude/plugins/n8n/skills/mutant-diff/SKILL.md
@@ -0,0 +1,126 @@
+---
+description: Run Stryker mutation testing on the source files changed in the current branch (vs origin/master) across vitest packages. One command for "did my work hold up under mutation?" before pushing. Triages which files dropped below threshold and offers to invoke n8n:mutant-fix on them. Use when the user says /mutant-diff, "mutate what I changed", "check my changes", or has just finished writing a feature and wants pre-merge feedback.
+---
+
+# Mutate what I changed
+
+Closes the local dev loop. Single command to run Stryker against every changed source file the current branch touched (vs `origin/master`) in a mutation-eligible package, then point at any reds that need strengthening.
+
+## When to use
+
+- User says `/mutant-diff`, "mutate the files I changed", "check my changes", "did my tests stick"
+- Mid-feature: dev wants pre-merge feedback before pushing
+- Pre-PR: cheaper than waiting for the nightly cron
+
+**Don't** use:
+- For a single specific file (`/n8n:mutant-score <path>` is faster)
+- After the user already ran `/n8n:mutant-fix` (which calls mutant-score internally for verification — running both again is wasted compute)
+
+Changed files in **jest** packages (`nodes-base`, `cli`, `db`, …) and in `@n8n/expression-runtime` are skipped automatically — see step 1.
+
+## Inputs
+
+- **Default base**: `origin/master`. Override with `--base <ref>` if comparing against another branch (e.g. `--base HEAD~5`).
+- **Default scope**: `packages/**/src/**/*.ts`, narrowed to mutation-eligible (vitest) packages in step 1.
+
+## Steps
+
+### 1. Identify changed, eligible source files
+
+```bash
+git diff --name-only origin/master...HEAD -- 'packages/**/src/**/*.ts'
+```
+
+(`...` is correct — three-dot means "since the branch diverged from base," which is what we want.)
+
+If `git fetch` hasn't been run recently, suggest the user run `git fetch origin master` first; otherwise the base ref is stale.
+
+Filter out:
+- `**/*.d.ts` (declarations, no behaviour)
+- `**/*.stories.ts` (Storybook scaffolding)
+- `index.ts` files (barrels)
+- `interfaces.ts`, `types.ts`, `constants.ts` (low-value, same filter the picker uses)
+- **files in non-eligible packages**: for each changed file, find its package (nearest ancestor dir with a `package.json`). Keep the file only if that package has a `vitest.config.*` **and** is not `@n8n/expression-runtime` (it's the isolated-vm engine — blocked on DEVP-257). Drop jest packages with a one-line note: `skipped (jest): packages/cli/src/...`.
+
+### 2. Surface the plan to the user
+
+Print the filtered list before running anything. Each Stryker run is 1–5 minutes; the user should confirm if there are many.
+
+```
+Found N changed source files to mutate (J skipped: jest / expression-runtime):
+  - packages/workflow/src/foo.ts
+  - packages/@n8n/crdt/src/bar.ts
+  ...
+
+Estimated runtime: ~M-K minutes (M minutes minimum if every Stryker run is fast).
+Proceed? (skill default: yes if N ≤ 3, ask if N > 3)
+```
+
+If the filtered list is empty: report "No mutation-eligible source files changed vs $base — nothing to mutate." and stop. Exit cleanly.
+
+If N > 8: refuse and ask the user to narrow scope (a different base ref, or invoke per-file). Running more than 8 mutations sequentially is a 30+ minute session that should be a deliberate choice.
+
+### 3. Run mutation testing per file
+
+For each file in the plan, invoke `pnpm mutate <repo-relative-path>` (the package is inferred from the path). Capture the score per file from the `✓ / ✗` summary line mutate.mjs prints to stderr — don't re-read `summary.json` (it's overwritten per run, and lives in that file's own package).
+
+After each run completes, print one line:
+
+```
+✓ packages/workflow/src/foo.ts        95.12% (39/41 killed)   GREEN
+✗ packages/@n8n/crdt/src/bar.ts       54.83% (17/31 killed)   RED  — 13 survivors, top: ConditionalExpression, EqualityOperator
+```
+
+If a Stryker run hard-fails (exit 3, no `summary.json`), print `! <file>  Stryker failed — see stderr` and continue to the next file. Don't abort the whole batch.
+
+### 4. Summary table
+
+After all files have been mutated, print one compact table:
+
+```
+=== Mutation results: N files, M green, K red, J failed ===
+| File                            | Score   | Verdict | Survivors |
+|---------------------------------|---------|---------|-----------|
+| packages/workflow/src/foo.ts    |  95.12% | GREEN   |         2 |
+| packages/@n8n/crdt/src/bar.ts   |  54.83% | RED     |        13 |
+| packages/workflow/src/baz.ts    |    n/a  | FAILED  |         - |
+```
+
+### 5. Offer the strengthen step on the worst red file
+
+If any file came back red:
+
+> The lowest-score red file is `packages/@n8n/crdt/src/bar.ts` (54.83%, 13 survivors). Run `/n8n:mutant-fix <file>` to triage them and write assertion changes? (suggesting; don't auto-invoke)
+
+Only suggest one file at a time — `n8n:mutant-fix` caps at 5 survivors per invocation, and re-running this skill after edits is cheap.
+
+If everything is green: report it and stop. No follow-up needed.
+
+## Output shape
+
+Three deliverable sections per invocation:
+
+1. **Plan** (before running) — list of eligible files (+ skipped count), estimated runtime
+2. **Per-file progress** (during) — one line per file as it completes
+3. **Summary table + recommendation** (after) — compact view
+
+Don't dump full `summary.json` payloads — each mutate run writes one to `<package>/reports/mutation/` (overwritten per run). The user can read the latest one if they want detail.
+
+## Constraints
+
+- **Vitest packages only.** Jest packages and `@n8n/expression-runtime` are filtered out in step 1, not mutated.
+- **Max 8 files per invocation.** Above that, ask user to narrow.
+- **Don't auto-invoke `/n8n:mutant-fix`.** Suggest, don't act. Same reasoning as the other skills: each pass should be a deliberate human-approved step.
+- **No commits.** Edits land in working tree; user reviews.
+- **No fabricated scores.** If a Stryker run fails, mark FAILED in the table — never guess a value.
+
+## Common follow-ups
+
+- "strengthen them all" → loop the user through `/n8n:mutant-fix`, one file at a time
+- "what changed?" → `git diff origin/master...HEAD -- <file>` for the file in question
+- "ignore <pattern>" → re-run with the user's exclude added to the filter
+
+## Related
+
+- `n8n:mutant-score` — single-file version of this skill
+- `n8n:mutant-fix` — the natural next step when reds show up
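
The filter in step 1 and the confirmation thresholds in step 2 of the mutant-diff skill can be sketched as two small pure functions. This is an illustration only: `filterChangedFiles`, `planDecision`, and the `eligiblePackageDirs` argument are hypothetical names, and the skill itself decides eligibility by finding the nearest `package.json` and checking for a `vitest.config.*` on disk, which this sketch replaces with a precomputed list of directories.

```javascript
// Hypothetical helpers; not part of mutate.mjs or the skill.
const LOW_VALUE_BASENAMES = new Set(['index.ts', 'interfaces.ts', 'types.ts', 'constants.ts']);

// changedPaths: repo-relative paths from `git diff --name-only`.
// eligiblePackageDirs: assumed to be the vitest package dirs, minus expression-runtime.
function filterChangedFiles(changedPaths, eligiblePackageDirs) {
  const keep = [];
  const skipped = [];
  for (const path of changedPaths) {
    const basename = path.split('/').pop();
    // Declarations, Storybook files and barrels/types carry no behaviour worth mutating.
    if (path.endsWith('.d.ts') || path.endsWith('.stories.ts')) continue;
    if (LOW_VALUE_BASENAMES.has(basename)) continue;
    const inEligiblePackage = eligiblePackageDirs.some((dir) => path.startsWith(dir + '/'));
    if (inEligiblePackage) keep.push(path);
    else skipped.push(path); // reported with a one-line "skipped" note
  }
  return { keep, skipped };
}

// Thresholds from step 2: run if N <= 3, ask if N > 3, refuse if N > 8.
function planDecision(fileCount) {
  if (fileCount === 0) return 'nothing-to-mutate';
  if (fileCount > 8) return 'refuse';
  if (fileCount > 3) return 'ask';
  return 'proceed';
}
```

Keeping the filter free of filesystem access makes it easy to check against a hand-written list of paths before wiring it to `git diff`.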
--- a/.claude/plugins/n8n/skills/strengthen-tests/SKILL.md
+++ b/.claude/plugins/n8n/skills/strengthen-tests/SKILL.md
@@ -1,41 +1,45 @@
 ---
-description: Take a Stryker summary.json (from n8n:mutation-test), triage the surviving mutants by user-reachable-behaviour risk, write minimal assertion changes to kill the top 3-5 highest-leverage survivors, then verify by re-running n8n:mutation-test. Use when the user has just run mutation testing and wants to strengthen the test suite, or says "kill the survivors / strengthen tests / fix the red." Pairs with n8n:mutation-test as the inner write side of a single iteration.
+description: Take a Stryker summary.json (from n8n:mutant-score), triage the surviving mutants by user-reachable-behaviour risk, write minimal assertion changes to kill the top 3-5 highest-leverage survivors, then verify by re-running n8n:mutant-score. Use when the user has just run mutation testing and wants to strengthen the test suite, or says "kill the survivors / strengthen tests / fix the red." Pairs with n8n:mutant-score as the inner write side of a single iteration.
 ---

 # Strengthen tests — kill the highest-leverage survivors

-The other half of the local mutation-testing loop. `n8n:mutation-test` reports which mutations escaped the tests; this skill picks the ones that matter and writes minimal assertion changes to kill them.
+The other half of the local mutation-testing loop. `n8n:mutant-score` reports which mutations escaped the tests; this skill picks the ones that matter and writes minimal assertion changes to kill them.

 ## When to use

-- User has just run `/n8n:mutation-test <file>` and the verdict was `red`
+- User has just run `/n8n:mutant-score <file>` and the verdict was `red`
 - User says: "strengthen tests", "kill the survivors", "fix the red", "iterate on the tests for X"
-- Mid-loop: this skill's verify step calls `n8n:mutation-test` again, so the loop closes here
+- Mid-loop: this skill's verify step calls `n8n:mutant-score` again, so the loop closes here

 **Don't** use this skill:
-- Before any mutation testing has been run for the target file (no `summary.json` to triage)
 - For a `green` verdict — there's nothing to strengthen; if user insists, push back and ask which file actually needs work
 - To bulk-kill every survivor — explicitly capped at 5 per invocation. Re-invoke for more.

 ## Inputs

-- **Default**: read `packages/workflow/reports/mutation/summary.json` (the last `n8n:mutation-test` run's output).
-- **Override**: `--summary <path>` to point at a different summary file.
+Accepts **either** a source file or an existing summary — whichever you have:
+
+- **A source file** (a repo-relative path, e.g. `packages/workflow/src/workflow-checksum.ts` — package inferred): self-bootstrapping — there's no summary yet, so step 1 runs `n8n:mutant-score` to produce one, then proceeds. This is the entry point for unattended callers (e.g. cat-bot acting on a ledger gap).
+- **An existing summary**: `--summary <path>`. A prior `n8n:mutant-score` run wrote it to that package's `reports/mutation/summary.json` (e.g. `packages/workflow/reports/mutation/summary.json`). Skips the bootstrap.

 ## Steps

-### 1. Read the summary
+### 1. Get a summary (bootstrap if needed)

-`packages/workflow/reports/mutation/summary.json`. Already compact (~50 KB). Pull:
+- **Given a source file** (or no usable summary on disk): run `n8n:mutant-score` on the file first to generate `<package-dir>/reports/mutation/summary.json`. Then read it.
+- **Given/defaulting to a summary path**: read it directly.
+
+Read the summary (already compact, ~50 KB) and pull:
 - `files[0].file` — the source file under test
 - `files[0].score` — current mutation score
 - `files[0].survivors[]` — every surviving (and no-coverage) mutant with location, replacement, covering test names

-If `summary.json` is missing, stop. Tell the user to run `n8n:mutation-test` first.
+If the verdict is already `green`, stop — nothing to strengthen.

 ### 2. Read the source under test, sparingly

-Read the source file referenced in `summary.json`. Read **once**, the whole file (typical n8n-workflow source files are 50-500 lines; the cost is bounded). This is the only file read; don't load test files yet.
+Read the source file referenced in `summary.json`. Read **once**, the whole file (typical source files are 50-500 lines; the cost is bounded). This is the only file read; don't load test files yet.

 ### 3. Triage the survivors

@@ -102,13 +106,13 @@ Use `Edit` with exact-string matches. Never rewrite entire test files.

 ### 7. Verify

-Re-invoke `n8n:mutation-test` on the same source file. Report:
+Re-invoke `n8n:mutant-score` on the same source file. Report:

 ```
 Before:  red 76.74% (28 survivors)
 After:   green 82.34% (22 survivors)
 Killed:  6 of 5 targeted (1 bonus — fix for #77 also killed #78)
-Still surviving: 22 — re-invoke /n8n:strengthen-tests for another batch.
+Still surviving: 22 — re-invoke /n8n:mutant-fix for another batch.
 ```

 If the score went UP but threshold still not met: the iteration is working, recommend another pass.
@@ -131,7 +135,7 @@ Keep prose minimal between sections. The plan and verify steps are the structure
 - **Never fabricate assertions.** If the source doesn't clearly do X, don't claim it does.
 - **No new test files unless absolutely necessary.** Extend the existing covering test file.
 - **No reverting other people's tests.** Only edit tests in the package being mutated.
-- **No re-running mutation-test more than once per invocation.** That's the verify step. Don't loop within a single invocation; let the user re-invoke.
+- **No re-running mutant-score more than once per invocation.** That's the verify step. Don't loop within a single invocation; let the user re-invoke.
 - **No commits.** Edits land in the working tree; user reviews and commits.

 ## Common follow-ups
@@ -143,5 +147,5 @@ Keep prose minimal between sections. The plan and verify steps are the structure

 ## Related

-- `n8n:mutation-test` — the read side of this loop
+- `n8n:mutant-score` — the read side of this loop
 - `scripts/mutation-health/README.md` — the BQ-backed observability story this slots into
--- a/.claude/plugins/n8n/skills/mutation-test/SKILL.md
+++ b/.claude/plugins/n8n/skills/mutation-test/SKILL.md
@@ -1,45 +1,45 @@
 ---
-description: Run Stryker mutation testing on a single source file and return a structured, token-frugal report that's pipeable to a follow-up "strengthen tests" loop. Use when the user says /mutation-test, "mutation test this file", or has just edited tests and wants to verify they actually assert behaviour. Per-file only — full-package mutation runs are out of scope.
+description: Run Stryker mutation testing on a single source file and return a structured, token-frugal report that's pipeable to a follow-up "strengthen tests" loop. Use when the user says /mutant-score, "mutation test this file", or has just edited tests and wants to verify they actually assert behaviour. Per-file only — full-package mutation runs are out of scope.
 ---

 # Mutation testing — single file

-Wraps `pnpm --filter=<pkg> mutate <file>` and parses `summary.json` into a compact, structured shape suitable for downstream "strengthen the surviving mutants" iteration.
+Wraps `pnpm mutate <file>` and parses `summary.json` into a compact, structured shape suitable for downstream "strengthen the surviving mutants" iteration. Works for any vitest package — `pnpm mutate` infers the package from the path.

 ## When to use

-- User explicitly invokes: `/mutation-test <path>`, "mutation test this file", "check my test effectiveness on X"
+- User explicitly invokes: `/mutant-score <path>`, "mutation test this file", "check my test effectiveness on X"
 - User has just edited a test file and wants to know if their assertions are load-bearing
 - Follow-up loop after a `red` verdict — feed the structured output back to a "fix" iteration

 **Don't** use this skill for:
 - Whole-package or whole-repo mutation runs — single file only
 - Coverage % questions (use the existing coverage workflow)
-- Files outside `packages/workflow/` — Stryker is only wired up there today
+- **jest** packages (`nodes-base`, `cli`, `db`) — Stryker's vitest-runner only covers vitest packages
+- `@n8n/expression-runtime` — it's the isolated-vm engine (blocked on DEVP-257)

 ## Inputs

-One required argument: the source file to mutate, as either a repo-relative path or a package-relative path. Examples that all mean the same thing:
+One required argument: the source file to mutate. Prefer a **repo-relative path** — the package is inferred from it:

-- `packages/workflow/src/cron.ts`
-- `src/cron.ts` (assumes packages/workflow)
-- `workflow/src/cron.ts` (assumes packages/)
+- `packages/workflow/src/cron.ts` (package inferred)
+- `packages/@n8n/crdt/src/utils.ts` (package inferred)

-If ambiguous, ask the user once which package; do not guess.
+A bare package-relative path (`src/cron.ts`) is ambiguous — pass the repo-relative path, or add `--package-dir <pkg>`. Don't guess the package.

 ## Steps

-1. **Resolve package + relative source path.** Today only `n8n-workflow` (`packages/workflow`) has Stryker wired. If the user passes a file outside that, say so and stop — don't fabricate output.
+1. **Resolve the target.** Any vitest package works; `pnpm mutate` infers the package from a repo-relative path. If the file is in a jest package or `@n8n/expression-runtime`, say so and stop — don't fabricate output.

 2. **Run Stryker with trimmed output:**
   ```bash
-   pnpm --filter=n8n-workflow mutate <package-relative-src> 2>&1 | tail -40
+   pnpm mutate <repo-relative-file> 2>&1 | tail -40
   ```
   `tail -40` discards the Stryker progress bar spam; the relevant numbers + survivor list always land in the last ~30 lines. Exit codes: `0` = pass, `1` = below threshold (still valid, summary.json exists), `2` = usage error, `3` = Stryker failure (no summary.json).

 3. **If exit code 3**, surface the trimmed tail to the user, suggest checking that workspace deps are built (`pnpm build`), and stop. Don't fabricate a report.

-4. **Read `packages/workflow/reports/mutation/summary.json`** — never `raw.json`. raw.json is 600KB+ and not needed for the strengthen loop. summary.json already contains every surviving mutant with its location, replacement, mutator name, and the names of tests that covered the line.
+4. **Read the package's `reports/mutation/summary.json`** (e.g. `packages/workflow/reports/mutation/summary.json`) — never `raw.json`. raw.json is 600KB+ and not needed for the strengthen loop. summary.json already contains every surviving mutant with its location, replacement, mutator name, and the names of tests that covered the line.

 5. **Cap covering_tests at 3 per survivor.** If a mutant was covered by more than 3 tests, keep the first 3 and append `+N more` as a count. Names beyond 3 add tokens without adding actionable signal — the strengthen loop only needs to know *which test* to extend, not all of them.

@@ -100,16 +100,17 @@ Order the survivors array by `location` (ascending line number, then column) so
 - **No raw.json** — never read or surface it. summary.json is the only input.
 - **No HTML report** — don't `open` raw.html or paste links to it. If the user wants visual exploration they'll ask.
 - **No automatic triage** — don't categorise survivors by "real bug" vs "refactor insurance." That's a separate analysis step that should happen on demand, not by default. Keeps token cost predictable.
-- **No "I'll regenerate tests for you now"** — this skill reports the gap. Use `n8n:strengthen-tests` if you want assertion edits.
+- **No "I'll regenerate tests for you now"** — this skill reports the gap. Use `n8n:mutant-fix` if you want assertion edits.

 ## Common follow-ups (don't do unless asked)

 - User says "fix these" → start a strengthen loop using the JSON output as input. Read covering_tests source, propose changes per mutant, run the skill again to verify.
 - User says "explain survivor #N" → fetch that mutant from summary.json, show its surrounding ~5 lines from the source file, no analysis beyond what summary.json contains.
 - User says "what's the threshold?" → 80% provisional; see `scripts/mutation-health/README.md` for the rationale.
-- User says "run it on the changed files" → not wired yet. Suggest `git diff` to find candidates, then invoke this skill per file.
+- User says "run it on the changed files" → use `n8n:mutant-diff` (mutates the diff vs origin/master).

 ## Related

 - `scripts/mutation-health/README.md` — the broader BQ-backed observability story
-- `packages/workflow/stryker.config.mjs` — the Stryker config this skill drives
+- `scripts/mutation-health/stryker.default.mjs` — the default Stryker config; a package may override with its own `stryker.config.mjs` (e.g. `packages/workflow` carves out the isolated-vm engine)
+- `n8n:mutant-fix` — the strengthen-the-survivors counterpart
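
The exit-code contract in step 2 and the covering-test cap in step 5 of the mutant-score skill could look roughly like the sketch below. The function names are hypothetical and the handling of exit code `2` (no summary to read) is an assumption, since the skill text only states that explicitly for codes `1` and `3`. One caveat that holds for bash in general: because step 2 pipes the command through `tail`, a plain `$?` reports the status of `tail`, so a caller that needs these codes would have to read `PIPESTATUS[0]` or enable `set -o pipefail`.

```javascript
// Illustrative only: these helpers are not part of mutate.mjs.

// Map the documented exit codes (0 pass, 1 below threshold, 2 usage, 3 Stryker failure).
function interpretExitCode(code) {
  switch (code) {
    case 0:
      return { verdict: 'pass', readSummary: true };
    case 1:
      return { verdict: 'below-threshold', readSummary: true }; // summary.json still exists
    case 2:
      return { verdict: 'usage-error', readSummary: false }; // assumption: nothing was run
    case 3:
      return { verdict: 'stryker-failure', readSummary: false }; // no summary.json
    default:
      return { verdict: 'unknown', readSummary: false };
  }
}

// Keep the first `limit` covering-test names and summarise the rest as "+N more".
function capCoveringTests(names, limit = 3) {
  if (names.length <= limit) return [...names];
  return [...names.slice(0, limit), `+${names.length - limit} more`];
}
```

Treating code `1` as a valid result rather than an error is the important part: a red score still produces a summary worth reading.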
--- a/.claude/plugins/n8n/skills/mutate-changed/SKILL.md
+++ b/.claude/plugins/n8n/skills/mutate-changed/SKILL.md
@@ -1,124 +0,0 @@
----
-description: Run Stryker mutation testing on the source files changed in the current branch (vs origin/master). One command for "did my work hold up under mutation?" before pushing. Triages on the side which files dropped below threshold and offers to invoke n8n:strengthen-tests on them. Use when the user says /mutate-changed, "mutate what I changed", "check my changes", or has just finished writing a feature and wants pre-merge feedback. Scope: only packages/workflow/src/** changes are mutated today.
----
-
-# Mutate what I changed
-
-Closes the local dev loop. Single command to run Stryker against every source file the current branch touched (vs `origin/master`), then point at any reds that need strengthening.
-
-## When to use
-
-- User says `/mutate-changed`, "mutate the files I changed", "check my changes", "did my tests stick"
-- Mid-feature: dev wants pre-merge feedback before pushing
-- Pre-PR: cheaper than waiting for the nightly cron
-
-**Don't** use:
-- For a single specific file (`/n8n:mutation-test <path>` is faster)
-- For non-`packages/workflow` changes — Stryker is only wired up there today
-- After the user already ran `/n8n:strengthen-tests` (which calls mutation-test internally for verification — running both again is wasted compute)
-
-## Inputs
-
-- **Default base**: `origin/master`. Override with `--base <ref>` if comparing against another branch (e.g. `--base HEAD~5`).
-- **Default scope**: `packages/workflow/src/**/*.ts`. The only package with Stryker wired up today.
-
-## Steps
-
-### 1. Identify changed source files
-
-```bash
-git diff --name-only origin/master...HEAD -- 'packages/workflow/src/**/*.ts'
-```
-
-(`...` is correct — three-dot means "since the branch diverged from base," which is what we want.)
-
-If `git fetch` hasn't been run recently, suggest the user `git fetch origin master` first; otherwise the base ref is stale.
-
-Filter out:
-- `**/*.d.ts` (declarations, no behaviour)
-- `**/*.stories.ts` (Storybook scaffolding, not present in workflow but defensive)
-- `index.ts` files (barrels)
-- `interfaces.ts`, `types.ts`, `constants.ts` (same low-value filter as `seed-ledger.mjs`)
-
-### 2. Surface the plan to the user
-
-Print the filtered list before running anything. Each Stryker run is 1–5 minutes; the user should confirm if there are many.
-
-```
-Found N changed source files to mutate:
-  - packages/workflow/src/foo.ts
-  - packages/workflow/src/bar.ts
-  ...
-
-Estimated runtime: ~M-K minutes (M minutes minimum if every Stryker run is fast).
-Proceed? (skill default: yes if N ≤ 3, ask if N > 3)
-```
-
-If the filtered list is empty: report "No source files under packages/workflow/src/** changed vs $base — nothing to mutate." and stop. Exit cleanly.
-
-If N > 8: refuse and ask the user to narrow scope (a different base ref, or invoke per-file). Running 8+ mutations sequentially is a 30+ minute session that should be a deliberate choice.
-
-### 3. Run mutation testing per file
-
-For each file in the plan, invoke `pnpm --filter=n8n-workflow mutate <package-relative-path>`. The `summary.json` and other artefacts get overwritten on each run, so capture the score per file as you go.
-
-After each run completes, print one line:
-
-```
-✓ src/foo.ts   95.12% (39/41 killed)   GREEN
-✗ src/bar.ts   54.83% (17/31 killed)   RED  — 13 survivors, top: ConditionalExpression, EqualityOperator
-```
-
-If a Stryker run hard-fails (exit 3, no `summary.json`), print `! src/foo.ts  Stryker failed — see stderr` and continue to the next file. Don't abort the whole batch.
-
-### 4. Summary table
-
-After all files have been mutated, print one compact table:
-
-```
-=== Mutation results: N files, M green, K red, J failed ===
-| File                        | Score   | Verdict | Survivors |
-|-----------------------------|---------|---------|-----------|
-| src/foo.ts                  |  95.12% | GREEN   |         2 |
-| src/bar.ts                  |  54.83% | RED     |        13 |
-| src/baz.ts                  |    n/a  | FAILED  |         - |
-```
-
-### 5. Offer the strengthen step on the worst red file
-
-If any file came back red:
-
-> The lowest-score red file is `src/bar.ts` (54.83%, 13 survivors). Run `/n8n:strengthen-tests` to triage them and write assertion changes? (suggesting; don't auto-invoke)
-
-Only suggest one file at a time — `n8n:strengthen-tests` caps at 5 survivors per invocation, and re-running this skill after edits is cheap.
-
-If everything is green: report it and stop. No follow-up needed.
-
-## Output shape
-
-Three deliverable sections per invocation:
-
-1. **Plan** (before running) — list of files, estimated runtime
-2. **Per-file progress** (during) — one line per file as it completes
-3. **Summary table + recommendation** (after) — compact view
-
-Don't dump full `summary.json` payloads — the per-file mutate runs already write them to disk under `packages/workflow/reports/mutation/` (overwriting each time, since the orchestrator uses fixed filenames). The user can read the latest one if they want detail.
-
-## Constraints
-
-- **Hardcoded to `packages/workflow`.** Generalise when Stryker is wired up to other packages.
-- **Max 8 files per invocation.** Above that, ask user to narrow.
-- **Don't auto-invoke `/n8n:strengthen-tests`.** Suggest, don't act. Same reasoning as the other skills: each pass should be a deliberate human-approved step.
-- **No commits.** Edits land in working tree; user reviews.
-- **No fabricated scores.** If a Stryker run fails, mark FAILED in the table — never guess a value.
-
-## Common follow-ups
-
-- "strengthen them all" → loop the user through `/n8n:strengthen-tests`, one file at a time
-- "what changed?" → `git diff origin/master...HEAD -- <file>` for the file in question
-- "ignore <pattern>" → re-run with the user's exclude added to the filter
-
-## Related
-
-- `n8n:mutation-test` — single-file version of this skill
-- `n8n:strengthen-tests` — the natural next step when reds show up
--- a/.github/workflows/mutation-health-nightly.yml
+++ b/.github/workflows/mutation-health-nightly.yml
@@ -2,38 +2,76 @@ name: 'Mutation Health (nightly)'

 on:
  schedule:
-    # 03:30 UTC daily — outside CI rush, before EU morning.
+    # 03:30 UTC daily — outside CI rush, before EU morning. Runs both passes.
    - cron: '30 3 * * *'
  workflow_dispatch:
    inputs:
-      package:
-        description: 'Workspace package to mutate'
+      mode:
+        description: 'Which pass to run'
        type: choice
        options:
-          - n8n-workflow
-        default: n8n-workflow
+          - both
+          - baseline # score files with no result yet (the `new` bucket)
+          - coverage # revisit the weakest scored files (`red`/`stale`, lowest first)
+        default: both

 permissions:
  contents: read

-# Prevent overlapping scheduled + manual runs from racing on the same
-# ledger row. The writer MERGE is idempotent, but two concurrent runs
-# would emit duplicate event rows for the same picked file.
-concurrency:
-  group: mutation-health-${{ github.event.inputs.package || 'n8n-workflow' }}
-  cancel-in-progress: false
-
 env:
-  PACKAGE_NAME: ${{ github.event.inputs.package || 'n8n-workflow' }}
-  PKG_DIR: packages/workflow # 1-to-1 mapping today; generalise when more packages are wired up
-  REPORTS_DIR: packages/workflow/reports/mutation
  READER_URL: https://internal.users.n8n.cloud/webhook/mutation-health-ledger

 jobs:
-  run:
-    name: Mutate one file, ledger writeback
+  # Build the (package × pass) matrix once. The package list lives here only —
+  # onboarding a vitest package is a one-line addition below. The requested
+  # `mode` filters which passes run (scheduled runs default to both).
+  #   baseline → score files with no result yet (picker `new` bucket)
+  #   coverage → revisit the weakest scored files (picker `red`/`stale`, lowest first)
+  setup:
+    runs-on: ubuntu-latest
+    outputs:
+      matrix: ${{ steps.build.outputs.matrix }}
+    steps:
+      - name: Build matrix
+        id: build
+        env:
+          REQUESTED_MODE: ${{ github.event.inputs.mode || 'both' }}
+        # vitest packages only. A package with its own stryker.config.mjs overrides
+        # the shared default (scripts/mutation-health/stryker.default.mjs) — n8n-workflow
+        # does this to run the legacy expression engine and dodge the isolated-vm
+        # dry-run crash (see its config + DEVP-257). Jest packages and
+        # @n8n/expression-runtime are intentionally absent.
+        run: |
+          node -e '
+            const packages = [
+              { name: "n8n-workflow", dir: "packages/workflow", slug: "workflow" },
+              { name: "@n8n/crdt", dir: "packages/@n8n/crdt", slug: "crdt" },
+              { name: "@n8n/decorators", dir: "packages/@n8n/decorators", slug: "decorators" },
+            ];
+            const req = process.env.REQUESTED_MODE;
+            const modes = req === "both" ? ["baseline", "coverage"] : [req];
+            const include = packages.flatMap((p) => modes.map((mode) => ({ ...p, mode })));
+            console.log("matrix=" + JSON.stringify({ include }));
+          ' >> "$GITHUB_OUTPUT"
+
+  mutate:
+    needs: setup
+    name: ${{ matrix.mode }} · ${{ matrix.name }}
    runs-on: blacksmith-4vcpu-ubuntu-2204
    timeout-minutes: 60
+    strategy:
+      fail-fast: false # one leg's failure must not skip the others
+      matrix: ${{ fromJSON(needs.setup.outputs.matrix) }}
+    # Per package + pass: a scheduled run and a manual run won't double-write the
+    # same ledger row, but different packages/passes still run in parallel.
+    concurrency:
+      group: mutation-health-${{ matrix.mode }}-${{ matrix.slug }}
+      cancel-in-progress: false
+    env:
+      PACKAGE_NAME: ${{ matrix.name }}
+      PKG_DIR: ${{ matrix.dir }}
+      MODE: ${{ matrix.mode }}
+      REPORTS_DIR: ${{ matrix.dir }}/reports/mutation
    steps:
      - name: Checkout
        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
@@ -46,18 +84,20 @@ jobs:
      - name: Fetch live ledger from BigQuery
        run: |
          mkdir -p "$REPORTS_DIR"
-          curl --fail -sS "$READER_URL?package=$PACKAGE_NAME" -o "$REPORTS_DIR/live-ledger.json"
+          curl --fail -sS --get --data-urlencode "package=$PACKAGE_NAME" \
+            "$READER_URL" -o "$REPORTS_DIR/live-ledger.json"

      - name: Pick next source file
        id: pick
        run: |
          picked_json=$(node scripts/mutation-health/pick-next.mjs \
            --package-dir "$PKG_DIR" \
-            --ledger-file "$REPORTS_DIR/live-ledger.json")
+            --ledger-file "$REPORTS_DIR/live-ledger.json" \
+            --mode "$MODE")
          echo "$picked_json"
          src_repo=$(echo "$picked_json" | jq -r '.picked.source_file_path // ""')
          if [ -z "$src_repo" ]; then
-            echo "::notice::Picker returned no work (all-green / empty ledger). Exiting cleanly."
+            echo "::notice::Picker returned no work for mode=$MODE (bucket empty). Exiting cleanly."
            echo "skip=true" >> "$GITHUB_OUTPUT"
            exit 0
          fi
@@ -76,7 +116,7 @@ jobs:
        # "Stryker crashed." continue-on-error: true would collapse them.
        run: |
          set +e
-          pnpm --filter "$PACKAGE_NAME" mutate "${{ steps.pick.outputs.source-rel }}"
+          node scripts/mutation-health/mutate.mjs "${{ steps.pick.outputs.source-rel }}" --package-dir "$PKG_DIR"
          rc=$?
          set -e
          if [ "$rc" -gt 1 ]; then
@@ -107,7 +147,7 @@ jobs:
        if: always()
        uses: actions/upload-artifact@bbbca2ddaa5d8feaa63e36b76fdaad77386f024f # v7.0.0
        with:
-          name: mutation-health-${{ env.PACKAGE_NAME }}-${{ github.run_id }}
+          name: mutation-health-${{ matrix.mode }}-${{ matrix.slug }}-${{ github.run_id }}
          path: |
            ${{ env.REPORTS_DIR }}/raw.json
            ${{ env.REPORTS_DIR }}/raw.html
--- a/package.json
+++ b/package.json
@ -40,6 +40,7 @@
    "optimize-svg": "find ./packages -name '*.svg' ! -name 'pipedrive.svg' -print0 | xargs -0 -P16 -L20 npx svgo",
    "setup-backend-module": "node scripts/ensure-zx.mjs && zx scripts/backend-module/setup.mjs",
    "start": "node scripts/os-normalize.mjs --dir packages/cli/bin n8n",
+    "mutate": "node scripts/mutation-health/mutate.mjs",
    "test": "JEST_JUNIT_CLASSNAME={filepath} turbo run test",
    "test:ci": "turbo run test --continue --concurrency=1",
    "test:ci:frontend": "turbo run test --continue --filter='./packages/frontend/**'",
--- a/packages/workflow/package.json
+++ b/packages/workflow/package.json
@ -33,7 +33,7 @@
    "test:unit": "vitest run",
    "test:changed": "janitor test-scoped --runner=vitest",
    "test:dev": "vitest --watch",
-    "mutate": "node scripts/mutate.mjs"
+    "mutate": "node ../../scripts/mutation-health/mutate.mjs --package-dir packages/workflow"
  },
  "files": [
    "dist/**/*"
--- a/packages/workflow/test/setup-vm-evaluator.ts
+++ b/packages/workflow/test/setup-vm-evaluator.ts
@ -14,13 +14,7 @@ if (process.env.N8N_EXPRESSION_ENGINE === 'vm') {
 		});
 	});

-	// Under Stryker, the worker process exits the moment vitest finishes — the
-	// OS reclaims isolated-vm native handles either way. Calling dispose here
-	// aborts the worker on Node 24 with a native finaliser assertion, which
-	// Stryker reports as a dry-run failure.
-	if (!process.env.STRYKER_RUN) {
-		afterAll(async () => {
-			await Expression.disposeExpressionEngine();
-		});
-	}
+	afterAll(async () => {
+		await Expression.disposeExpressionEngine();
+	});
 }
--- a/packages/workflow/vitest.stryker.config.ts
+++ b/packages/workflow/vitest.stryker.config.ts
@ -1,24 +1,15 @@
-// Vitest config used by Stryker only — NOT by `pnpm test`, NOT by CI's
-// unit-test runs. Runs the single forward-looking `vm-engine` project
-// (N8N_EXPRESSION_ENGINE=vm) rather than both engines.
+// Vitest config used by Stryker only — NOT by `pnpm test`, NOT by CI unit runs.
 //
-// Two reasons:
-//   1. Halves Stryker's dry-run cost — only one vitest project loads per
-//      Stryker worker, removing concurrent isolated-vm initialisation
-//      pressure that occasionally crashes the dry-run on local machines.
-//   2. Mutation score reflects test effectiveness against the engine n8n
-//      is moving to. Legacy-engine is being phased out; tests that pass
-//      only under it shouldn't pad the mutation score.
-//
-// The default `vitest.config.ts` still runs both projects for `pnpm test`
-// and CI — engine-equivalence is asserted there.
+// Runs the legacy expression engine, not vm/isolated-vm: the vm evaluator
+// SIGABRTs in Stryker's worker on teardown (upstream isolated-vm #464, repros on
+// Node 22 + 24). Trade-off: vm-only branches in expression.ts go unscored. Swap
+// back to N8N_EXPRESSION_ENGINE=vm once #464 is fixed or we pnpm-patch the guard.

 import { defineConfig } from 'vitest/config';
 import { createBaseInlineConfig } from '@n8n/vitest-config/node';

 const { reporters, outputFile, ...sharedTestConfig } = createBaseInlineConfig({
 	include: ['test/**/*.test.ts'],
-	setupFiles: ['./test/setup-vm-evaluator.ts'],
 });

 export default defineConfig({
@ -29,10 +20,7 @@ export default defineConfig({
 			{
 				test: {
 					...sharedTestConfig,
-					name: 'vm-engine',
-					// STRYKER_RUN tells setup-vm-evaluator.ts to skip the
-					// isolated-vm disposer on teardown — see that file for why.
-					env: { N8N_EXPRESSION_ENGINE: 'vm', STRYKER_RUN: 'true' },
+					name: 'legacy-engine',
 				},
 			},
 		],
--- a/scripts/mutation-health/DEMO-HANDOVER.md
+++ b/scripts/mutation-health/DEMO-HANDOVER.md
@ -1,12 +1,12 @@
 # Demo handover: stacked PR on #30956

-Use this prompt to drive the strengthen-tests loop end-to-end and open a stacked PR that demonstrates the trial.
+Use this prompt to drive the mutant-fix loop end-to-end and open a stacked PR that demonstrates the trial.

 ---

 ## Prompt

-> I want to demo the mutation-health strengthen-tests loop from PR #30956. Drive the whole flow from a fresh branch and open a stacked PR.
+> I want to demo the mutation-health mutant-fix loop from PR #30956. Drive the whole flow from a fresh branch and open a stacked PR.
 >
 > **Base branch**: `devp-stryker-mvp-spike` (the PR's branch — not master yet).
 >
@ -22,23 +22,23 @@ Use this prompt to drive the strengthen-tests loop end-to-end and open a stacked
 >    Use whatever it returns. As of 2026-05-22, that's `src/workflow-checksum.ts` at 38.64% — but check live state first.
 >
 > 3. Run the local mutation-testing skill on that file:
->    `/n8n:mutation-test packages/workflow/src/<picked-file>`
+>    `/n8n:mutant-score packages/workflow/src/<picked-file>`
 >
 >    Confirm the output JSON shows the score and a list of survivors with mutator + location + covering tests.
 >
 > 4. Run the strengthen skill:
->    `/n8n:strengthen-tests`
+>    `/n8n:mutant-fix`
 >
->    It'll triage survivors (HIGH/MODERATE/LOW), edit the covering test file with targeted assertions, and re-run `n8n:mutation-test` to verify the score climbed. Max 5 survivors per pass.
+>    It'll triage survivors (HIGH/MODERATE/LOW), edit the covering test file with targeted assertions, and re-run `n8n:mutant-score` to verify the score climbed. Max 5 survivors per pass.
 >
 > 5. Review the diff yourself: `git diff packages/workflow/test/`
 >
 >    Sanity-check each new assertion. Reject anything that's mocking-the-mock, asserting trivia, or pinning behaviour the source doesn't actually have. The skill is supposed to refuse to fabricate but humans verify.
 >
-> 6. If you want to push further, re-invoke `/n8n:strengthen-tests` for the next 5 survivors. Or move on.
+> 6. If you want to push further, re-invoke `/n8n:mutant-fix` for the next 5 survivors. Or move on.
 >
 > 7. Final verification:
->    `/n8n:mutation-test packages/workflow/src/<picked-file>`
+>    `/n8n:mutant-score packages/workflow/src/<picked-file>`
 >
 >    Capture the before/after score for the PR body.
 >
@ -55,7 +55,7 @@ Use this prompt to drive the strengthen-tests loop end-to-end and open a stacked
 > ```markdown
 > ## Summary
 >
-> Demo PR for #30956. Drives the `n8n:strengthen-tests` loop against `packages/workflow/src/<file>` to show the trial loop end-to-end.
+> Demo PR for #30956. Drives the `n8n:mutant-fix` loop against `packages/workflow/src/<file>` to show the trial loop end-to-end.
 >
 > **Before**: <X>% mutation score, <N> survivors
 > **After**:  <Y>% mutation score, <M> survivors
@ -65,7 +65,7 @@ Use this prompt to drive the strengthen-tests loop end-to-end and open a stacked
 > 2. ...
 >
 > ## Test plan
-> - [ ] `pnpm --filter=n8n-workflow mutate src/<file>` reproduces the post-score locally
+> - [ ] `pnpm mutate packages/workflow/src/<file>` reproduces the post-score locally
 > - [ ] `pnpm --filter=n8n-workflow test test/<file>.test.ts` passes
 > - [ ] Each new assertion has a clear "this would have caught X bug" justification
 > ```
--- a/scripts/mutation-health/README.md
+++ b/scripts/mutation-health/README.md
@ -54,9 +54,11 @@ That divergence is exactly why this project exists.
 | File | Purpose |
 | --- | --- |
 | `pick-next.mjs` | Walk `<pkg>/src/`, merge with the live ledger, return the next source file to mutate |
+| `mutate.mjs` | Run Stryker on one source file of any vitest package, write `summary.json` |
+| `stryker.default.mjs` | Default Stryker config for onboarded packages (points at the package's own `vitest.config.*`) |
 | `emit-payload.mjs` | Turn a Stryker `summary.json` into a BQ-ready writer payload |

-The Stryker run itself lives in `packages/workflow/scripts/mutate.mjs` and is invoked via `pnpm --filter=n8n-workflow mutate <src-file>`.
+`mutate.mjs` is package-agnostic — run `pnpm mutate <repo-relative-file>` from the repo root and the package is inferred from the path (or pass `--package-dir <pkg>` for a package-relative target, as the nightly does). It uses the package's own `stryker.config.mjs` if one exists (e.g. `packages/workflow` carves out the isolated-vm engine), otherwise `stryker.default.mjs`.

 The reader and writer webhooks are plain HTTP — the GHA hits them with `curl`. There is no fetch/post wrapper script; if you want to call them locally, see [Local usage](#local-usage).

@ -78,7 +80,7 @@ The BQ table schema lives with the writer workflow (in n8n's internal Quality pr
       │     within new:        alphabetical
       │     within red/stale:  lowest score first
       │
-       ├─► pnpm --filter=n8n-workflow mutate → summary.json
+       ├─► mutate.mjs --package-dir <pkg>   → summary.json
       │
       ├─► emit-payload.mjs                 → bq-payload.json
       │
@ -94,6 +96,17 @@ The BQ table schema lives with the writer workflow (in n8n's internal Quality pr

 The writer workflow lives in n8n's internal Quality project. It's created and maintained outside this repo. This README documents the contract it implements.

+## Passes, packages & onboarding
+
+The nightly runs a **matrix of `package × pass`** (built once in the `setup` job of `mutation-health-nightly.yml`). Each leg picks, mutates, and writes back independently; the ledger is keyed by package, so they don't collide. Two passes, selectable via the `mode` dispatch input (`both` on schedule):
+
+- **baseline** — `pick-next.mjs --mode baseline` → scores files with no result yet (the `new` bucket). Builds out coverage.
+- **coverage** — `pick-next.mjs --mode coverage` → revisits the weakest scored files (`red`/`stale`, lowest score first). Strengthens existing tests.
+
+To onboard a **vitest** package: add one `{ name, dir, slug }` line to the `packages` array in the `setup` job. No per-package config needed — `stryker.default.mjs` auto-resolves the package's own `vitest.config.*` (verified on plain and DI-decorator packages). Add a local `stryker.config.mjs` only if the package needs special handling.
+
+Not yet covered: **jest** packages (need Stryker's jest-runner — different setup) and `@n8n/expression-runtime` (it _is_ the isolated-vm engine; blocked on the patch in DEVP-257).
+
 ## State transitions

 | Trigger | Stored `status` |
@ -205,8 +218,10 @@ Runs use `STRYKER_THRESHOLD=80` as a placeholder. The threshold moves to evidenc
 ## Local usage

 ```bash
-# Run Stryker on one file (the inner loop — also invokable via /n8n:mutation-test skill)
-pnpm --filter=n8n-workflow mutate src/cron.ts
+# Run Stryker on one file (the inner loop — also invokable via /n8n:mutant-score skill).
+# Package is inferred from the repo-relative path; works for any vitest package.
+pnpm mutate packages/workflow/src/cron.ts
+pnpm mutate packages/@n8n/crdt/src/utils.ts

 # Pull current ledger from BQ
 curl --fail -sS \
--- a/packages/workflow/scripts/mutate.mjs
+++ b/packages/workflow/scripts/mutate.mjs
@ -1,11 +1,25 @@
 #!/usr/bin/env node
 /**
- * Run Stryker on a single source file and emit an actionable summary.
+ * Run Stryker on a single source file of a workspace package and emit an
+ * actionable summary. Package-agnostic: the nightly matrix and the per-package
+ * `mutate` npm scripts both call this one script.
 *
- * Usage:  pnpm --filter=n8n-workflow mutate <relative-path-under-src>
- * Example: pnpm --filter=n8n-workflow mutate src/cron.ts
+ * Usage (also exposed as `pnpm mutate <file>` from the repo root):
+ *   node scripts/mutation-health/mutate.mjs <file> [--package-dir <repo-rel-path>] [--config <path>]
 *
- * Outputs (under packages/workflow/reports/mutation/):
+ * The package is inferred from a repo-relative file path; pass --package-dir when
+ * the target is package-relative (the nightly does this).
+ *   node scripts/mutation-health/mutate.mjs packages/@n8n/crdt/src/utils.ts   # inferred
+ *   node scripts/mutation-health/mutate.mjs src/cron.ts --package-dir packages/workflow
+ *
+ * Stryker config resolution (first match wins):
+ *   1. --config <path>                         explicit override
+ *   2. <package-dir>/stryker.config.mjs        package-local (e.g. workflow's vm carve-out)
+ *   3. scripts/mutation-health/stryker.default.mjs   shared default (points at the
+ *                                              package's own vitest.config.* — no
+ *                                              bespoke vitest config required)
+ *
+ * Outputs (under <package-dir>/reports/mutation/):
 *   raw.json      — full Stryker Mutation Testing Elements report
 *   raw.html      — Stryker's HTML report (browse for human review)
 *   summary.json  — compact actionable summary (this script)
@ -18,13 +32,15 @@
 */

 import { spawn } from 'node:child_process';
-import { readFile, writeFile } from 'node:fs/promises';
+import { readFile, writeFile, mkdir } from 'node:fs/promises';
 import { existsSync } from 'node:fs';
+import { createRequire } from 'node:module';
 import { fileURLToPath } from 'node:url';
 import path from 'node:path';

+const require = createRequire(import.meta.url);
 const __dirname = path.dirname(fileURLToPath(import.meta.url));
-const pkgRoot = path.resolve(__dirname, '..');
+const repoRoot = path.resolve(__dirname, '../..');

 const THRESHOLD = Number(process.env.STRYKER_THRESHOLD ?? 80);

@ -33,31 +49,90 @@ function die(code, msg) {
 	process.exit(code);
 }

-const targetArg = process.argv[2];
-if (!targetArg) {
-	die(
-		2,
-		'Usage: pnpm --filter=n8n-workflow mutate <relative-path-under-src>\n' +
-			'Example: pnpm --filter=n8n-workflow mutate src/cron.ts',
-	);
+// --- args: one positional target + --package-dir (optional; inferred from a repo-relative target) + --config (optional)
+const argv = process.argv.slice(2);
+let packageDirArg;
+let configArg;
+let targetArg;
+for (let i = 0; i < argv.length; i++) {
+	const a = argv[i];
+	if (a === '--package-dir') packageDirArg = argv[++i];
+	else if (a === '--config') configArg = argv[++i];
+	else if (!a.startsWith('--') && targetArg === undefined) targetArg = a;
+}
+
+const usage =
+	'Usage: node scripts/mutation-health/mutate.mjs <file> [--package-dir <repo-rel-path>] [--config <path>]\n' +
+	'  - repo-relative file → package is inferred: node scripts/mutation-health/mutate.mjs packages/@n8n/crdt/src/utils.ts\n' +
+	'  - package-relative file → pass --package-dir:  node scripts/mutation-health/mutate.mjs src/cron.ts --package-dir packages/workflow';
+
+if (!targetArg) die(2, `Missing mutate target.\n${usage}`);
+
+// Walk up from a path to the nearest enclosing package.json (bounded by repoRoot).
+function findPackageRoot(fromAbs) {
+	let dir = path.dirname(fromAbs);
+	while (dir === repoRoot || dir.startsWith(`${repoRoot}${path.sep}`)) {
+		if (existsSync(path.join(dir, 'package.json'))) return dir;
+		const parent = path.dirname(dir);
+		if (parent === dir) break;
+		dir = parent;
+	}
+	return null;
+}
+
+// Resolve pkgRoot + the src-relative target, supporting two call styles:
+//   1. --package-dir given → target is package-relative or absolute (the nightly's style).
+//   2. no --package-dir → target is a repo-relative file; infer the package from it.
+let pkgRoot;
+let target;
+if (packageDirArg) {
+	pkgRoot = path.resolve(repoRoot, packageDirArg);
+	if (!existsSync(pkgRoot)) die(2, `Package dir not found: ${pkgRoot}`);
+	target = path.isAbsolute(targetArg) ? path.relative(pkgRoot, targetArg) : targetArg;
+} else {
+	const abs = path.resolve(repoRoot, targetArg);
+	if (!existsSync(abs)) die(2, `Target not found: ${abs}\n${usage}`);
+	const found = findPackageRoot(abs);
+	if (!found)
+		die(2, `Could not infer the package for ${targetArg} — pass --package-dir.\n${usage}`);
+	pkgRoot = found;
+	target = path.relative(pkgRoot, abs);
 }

-const target = path.isAbsolute(targetArg) ? path.relative(pkgRoot, targetArg) : targetArg;
 if (!target.startsWith('src/') || target.includes('..')) {
-	die(2, `Target must be under src/ within this package. Got: ${target}`);
+	die(2, `Target must be under the package's src/. Got: ${target}`);
 }
 if (!existsSync(path.join(pkgRoot, target))) {
 	die(2, `Target not found: ${path.join(pkgRoot, target)}`);
 }
+const packageDir = path.relative(repoRoot, pkgRoot);
+
+// --- resolve the Stryker config: override → package-local → shared default
+const localConfig = path.join(pkgRoot, 'stryker.config.mjs');
+const defaultConfig = path.join(__dirname, 'stryker.default.mjs');
+const configPath = configArg
+	? path.resolve(repoRoot, configArg)
+	: existsSync(localConfig)
+		? localConfig
+		: defaultConfig;
+
+// --- resolve the Stryker binary from the hoisted store (works for any package)
+const strykerBin = path.join(
+	path.dirname(require.resolve('@stryker-mutator/core/package.json')),
+	'bin/stryker.js',
+);

 const reportDir = path.join(pkgRoot, 'reports/mutation');
 const rawJsonPath = path.join(reportDir, 'raw.json');
 const summaryJsonPath = path.join(reportDir, 'summary.json');
+await mkdir(reportDir, { recursive: true });

-process.stderr.write(`Running Stryker on ${target} (threshold: ${THRESHOLD}%)\n`);
+process.stderr.write(
+	`Running Stryker on ${packageDir}/${target} (config: ${path.relative(repoRoot, configPath)}, threshold: ${THRESHOLD}%)\n`,
+);

 await new Promise((resolve) => {
-	const child = spawn('node_modules/.bin/stryker', ['run', '--mutate', target], {
+	const child = spawn(process.execPath, [strykerBin, 'run', configPath, '--mutate', target], {
 		cwd: pkgRoot,
 		stdio: 'inherit',
 	});
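
The config resolution in `mutate.mjs` (explicit override, then package-local, then shared default) can be checked in isolation. A minimal sketch, assuming a hypothetical `pickConfig` helper and an injected `exists` predicate standing in for `existsSync`; neither name is part of the actual script, and the paths are illustrative:

```javascript
// Hypothetical helper, not part of mutate.mjs: same precedence, but the
// filesystem check is injected so the order can be tested without real files.
function pickConfig({ override, localConfig, defaultConfig, exists }) {
	if (override) return override; // 1. --config wins outright
	if (exists(localConfig)) return localConfig; // 2. package-local config
	return defaultConfig; // 3. shared default
}

// Illustrative paths only.
const paths = {
	localConfig: 'packages/workflow/stryker.config.mjs',
	defaultConfig: 'scripts/mutation-health/stryker.default.mjs',
};

const withLocal = pickConfig({ ...paths, exists: () => true });
const withoutLocal = pickConfig({ ...paths, exists: () => false });
const overridden = pickConfig({ ...paths, override: 'custom.mjs', exists: () => true });
console.log({ withLocal, withoutLocal, overridden });
```

Keeping the precedence in one pure function is a design choice of this sketch; the script itself inlines it as a nested ternary.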
--- a/scripts/mutation-health/pick-next.mjs
+++ b/scripts/mutation-health/pick-next.mjs
@ -22,10 +22,15 @@
 * Inputs:
 *   --package-dir <path>     Required. Repo-relative path to the package, e.g. packages/workflow
 *   --ledger-file <path>     Required. Live ledger JSON: { "ledger": [ ... ] }
+ *   --mode <baseline|coverage>  Optional. Restrict the picker to one bucket:
+ *                              baseline → only `new` (establish first scores)
+ *                              coverage → only `red`/`stale` (revisit weakest, lowest-first)
+ *                              omitted  → combined new → red → stale (default)
 *   --stale-after-weeks <n>  Optional. Default 4.
 *
 * Output (stdout): { picked: { source_file_path, package, prior_status, effective_status } }
- *                  OR { picked: null, reason: "all-green" | "empty-source-tree" }.
+ *                  OR { picked: null, reason: "all-green" | "empty-source-tree"
+ *                       | "no-new-files" | "nothing-below-threshold" }.
 *
 * Exit codes:
 *   0 — picked a row OR nothing to do (with picked: null sentinel)
@ -202,11 +207,31 @@ process.stderr.write(
 		`new=${counts.new ?? 0} red=${counts.red ?? 0} stale=${counts.stale ?? 0} green=${counts.green ?? 0}\n`,
 );

-const top = annotated[0];
+// --mode restricts the candidate set to one bucket; omitted = combined.
+const MODE_BUCKETS = {
+	baseline: new Set(['new']),
+	coverage: new Set(['red', 'stale']),
+};
+const mode = args.mode;
+if (mode !== undefined && !Object.hasOwn(MODE_BUCKETS, mode)) {
+	die(2, `Invalid --mode=${mode}. Use 'baseline' or 'coverage' (omit for combined new→red→stale).`);
+}
+const candidates = mode
+	? annotated.filter((r) => MODE_BUCKETS[mode].has(r.effective_status))
+	: annotated;

-if (top.effective_status === 'green') {
-	process.stderr.write(`All actionable rows green (stale threshold ${STALE_AFTER_WEEKS} weeks) — nothing to do.\n`);
-	process.stdout.write(JSON.stringify({ picked: null, reason: 'all-green' }) + '\n');
+const top = candidates[0];
+
+// Nothing to do: an empty mode-filtered set, or (combined mode) the best row is green.
+if (!top || (!mode && top.effective_status === 'green')) {
+	const reason =
+		mode === 'baseline'
+			? 'no-new-files'
+			: mode === 'coverage'
+				? 'nothing-below-threshold'
+				: 'all-green';
+	process.stderr.write(`Nothing to do for mode=${mode ?? 'combined'} (${reason}).\n`);
+	process.stdout.write(JSON.stringify({ picked: null, reason }) + '\n');
 	process.exit(0);
 }
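
The bucket filtering above can be exercised without a ledger. A minimal sketch, assuming the rows are already sorted the way the picker sorts `annotated`; the `pickTop` wrapper and the sample rows are hypothetical and only mirror the logic in this hunk:

```javascript
// Same bucket table as the hunk above.
const MODE_BUCKETS = {
	baseline: new Set(['new']),
	coverage: new Set(['red', 'stale']),
};

// Hypothetical wrapper: returns the sentinel instead of exiting the process.
function pickTop(annotated, mode) {
	if (mode !== undefined && !Object.hasOwn(MODE_BUCKETS, mode)) {
		throw new Error(`Invalid --mode=${mode}`);
	}
	const candidates = mode
		? annotated.filter((r) => MODE_BUCKETS[mode].has(r.effective_status))
		: annotated;
	const top = candidates[0];
	// Empty mode-filtered set, or (combined mode) the best row is already green.
	if (!top || (!mode && top.effective_status === 'green')) {
		const reason =
			mode === 'baseline'
				? 'no-new-files'
				: mode === 'coverage'
					? 'nothing-below-threshold'
					: 'all-green';
		return { picked: null, reason };
	}
	return { picked: top };
}

// Made-up rows, pre-sorted new → red → green.
const rows = [
	{ source_file_path: 'src/a.ts', effective_status: 'new' },
	{ source_file_path: 'src/b.ts', effective_status: 'red' },
	{ source_file_path: 'src/c.ts', effective_status: 'green' },
];
const greenOnly = rows.slice(2);

console.log(pickTop(rows, 'baseline'), pickTop(rows, 'coverage'), pickTop(greenOnly, 'coverage'));
```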

--- a/scripts/mutation-health/stryker.default.mjs
+++ b/scripts/mutation-health/stryker.default.mjs
@ -0,0 +1,29 @@
+/**
+ * Default Stryker config for vitest packages onboarding to mutation health.
+ *
+ * Deliberately does NOT set `vitest.configFile` — the vitest-runner
+ * auto-resolves the package's own `vitest.config.*` from the run cwd (the
+ * package dir). That's the whole point: plain vitest packages (DI or not)
+ * need no bespoke vitest config. Packages that DO need special handling ship
+ * their own `stryker.config.mjs`, which mutate.mjs prefers over this default
+ * (e.g. packages/workflow carves out the isolated-vm engine — see DEVP-257).
+ *
+ * Reporter paths are relative to the run cwd, so reports land in
+ * <package-dir>/reports/mutation/ and mutate.mjs reads raw.json from there.
+ */
+/** @type {import('@stryker-mutator/api/core').PartialStrykerOptions} */
+export default {
+	packageManager: 'pnpm',
+	testRunner: 'vitest',
+	plugins: ['@stryker-mutator/vitest-runner'],
+	reporters: ['progress', 'clear-text', 'html', 'json'],
+	coverageAnalysis: 'perTest',
+	// Empty — mutate.mjs always passes --mutate <file>.
+	mutate: [],
+	htmlReporter: { fileName: 'reports/mutation/raw.html' },
+	jsonReporter: { fileName: 'reports/mutation/raw.json' },
+	timeoutMS: 60_000,
+	concurrency: Number(process.env.STRYKER_CONCURRENCY ?? 4),
+	tempDirName: '.stryker-tmp',
+	cleanTempDir: true,
+};
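
A package that needs special handling could layer its overrides on top of this default rather than repeating it. A sketch only, assuming the file sits one level deep at `packages/<pkg>/stryker.config.mjs` (scoped packages such as `packages/@n8n/<pkg>` would need one more `../`) and that the package wants a dedicated vitest config through the vitest-runner's `vitest.configFile` option. The real `packages/workflow/stryker.config.mjs` is not shown in this diff and may differ:

```javascript
// Hypothetical packages/<pkg>/stryker.config.mjs: reuse the shared default
// and override only what this package needs.
import defaults from '../../scripts/mutation-health/stryker.default.mjs';

/** @type {import('@stryker-mutator/api/core').PartialStrykerOptions} */
export default {
	...defaults,
	// Assumed file name, resolved relative to the run cwd (the package dir).
	vitest: { configFile: 'vitest.stryker.config.ts' },
};
```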