n8n/packages/@n8n/instance-ai/eval-pr-comment.md
Jaakko Husso a316742c92
fix(core): Gate web search tool use behind approval checks correctly (no-changelog) (#29685)
Co-authored-by: Albert Alises <albert.alises@gmail.com>
2026-05-07 11:06:51 +00:00

### Instance AI Workflow Eval

> [!NOTE]
> No baseline configured, so the comparison was skipped. Run the eval with `--experiment-name instance-ai-baseline` on master to create one.

**Aggregate**: 8.0% pass (2/25 trials, 5 scenarios × N=5)

<details><summary>Per-test-case results (1)</summary>

| Workflow | Built | pass@5 | pass^5 |
|---|---|---|---|
| `cross-team-linear-report` | 2/5 | 20% | 0% |

</details>
<details><summary>Failure details</summary>

**`cross-team-linear-report/happy-path`**: 5 failed
> Run [builder_issue]: The workflow fails at multiple levels. First, the 'Filter & Classify Cross-Team' code node produces zero output ('Output: none'), which causes all downstream nodes (Count Per Creator, Sort Descending,
> Run [builder_issue]: The workflow failed with the error 'Couldn't find the field crossTeamCount in the input data' in the Sort by Count (Desc) node. The root cause is in the Aggregate by Creator (Summarize) node configura
> Run [build_failure]: Build failed: fetch failed
> Run [build_failure]: Build failed: fetch failed
> Run [build_failure]: Build failed: fetch failed

**`cross-team-linear-report/multi-team-creator`**: 5 failed
> Run [builder_issue]: The workflow execution stopped at the 'Filter & Classify Cross-Team' node, which produced no output. The code node contains two fatal flaws: (1) It tries to resolve the creator's email via `issue.crea
> Run [builder_issue]: The workflow failed at the 'Sort by Count (Desc)' node with the error 'Couldn't find the field crossTeamCount in the input data'. The root cause is a misconfiguration in the 'Aggregate by Creator' (Su
> Run [build_failure]: Build failed: fetch failed
> Run [build_failure]: Build failed: fetch failed
> Run [build_failure]: Build failed: fetch failed

**`cross-team-linear-report/no-cross-team-issues`**: 3 failed
> Run [build_failure]: Build failed: fetch failed
> Run [build_failure]: Build failed: fetch failed
> Run [build_failure]: Build failed: fetch failed

**`cross-team-linear-report/unknown-creator`**: 5 failed
> Run [builder_issue]: The workflow handled the unknown creator (Dave) without crashing, which is fine. However, Alice's cross-team issues were NOT correctly processed. The Filter & Classify Cross-
> Run [builder_issue]: The workflow crashed at the 'Sort by Count (Desc)' node with the error: "Couldn't find the field 'crossTeamCount' in the input data". This prevented the Slack post from being sent. While the 'Detect C
> Run [build_failure]: Build failed: fetch failed
> Run [build_failure]: Build failed: fetch failed
> Run [build_failure]: Build failed: fetch failed

**`cross-team-linear-report/api-error`**: 5 failed
> Run [builder_issue]: The workflow crashed with an unhandled error: 'Cannot read properties of undefined (reading 'errors')'. The Get Linear Issues node received an authentication error response from the Linear API (mock r
> Run [builder_issue]: The workflow crashed with 'Authorization failed - please check your credentials' when the Linear API returned an authentication error. There is no error-handling branch configured in the workflow: no
> Run [build_failure]: Build failed: fetch failed
> Run [build_failure]: Build failed: fetch failed
> Run [build_failure]: Build failed: fetch failed

</details>
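
As a cross-check, the headline figures in this report can be recomputed from the per-scenario failure counts. The sketch below is illustrative only and is not the eval harness's code. It makes three assumptions: pass counts are inferred as 5 minus the reported failures for each scenario; pass@5 is taken to be the share of scenarios with at least one passing trial; and pass^5 is taken to be the share of scenarios in which all five trials pass. Those definitions happen to reproduce the reported 8.0%, 20%, and 0%, but the harness may define them differently. The names `passesByScenario` and `summarize` are hypothetical.

```typescript
// Illustrative recomputation of the report's summary numbers.
// Assumption: passes per scenario = 5 trials minus the reported failure count.
const TRIALS_PER_SCENARIO = 5;

const passesByScenario: Record<string, number> = {
  "happy-path": 0,
  "multi-team-creator": 0,
  "no-cross-team-issues": 2, // only 3 of 5 trials are listed as failed
  "unknown-creator": 0,
  "api-error": 0,
};

interface EvalSummary {
  aggregatePassRate: number; // passing trials / all trials
  passAtK: number; // assumed: share of scenarios with >= 1 passing trial
  passHatK: number; // assumed: share of scenarios where all k trials pass
}

function summarize(passes: Record<string, number>, k: number): EvalSummary {
  const counts = Object.values(passes);
  const totalPasses = counts.reduce((sum, n) => sum + n, 0);
  return {
    aggregatePassRate: totalPasses / (counts.length * k),
    passAtK: counts.filter((n) => n >= 1).length / counts.length,
    passHatK: counts.filter((n) => n === k).length / counts.length,
  };
}

const summary = summarize(passesByScenario, TRIALS_PER_SCENARIO);
console.log(summary);
// → { aggregatePassRate: 0.08, passAtK: 0.2, passHatK: 0 }
```

Under these assumptions, the three `fetch failed` build failures per scenario account for 15 of the 23 failed trials, so the aggregate pass rate is bounded at 40% for this run before any workflow logic is evaluated.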