n8n/packages/@n8n/instance-ai/eval-pr-comment.md
Jaakko Husso a316742c92
Some checks are pending
Build: Benchmark Image / build (push) Waiting to run
CI: Master (Build, Test, Lint) / Build for Github Cache (push) Waiting to run
CI: Master (Build, Test, Lint) / Unit tests (22.x) (push) Waiting to run
CI: Master (Build, Test, Lint) / Unit tests (24.14.1) (push) Waiting to run
CI: Master (Build, Test, Lint) / Unit tests (25.x) (push) Waiting to run
CI: Master (Build, Test, Lint) / Lint (push) Waiting to run
CI: Master (Build, Test, Lint) / Performance (push) Waiting to run
CI: Master (Build, Test, Lint) / Notify Slack on failure (push) Blocked by required conditions
Util: Sync API Docs / sync-public-api (push) Waiting to run
fix(core): Gate web search tool use behind approval checks correctly (no-changelog) (#29685)
Co-authored-by: Albert Alises <albert.alises@gmail.com>
2026-05-07 11:06:51 +00:00

3.2 KiB
Raw Permalink Blame History

Instance AI Workflow Eval

Note

No baseline configured — comparison skipped. Run the eval with --experiment-name instance-ai-baseline on master to create one.

Aggregate: 8.0% pass (2/25 trials, 5 scenarios × N=5)

Per-test-case results (1)
Workflow Built pass@5 pass^5
cross-team-linear-report 2/5 20% 0%
Failure details

cross-team-linear-report/happy-path — 5 failed

Run [builder_issue]: The workflow fails at multiple levels. First, the 'Filter & Classify Cross-Team' code node produces zero output ('Output: none'), which causes all downstream nodes (Count Per Creator, Sort Descending, Run [builder_issue]: The workflow failed with the error 'Couldn't find the field crossTeamCount in the input data' in the Sort by Count (Desc) node. The root cause is in the Aggregate by Creator (Summarize) node configura Run [build_failure]: Build failed: fetch failed Run [build_failure]: Build failed: fetch failed Run [build_failure]: Build failed: fetch failed

cross-team-linear-report/multi-team-creator — 5 failed

Run [builder_issue]: The workflow execution stopped at the 'Filter & Classify Cross-Team' node, which produced no output. The code node contains two fatal flaws: (1) It tries to resolve the creator's email via `issue.crea Run [builder_issue]: The workflow failed at the 'Sort by Count (Desc)' node with the error 'Couldn't find the field crossTeamCount in the input data'. The root cause is a misconfiguration in the 'Aggregate by Creator' (Su Run [build_failure]: Build failed: fetch failed Run [build_failure]: Build failed: fetch failed Run [build_failure]: Build failed: fetch failed

cross-team-linear-report/no-cross-team-issues — 3 failed

Run [build_failure]: Build failed: fetch failed Run [build_failure]: Build failed: fetch failed Run [build_failure]: Build failed: fetch failed

cross-team-linear-report/unknown-creator — 5 failed

Run [builder_issue]: The workflow did not crash, so it handled the unknown creator (Dave) without crashing — that part is fine. However, Alice's cross-team issues were NOT correctly processed. The Filter & Classify Cross- Run [builder_issue]: The workflow crashed at the 'Sort by Count (Desc)' node with the error: "Couldn't find the field 'crossTeamCount' in the input data". This prevented the Slack post from being sent. While the 'Detect C Run [build_failure]: Build failed: fetch failed Run [build_failure]: Build failed: fetch failed Run [build_failure]: Build failed: fetch failed

cross-team-linear-report/api-error — 5 failed

Run [builder_issue]: The workflow crashed with an unhandled error: 'Cannot read properties of undefined (reading 'errors')'. The Get Linear Issues node received an authentication error response from the Linear API (mock r Run [builder_issue]: The workflow crashed with 'Authorization failed - please check your credentials' when the Linear API returned an authentication error. There is no error handling branch configured in the workflow — no Run [build_failure]: Build failed: fetch failed Run [build_failure]: Build failed: fetch failed Run [build_failure]: Build failed: fetch failed