mirror of https://github.com/n8n-io/n8n.git synced 2026-05-26 06:17:21 +02:00

Declan Carroll 3a33a448b0 test(benchmark): Question-driven Playwright benchmark suite with tiered topology and rich diagnostics (#29024) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> 2026-05-09 21:14:08 +00:00

Benchmarks

Question-driven performance specs for n8n. Each spec answers ONE scaling question — its filename and describe() title state the question, and the assertions / printed metrics prove the answer.

Specs

Each spec self-declares its container topology via test.use({ capability: benchConfig(...) }) and runs in the single benchmarking:infrastructure Playwright project.

The suite is organised in four tiers, each asking a different kind of question:

Peak — `1m direct, no workers`

The architectural ceiling. No queue tax, no worker dispatch. What's the absolute max?

Trigger	Spec	Question
kafka	`single-instance-ceiling.spec.ts`	How much can we process on a single instance?
kafka	`steady-rate-breaking-point.spec.ts`	At what input rate does the system fall behind?
webhook	`webhook-single-instance.spec.ts`	What is the single-instance webhook ingestion ceiling?

Actual — `1m + 1w queue mode`

The real-world minimum HA topology. What does a basic production setup actually deliver?

Trigger	Spec	Question
kafka	`queue-mode-sustained-rate.spec.ts`	Can queue mode sustain 250 msg/s steady?
kafka	`burst-drain-capacity.spec.ts`	How fast can we drain a backlog?
kafka	`node-count-scaling.spec.ts`	How does throughput scale with workflow complexity?
kafka	`output-size-impact.spec.ts`	What is the impact of node output size on throughput?

Scaling — `2m + 2w queue mode`

HA distribution check. Does doubling capacity ~double the actual baseline?

Trigger	Spec	Question
webhook	`webhook-main-scaling.spec.ts`	Does webhook ingestion scale linearly with main count?
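
One way to read the Scaling tier's answer is as a scaling efficiency: measured throughput divided by the throughput you would expect if capacity scaled linearly. The sketch below is illustrative only. `scalingEfficiency` and its parameters are hypothetical names that do not exist in the suite, and the sample numbers are made up.

```typescript
// Hypothetical helper, not part of the n8n benchmark suite.
// 1.0 means perfectly linear scaling; lower values mean the extra
// mains/workers delivered less than their proportional share.
function scalingEfficiency(
  baselineExecPerSec: number,
  scaledExecPerSec: number,
  capacityFactor: number,
): number {
  if (baselineExecPerSec <= 0 || capacityFactor <= 0) {
    throw new RangeError('baseline throughput and capacity factor must be positive');
  }
  return scaledExecPerSec / (baselineExecPerSec * capacityFactor);
}

// Made-up numbers: 1m+1w delivers 400 exec/s, 2m+2w delivers 720 exec/s.
const efficiency = scalingEfficiency(400, 720, 2);
console.log(`scaling efficiency: ${(efficiency * 100).toFixed(0)}%`);
```

Comparing against the Actual baseline from the same CI run keeps runner noise out of the ratio.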

Cost — feature toggles on the actual baseline

What does turning on configuration X cost vs the baseline?

Trigger	Spec	Question
webhook	`webhook-otel-overhead.spec.ts`	What is the runtime cost of enabling OTEL?

Cost specs run the same workload as a baseline spec with one config knob flipped. To read the cost, compare the `exec/s` and `p50` of a Cost spec against its baseline from the same CI run. OTEL specs also attach jaeger-traces.json as a test artifact, which can be replayed locally for flamegraph inspection.
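
The baseline comparison can be expressed as two percentages: how much throughput the knob costs and how much latency it adds. This is a minimal sketch under assumptions. The `SpecResult` shape and `readCost` name are hypothetical, and the numbers are invented rather than taken from a real run.

```typescript
// Hypothetical shapes and names, for illustration only.
interface SpecResult {
  execPerSec: number; // the `exec/s` column
  p50Ms: number; // the `p50` column, in milliseconds
}

interface CostReading {
  throughputLossPct: number; // positive = the knob costs throughput
  p50IncreasePct: number; // positive = the knob adds latency
}

function readCost(baseline: SpecResult, withKnob: SpecResult): CostReading {
  return {
    throughputLossPct:
      ((baseline.execPerSec - withKnob.execPerSec) / baseline.execPerSec) * 100,
    p50IncreasePct: ((withKnob.p50Ms - baseline.p50Ms) / baseline.p50Ms) * 100,
  };
}

// Made-up numbers, imagined as coming from the same CI run.
const cost = readCost({ execPerSec: 500, p50Ms: 400 }, { execPerSec: 450, p50Ms: 440 });
console.log(cost);
```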

Standard topology

Tier	Mains	Workers	Per-pod resources
Peak	1	0	4GB / 2 vCPU
Actual	1	1	main 4GB/2 vCPU, worker 2GB/1 vCPU
Scaling	2	2	main 4GB/2 vCPU, worker 2GB/1 vCPU
Cost	matches the baseline	matches the baseline	matches the baseline

All specs share a single env profile aligned with internal n8n production defaults — connection-pool, lock-duration, and Bull/Redis tuning from real deployments. See BENCHMARK_CONFIG in playwright-projects.ts.

Running

# Build n8n image first (skip if you only changed test code).
pnpm build:docker

# Full suite — all 9 specs sequentially (each spawns its own container).
pnpm --filter=n8n-playwright test:benchmark

# One spec.
pnpm --filter=n8n-playwright test:benchmark single-instance-ceiling

# By question.
pnpm --filter=n8n-playwright test:benchmark --grep "single instance"

Topology (mains/workers, kafka, custom env) is fixed per spec via benchConfig(...) in the spec file. To explore a different topology, edit the spec; there are no env overrides for topology.

Useful env overrides

Variable	Default	Effect
`N8N_CONTAINERS_KEEPALIVE`	unset	Keep containers alive after the run for debugging

Reading the results

Every run prints a per-test [DIAG] block and emits a Benchmark Summary table at the end of the run (also surfaced in GitHub Actions job summaries):

│ Trigger │ Suite │ Scenario                           │ exec/s │ tail/s │ p50   │ p99    │ req/s │ ev lag │ pg tx/s │
├─────────┼───────┼────────────────────────────────────┼────────┼────────┼───────┼────────┼───────┼────────┼─────────┤
│ kafka   │ other │ Kafka trigger + 1 noop, 1KB, 150k  │ 1336.0 │ 1391.5 │ —     │ —      │ —     │ 18ms   │ 11430   │
│ webhook │ other │ Async webhook + 1 noop, 1KB, 250c  │  442.0 │  453.2 │ 558ms │ 674ms  │ 442.0 │ 8ms    │ 6692    │

Column	Meaning
`exec/s`	Workflow executions per second across the active window
`tail/s`	Throughput across the final 60s of the run — closest to the architectural ceiling
`actions/s`	`exec/s × nodeCount` — total node executions per second
`p50/p99`	Per-execution duration percentiles (when execution data is saved)
`req/s`	HTTP requests per second (webhook specs only)
`ev lag`	Node.js event loop lag (sum across mains/workers)
`pg tx/s`	Postgres `xact_commit` rate from postgres-exporter
`queue`	Bull jobs waiting (queue specs only)
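
To make the relationship between `exec/s`, `tail/s`, and `actions/s` concrete, here is a rough sketch that derives all three from a list of execution completion timestamps. It is an assumption-laden illustration: the suite's real math lives in `utils/benchmark/throughput-measure.ts` and may differ, and `summarize` and `ThroughputColumns` are hypothetical names.

```typescript
// Hypothetical sketch; not the suite's actual implementation.
interface ThroughputColumns {
  execPerSec: number;
  tailPerSec: number;
  actionsPerSec: number;
}

function summarize(
  completionTimesMs: number[], // one timestamp per finished execution
  windowStartMs: number,
  windowEndMs: number,
  nodeCount: number,
  tailWindowMs = 60_000,
): ThroughputColumns {
  const windowSec = (windowEndMs - windowStartMs) / 1000;
  if (windowSec <= 0) throw new RangeError('window must have positive length');

  const inWindow = completionTimesMs.filter((t) => t > windowStartMs && t <= windowEndMs);
  // Clamp the tail so a run shorter than the tail window still yields a rate.
  const tailStartMs = Math.max(windowStartMs, windowEndMs - tailWindowMs);
  const inTail = inWindow.filter((t) => t > tailStartMs);

  const execPerSec = inWindow.length / windowSec;
  return {
    execPerSec,
    tailPerSec: inTail.length / ((windowEndMs - tailStartMs) / 1000),
    actionsPerSec: execPerSec * nodeCount, // exec/s × nodeCount
  };
}

// Made-up run: one completion every 500 ms for 120 s, on a 3-node workflow.
const times = Array.from({ length: 240 }, (_, i) => (i + 1) * 500);
console.log(summarize(times, 0, 120_000, 3));
```

When `tail/s` is noticeably higher than `exec/s`, the run probably spent part of its window warming up. When it is lower, throughput degraded over time.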

For deeper PG analysis, every spec also logs a top-N pg_stat_statements breakdown ranked by total ms/s of work (calls/s × avg ms), plus a [PG SATURATION] block (total query CPU including planner overhead and the long tail, buffer hit ratio, bgwriter / WAL pressure, pg_stat_io per-backend-type IO) and a [CONTAINERS] block (per-container CPU/memory/IO from cAdvisor or docker stats sampler). Each run also attaches a run-report.json artifact with the full structured report — feedable directly to an LLM for bottleneck analysis.
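
The "total ms/s of work" ranking is simply calls per second multiplied by average milliseconds per call. The sketch below shows why that metric is useful, using a hypothetical row shape and invented sample values. It does not reproduce the suite's actual query or output format.

```typescript
// Hypothetical row shape; field names are illustrative, not the suite's.
interface StatementSample {
  query: string;
  callsPerSec: number;
  avgMs: number;
}

interface RankedStatement extends StatementSample {
  msPerSec: number; // calls/s × avg ms: DB work generated per wall-clock second
}

function topStatements(samples: StatementSample[], n: number): RankedStatement[] {
  return samples
    .map((s) => ({ ...s, msPerSec: s.callsPerSec * s.avgMs }))
    .sort((a, b) => b.msPerSec - a.msPerSec)
    .slice(0, n);
}

// Made-up samples: a cheap-but-hot statement can outrank a slow-but-rare one.
const ranked = topStatements(
  [
    { query: 'hot insert (placeholder)', callsPerSec: 1300, avgMs: 0.2 },
    { query: 'rare slow select (placeholder)', callsPerSec: 5, avgMs: 12 },
    { query: 'hot update (placeholder)', callsPerSec: 1300, avgMs: 0.5 },
  ],
  2,
);
for (const r of ranked) {
  console.log(`${r.msPerSec.toFixed(1)} ms/s  ${r.query}`);
}
```

Ranking by ms/s rather than by average latency surfaces the statements that actually consume database time under load.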

CI

The full suite runs on blacksmith-8vcpu-ubuntu-2204 runners via .github/workflows/test-e2e-infrastructure-reusable.yml. One container at a time (workers: 1); each spec brings its own topology.

Architecture

Spec files (kafka/*.spec.ts, webhook/*.spec.ts)   ← question + topology + scenario
    ↓ uses
Harnesses (harness/*.ts)                          ← setup → load → measure → report
    ↓ orchestrates
TriggerDriver / setupWebhook                      ← trigger-specific load production
    ↓ uses
Shared building blocks                            ← workflow-builder, throughput-measure,
                                                    diagnostics, load-executors

Concern	Location
Topology / env	`playwright-projects.ts` (`BENCHMARK_CONFIG`, `benchConfig()`)
Workflow shape	`utils/benchmark/workflow-builder.ts`
Load patterns	`utils/benchmark/load-executors.ts` (preloaded, steady, staged)
Throughput math	`utils/benchmark/throughput-measure.ts`
Diagnostics	`utils/benchmark/diagnostics.ts`, `harness/orchestration.ts`

Adding a new trigger type requires one driver + one or more spec files. The harnesses, measurement, and reporting are trigger-agnostic.
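
The layering can be sketched as a small interface plus a harness that only talks to that interface. Everything named here (`TriggerDriverSketch`, `runLoadSketch`, `makeInMemoryDriver`) is hypothetical and simplified to synchronous calls for brevity. The real `TriggerDriver` and harnesses may look quite different and are presumably asynchronous.

```typescript
// Hypothetical interface; the suite's real TriggerDriver may differ.
interface TriggerDriverSketch {
  name: string;
  setup(): void; // e.g. create a topic or register a webhook
  produce(messageCount: number): number; // returns how many messages were accepted
  teardown(): void;
}

interface HarnessReport {
  trigger: string;
  requested: number;
  produced: number;
}

// Trigger-agnostic harness: it knows nothing about kafka or webhooks.
function runLoadSketch(driver: TriggerDriverSketch, messageCount: number): HarnessReport {
  driver.setup();
  try {
    const produced = driver.produce(messageCount);
    return { trigger: driver.name, requested: messageCount, produced };
  } finally {
    driver.teardown(); // always clean up, even if load production throws
  }
}

// A new trigger type is one more object implementing the interface.
function makeInMemoryDriver(name: string, log: string[]): TriggerDriverSketch {
  return {
    name,
    setup() {
      log.push(`${name}:setup`);
    },
    produce(messageCount: number) {
      log.push(`${name}:produce:${messageCount}`);
      return messageCount;
    },
    teardown() {
      log.push(`${name}:teardown`);
    },
  };
}

const log: string[] = [];
console.log(runLoadSketch(makeInMemoryDriver('example-trigger', log), 100), log);
```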

Adding a spec

1. Pick a question that isn't already answered by an existing spec.
2. Decide which tier it belongs to: Peak (no workers), Actual (1m+1w), or Scaling (2m+2w).
3. Create kafka/<question>.spec.ts or webhook/<question>.spec.ts.
4. Use test.use({ capability: benchConfig('<slug>', { ... }) }) with the topology for that tier:
   - Peak kafka: benchConfig('<slug>', { kafka: true })
   - Actual kafka: benchConfig('<slug>', { kafka: true, workers: 1 })
   - Actual webhook: benchConfig('<slug>', { workers: 1 })
   - Scaling: benchConfig('<slug>', { mains: 2, workers: 2 }) (kafka adds kafka: true)
5. Wire the trigger driver (kafkaDriver or setupWebhook) and a harness (runLoadTest or runWebhookThroughputTest).
6. Annotate with { type: 'question', description: '<slug>' } so the question is searchable in test metadata.
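
The tier-to-options mapping in the steps above can be checked with a small stand-in for `benchConfig`. This is a hypothetical re-implementation for illustration. The real function lives in `playwright-projects.ts`, its return shape is not documented here, and the defaults below (1 main, 0 workers, no kafka) are an assumption inferred from the tier examples.

```typescript
// Hypothetical stand-in; not the real benchConfig().
interface BenchOptions {
  mains?: number;
  workers?: number;
  kafka?: boolean;
}

interface BenchTopology {
  slug: string;
  mains: number;
  workers: number;
  kafka: boolean;
}

type Tier = 'Peak' | 'Actual' | 'Scaling' | 'custom';

// Assumed defaults: 1 main, 0 workers, no kafka.
function benchConfigSketch(slug: string, options: BenchOptions = {}): BenchTopology {
  return {
    slug,
    mains: options.mains ?? 1,
    workers: options.workers ?? 0,
    kafka: options.kafka ?? false,
  };
}

// Classify a topology against the standard tiers.
function tierOf(t: BenchTopology): Tier {
  if (t.mains === 1 && t.workers === 0) return 'Peak';
  if (t.mains === 1 && t.workers === 1) return 'Actual';
  if (t.mains === 2 && t.workers === 2) return 'Scaling';
  return 'custom';
}

console.log(tierOf(benchConfigSketch('my-question', { kafka: true, workers: 1 })));
```

A check like `tierOf` could serve as a guard when adding specs, flagging a topology that does not match any standard tier.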
