5.0 KiB
Custom Test Orchestration
Duration-based test distribution across CI shards, using committed metrics for deterministic runs.
Overview
Instead of Playwright's built-in sharding (which distributes by file count), this approach distributes specs by estimated duration using a greedy bin-packing algorithm. This results in more balanced shard execution times.
Key properties:
- Deterministic - Same commit always produces same distribution
- Re-run safe - Failed jobs re-run the same specs (no full suite re-run)
- Community PR friendly - No secrets needed at runtime
- Source agnostic - Metrics can come from any test analytics provider
Scripts
Located in packages/testing/playwright/scripts/:
| Script | Purpose |
|---|---|
distribute-tests.mjs |
Assigns specs to shards using bin-packing |
fetch-currents-metrics.mjs |
Fetches duration data from Currents API |
Metrics File
Committed at .github/test-metrics/playwright.json
Schema
{
"updatedAt": "2025-01-15T00:00:00.000Z",
"source": "currents | playwright | manual | <provider>",
"specs": {
"tests/e2e/path/to/spec.ts": {
"avgDuration": 45000,
"testCount": 5,
"flakyRate": 0.02
}
}
}
| Field | Type | Description |
|---|---|---|
updatedAt |
ISO 8601 | When metrics were last refreshed |
source |
string | Where metrics originated |
specs |
object | Map of spec path to metrics |
specs[path].avgDuration |
number | Average duration in milliseconds |
specs[path].testCount |
number | Number of tests in spec |
specs[path].flakyRate |
number | Flakiness rate (0-1) |
Only avgDuration is required for distribution. Other fields are informational.
CI Usage
Enabling Custom Orchestration
In the workflow call to playwright-test-reusable.yml:
jobs:
e2e-tests:
uses: ./.github/workflows/playwright-test-reusable.yml
with:
test-command: pnpm --filter=n8n-playwright test:local
shards: 8
use-custom-orchestration: true # Enable duration-based distribution
secrets: inherit
How It Works
When use-custom-orchestration: true:
# Each shard runs:
SPECS=$(node packages/testing/playwright/scripts/distribute-tests.mjs $TOTAL_SHARDS $SHARD_INDEX)
pnpm test:local --workers=2 $SPECS
The distribute script:
- Reads committed metrics from
.github/test-metrics/playwright.json - Sorts specs by duration (descending)
- Assigns each spec to the lightest shard (greedy bin-packing)
- Outputs space-separated spec paths for the requested shard
Distribution Algorithm
Input: specs sorted by duration [100s, 80s, 60s, 40s, 30s, 20s]
Shards: 3
Step 1: 100s → Shard 0 (lightest) → [100, 0, 0]
Step 2: 80s → Shard 1 (lightest) → [100, 80, 0]
Step 3: 60s → Shard 2 (lightest) → [100, 80, 60]
Step 4: 40s → Shard 2 (lightest) → [100, 80, 100]
Step 5: 30s → Shard 1 (lightest) → [100, 110, 100]
Step 6: 20s → Shard 0 (lightest) → [120, 110, 100]
Result: Balanced ~110s per shard instead of uneven distribution
Refreshing Metrics
Using Currents API
CURRENTS_API_KEY=<key> node packages/testing/playwright/scripts/fetch-currents-metrics.mjs --project=<id>
The script:
- Fetches test durations from Currents API (last 30 days)
- Aggregates by spec file
- Validates against
pnpm playwright test --list --project="standard:e2e" - Reports drift (stale specs removed, new specs added with 30s default)
- Writes to
.github/test-metrics/playwright.json
Using Other Sources
Create a script that outputs the same JSON schema. The distribution only requires:
{
"specs": {
"tests/e2e/example.spec.ts": { "avgDuration": 45000 }
}
}
When to Refresh
- When new spec files are added (they get 30s default until refreshed)
- When specs are deleted/renamed (stale entries are filtered out)
- Periodically to capture duration changes (weekly recommended)
Maintenance
Detecting Drift
Run the fetch script - it reports mismatches:
Stale specs (in metrics but not Playwright):
- tests/e2e/deleted/old.spec.ts
New specs (in Playwright but not metrics, using 30s default):
- tests/e2e/new/feature.spec.ts
Manual Metrics Entry
For new specs before CI data exists:
{
"specs": {
"tests/e2e/new/feature.spec.ts": {
"avgDuration": 45000,
"testCount": 3,
"flakyRate": 0
}
}
}
Estimate duration based on similar specs, or use 30000 (30s) as default.
Troubleshooting
Specs not running
Check that the spec path in playwright.json matches exactly what Playwright outputs:
pnpm --filter=n8n-playwright playwright test --list --project="standard:e2e"
Unbalanced shards
Refresh metrics - durations may have changed significantly since last update.
Script errors
# Test distribution locally
node packages/testing/playwright/scripts/distribute-tests.mjs 8 0
# Validate metrics file
cat .github/test-metrics/playwright.json | jq '.specs | length'