José Braulio González Valido
95cf41c37c
chore(core): Enable Daytona sandbox in Instance AI evals (no-changelog) (#29931)
...
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 07:43:04 +00:00
Albert Alises
869dc32c15
feat(ai-builder): Speeds up Instance AI eval by parallelizing iterations and trimming mock handler (no-changelog) (#29839)
2026-05-06 08:15:33 +00:00
José Braulio González Valido
ffef9c9c48
fix(core): Retry Phase 1 hint generation when triggerContent is empty (no-changelog) (#29109)
...
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 07:26:25 +00:00
Benjamin Schroth
c961849226
feat(ai-builder): Add sub-agent evaluation harness with binary checks (no-changelog) (#28289)
...
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-24 07:50:46 +00:00
José Braulio González Valido
89e9117d39
fix(ai-builder): Expose outputCount when eval node output is truncated (no-changelog) (#28977)
...
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 11:06:27 +00:00
José Braulio González Valido
ac922fa38c
feat(ai-builder): Improve eval verifier and mock handler reliability (no-changelog) (#28255)
...
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 13:57:32 +00:00
Arvin A
df8e795c3f
fix(core): Sanitize request data sent to LLM in eval mock handler (no-changelog) (#28200)
...
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 13:16:35 +00:00
José Braulio González Valido
fef91c97dd
feat(ai-builder): Add --keep-workflows flag and fix eval execution errors (no-changelog) (#28129)
...
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 17:35:04 +00:00
José Braulio González Valido
2383749980
feat(ai-builder): Workflow evaluation framework with LLM mock execution (#27818)
...
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Arvin A <51036481+DeveloperTheExplorer@users.noreply.github.com>
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
2026-04-07 13:31:16 +00:00