From f8900907e436caccf7ef3f59c04794b7856cd3b0 Mon Sep 17 00:00:00 2001
From: Oleg Ivaniv <me@olegivaniv.com>
Date: Wed, 6 May 2026 13:24:51 +0200
Subject: [PATCH] Update specs

---
 packages/@n8n/instance-ai/TRACING_SPECS.md | 1099 ++++++++++----------
 1 file changed, 561 insertions(+), 538 deletions(-)

diff --git a/packages/@n8n/instance-ai/TRACING_SPECS.md b/packages/@n8n/instance-ai/TRACING_SPECS.md
index 973a5766c17..c02135ffa2f 100644
--- a/packages/@n8n/instance-ai/TRACING_SPECS.md
+++ b/packages/@n8n/instance-ai/TRACING_SPECS.md
@@ -1,658 +1,681 @@
 # Instance AI Tracing Specs
 
-Status: planning OTel-canonical rewrite
+Status: planning clean Instance AI tracing rewrite
 
-Last updated: 2026-05-05
+Last updated: 2026-05-06
 
 ## Decision
 
-Instance AI live tracing will use OpenTelemetry as the single canonical trace
-model.
+Instance AI tracing will be rebuilt around a clean, activation-scoped
+OpenTelemetry model owned by `@n8n/instance-ai`.
 
-We will stop mixing LangSmith `RunTree` spans with OTel spans for normal
-execution. Product concepts such as message turns, orchestrator work,
-sub-agent work, HITL, and background jobs must be represented as OTel spans.
-Native AI SDK spans for model calls, provider requests, messages, tool calls,
-and token usage stay in the same OTel tree.
+LangSmith is an export target for Instance AI traces, not the internal tracing
+model. LangSmith-specific attribute names, token workarounds, thread grouping
+policy, and trace-shaping rules must stay in `packages/@n8n/instance-ai`.
+`packages/@n8n/agents` must remain provider/platform neutral, except for
+generic telemetry primitives that any product can use.
 
-`RunTree` should not be used as a live trace hierarchy once this migration is
-complete. If LangSmith feedback or legacy replay needs a compatibility path,
-that path must be isolated and must not parent or reshape OTel spans.
+The target live trace hierarchy has no LangSmith `RunTree` spans. Native AI SDK
+model/tool telemetry and Instance AI product spans must share the same OTel
+context tree.
 
-## Why
+## Design Principles
 
-The hybrid implementation proved useful but has the wrong shape:
+1. **Activation duration, not logical task lifetime**
 
-- RunTree spans show Instance AI product semantics.
-- OTel spans show the real AI SDK and provider semantics.
-- LangSmith does not reliably roll up token usage or child spans across those
-  two ingestion models.
-- Attempts to force-parent native OTel spans under RunTree spans caused native
-  LLM spans to disappear.
+   A span measures active work. It must not stay open while the orchestrator is
+   waiting for user approval, a background task, or a future scheduler tick.
 
-The correct model is to make product tracing use the same OTel tracer and
-context as native LLM telemetry.
+2. **One thread, multiple root operations**
 
-## Goals
+   A LangSmith thread is a chronological group of root traces linked by
+   `thread_id`. A complex Instance AI request can therefore contain:
 
-1. A foreground message turn has one canonical OTel trace tree.
-2. Inline work, including orchestrator LLM calls and inline planner calls, is
-   parented by OTel context propagation.
-3. Detached/background sub-agents may remain separate OTel trace roots, but
-   they must carry thread and spawning metadata.
-4. LangSmith shows real AI SDK LLM/provider spans with messages, tool schemas,
-   tool choice, tool calls, response summaries, timings, cost, and token usage.
-5. Product spans remain visible for Instance AI concepts that providers do not
-   know about: message turns, context compaction, prompt building, HITL,
-   workflow build orchestration, workspace edits, validation, replay
-   boundaries, and background jobs.
-6. Feedback remains attachable to a stable persisted trace/run identifier.
-7. Direct LangSmith credentials and AI service proxy routing both keep working.
-8. Trace replay remains deterministic and independent from LangSmith run shape.
-9. Redaction and input/output recording controls are explicit and enforced
-   before detailed payloads are exported.
+   - a user message activation,
+   - one or more detached background task activations,
+   - one or more orchestrator resume/checkpoint activations.
+
+3. **Inline sub-agents are children, detached sub-agents are roots**
+
+   A planner or delegate that runs synchronously inside the current orchestrator
+   activation is a child span. A builder, data-table, research, or delegate task
+   that runs after the orchestrator returns is a separate root trace linked back
+   with spawning metadata.
+
+4. **Leaf LLM spans own token usage**
+
+   Product spans must not carry prompt/completion usage that duplicates child LLM
+   spans. Token and cost rollups should come from native AI SDK provider spans,
+   with Instance AI applying LangSmith export fixes only in the Instance AI
+   LangSmith adapter.
+
+5. **Tool definitions are both agent and LLM request metadata**
+
+   We need to know which tools were assigned to which agent. That belongs on the
+   agent activation span as a compact tool manifest. We also need to know which
+   exact tools were available to each model call. That belongs on every LLM span
+   that sends tools to the provider, using the provider-facing tool definitions
+   from that request. Individual tool executions remain native `ai.toolCall`
+   spans.
+
+6. **Trace replay is separate from observability**
+
+   Replay records deterministic Instance AI events and tool I/O. It must not
+   depend on LangSmith IDs, OTel span IDs, or the shape of the LangSmith UI.
+
+7. **Do not hide provider semantics**
+
+   The LangSmith trace should show native LLM inputs, messages, available tools,
+   tool calls, provider metadata, finish reasons, token usage, and cache usage
+   whenever recording policy allows it.
 
 ## Non-Goals
 
-- Preserve the Mastra-era LangSmith tree shape.
-- Preserve the temporary hybrid RunTree plus OTel shape.
-- Rebuild provider-level LLM traces manually.
-- Roll product analytics or audit logging into LangSmith tracing.
-- Store full workflow JSON, file contents, credentials, or decrypted node
-  parameters in tracing payloads by default.
+- Preserve the Mastra-era trace tree.
+- Preserve the current hybrid `RunTree` plus OTel implementation.
+- Keep a compatibility layer that manually reconstructs LLM steps in LangSmith.
+- Make `@n8n/agents` aware of Instance AI thread IDs, agent roles, task IDs, or
+  LangSmith-specific token correction.
+- Use LangSmith trace shape as product state.
+- Keep orchestration spans open during background waits.
 
-## Current State
+## Current Instance AI Execution Inventory
 
-Implemented so far:
+The tracing rewrite must cover every current model operation and non-model
+execution path below.
 
-- `@n8n/agents` can build LangSmith OTel telemetry.
-- Native AI SDK spans are visible in LangSmith when not forced into RunTree
-  parentage.
-- `@n8n/agents` creates runtime root spans around generate/stream loops.
-- `@n8n/agents` emits AI-SDK-compatible `ai.toolCall` spans for local tool
-  execution.
-- Instance AI can create native telemetry with thread metadata and service
-  proxy headers.
-- Normal Instance AI execution no longer disables AI SDK telemetry.
-- The LangSmith OTel processor filters noisy AI SDK wrapper spans so provider
-  request spans such as `ai.streamText.doStream` can appear directly under the
-  agent root.
-- Instance AI product roots and child spans now use the same OTel
-  tracer/provider as native agent telemetry in normal execution.
-- Normal foreground and detached trace creation no longer creates RunTree spans.
-- Agent tree snapshots persist OTel trace/span IDs alongside derived LangSmith
-  IDs for feedback anchoring.
-- Instance AI disables the generic `@n8n/agents` runtime root span because the
-  product actor span already represents the agent loop; native provider and
-  `ai.toolCall` spans remain enabled and are parented directly under the
-  product actor span.
-- Normal tool execution no longer emits duplicate `instance-ai.tool.*` product
-  spans. The native `ai.toolCall` span is the canonical tool execution span;
-  Instance AI only adds product spans for HITL suspend/resume lifecycle events.
-- Live LangSmith validation has proved feedback against an OTel-only product
-  root and full provider-span visibility with a real model turn.
-- Detached sub-agent linking captures spawning trace/span metadata and model
-  tool-call IDs when a detached task is spawned from a local tool handler.
+| Area | Current code path | Model operation | Target trace shape |
+| --- | --- | --- | --- |
+| Foreground orchestrator | `createInstanceAgent()` -> `streamAgentRun()` | `Agent.stream()` | Child agent activation under `message_turn` or `orchestrator_resume` root |
+| Context compaction | `generateCompactionSummary()` | `Agent.generate()` with no tools | Direct child of the `message_turn` or `orchestrator_resume` root, before `agent.orchestrator`; internal root only when run out of band |
+| Thread title | `generateTitleForRun()` | `generateTitleFromMessage()` | Internal OTel operation; export to LangSmith only when `include_internal=true` or on error |
+| Inline planner | `plan` tool / `createPlanWithAgentTool()` | `Agent.stream()` with planner tools | Child `agent.planner` activation under the current orchestrator activation |
+| Inline delegate | `delegate` tool / `createDelegateTool()` | `createSubAgent().stream()` | Child `agent.<role>` activation under the current orchestrator activation |
+| Browser credential setup | `browser-credential-setup` tool | `Agent.stream()` plus resume/nudge loops | Quick credential checks stay inline; browser/user-wait flows use detached `background_subagent` plus `orchestrator_resume` |
+| Detached builder | `build-workflow-with-agent` / planned build task | `Agent.stream()` in sandbox or tool mode | Separate `background_subagent` root linked to the spawn tool call |
+| Detached data-table agent | `manage-data-tables-with-agent` / planned task | `Agent.stream()` | Separate `background_subagent` root |
+| Detached research agent | `research-with-agent` / planned task | `Agent.stream()` | Separate `background_subagent` root |
+| Detached custom delegate | planned delegate task | `createSubAgent().stream()` | Separate `background_subagent` root |
+| Planned checkpoint | service-created follow-up turn | Orchestrator `Agent.stream()` | Separate `orchestrator_resume` root with `resume_reason=planned_checkpoint` |
+| Background completion handoff | service-created follow-up turn | Orchestrator `Agent.stream()` | Separate `orchestrator_resume` root with `resume_reason=background_task_completed` |
+| Workflow loop | `WorkflowLoopRuntime`, `verify-built-workflow`, `report-verification-verdict` | Mostly deterministic tools | Tool spans and product state spans only, no LLM span unless orchestrator chooses one |
+| Builder memory compaction | `compactBuilderMemoryThread()` | Currently deterministic storage compaction | Product span only if useful; if it later calls an LLM, trace as an internal child of builder root |
 
-Remaining follow-up:
-
-- Some fallback RunTree compatibility code remains for legacy/manual stream
-  trace debugging only. It is disabled by default behind
-  `N8N_INSTANCE_AI_LEGACY_RUNTREE_TRACING=true` and should be removed in the
-  post-rollout cleanup once no legacy stream-hook consumers remain.
-
-## Hybrid Reference Notes
-
-The last working hybrid traces showed RunTree product nodes such as
-`message_turn`, `orchestrator`, `context_compaction`, `prompt_build`, and
-`subagent:planner` beside native OTel nodes such as `ai.streamText.doStream`.
-This proved product semantics and native AI SDK telemetry could both be
-exported, but LangSmith displayed them as split turn/root groups and did not
-roll token usage up to the product roots.
-
-The failure mode to avoid is forcing native OTel spans under RunTree IDs. In
-that shape, LangSmith can lose or separate provider spans, and the trace no
-longer shows the complete system/user/tool/provider turn under a single OTel
-context. Regression coverage now asserts normal Instance AI trace creation does
-not create RunTree spans.
-
-## Live Validation Notes
-
-Live validation with explicit credentials has covered two cases:
-
-- `instance-ai-tracing-validation` thread
-  `otel-validation-f97e5f00-589a-49fb-a536-d54d417c30eb` proved foreground
-  and detached roots are queryable by the same thread ID. The thread contained
-  `instance-ai.message_turn`, native `ai.generateText.doGenerate` spans with
-  tool definitions and token usage, a local tool span, and detached
-  `instance-ai.subagent.workflow-builder` metadata with spawning trace/span
-  IDs.
-- `instance-ai-tracing-validation` thread
-  `otel-runtime-root-validation-4c6d9b3c-ae3f-454a-bf97-d984be36a2be` proved
-  the generic `@n8n/agents` runtime root span can be suppressed for Instance
-  AI. The resulting foreground tree was
-  `instance-ai.message_turn -> instance-ai.orchestrator.stream ->
-  ai.generateText.doGenerate/add_numbers`, with no duplicate
-  `instance-ai.orchestrator.stream` wrapper.
-
-User-provided fresh run `81b3a657-c452-484f-ac3c-122836016094` confirmed the
-pre-suppression implementation had correct native provider/tool visibility,
-token usage, detached sub-agent roots, and spawning metadata, but still showed
-duplicate same-named agent wrapper spans. The runtime-root suppression is the
-follow-up fix for that shape issue.
-
-## Target Architecture
+## Target Trace Model
 
 ```mermaid
 flowchart TD
-	A[HTTP or stream request] --> B[OTel product root: instance-ai.message_turn]
-	B --> C[context_compaction]
-	B --> D[prompt_build]
-	B --> E[instance-ai.orchestrator.stream]
-	E --> F[ai.streamText.doStream]
-	E --> G[ai.toolCall: credentials]
-	E --> H[ai.toolCall: plan]
-	H --> I[instance-ai.subagent.planner.stream]
-	I --> J[ai.streamText.doStream]
-	I --> K[ai.toolCall: submit-plan]
-	K --> L[hitl.suspend]
-	B --> M[message_turn persisted snapshot]
-	E --> N[detached workflow-builder spawn]
-	N --> O[separate OTel root: instance-ai.subagent.workflow-builder]
+    Thread[LangSmith thread_id]
+
+    Thread --> A[Root: instance-ai.message_turn]
+    A --> A2[context_compaction]
+    A --> A3[prompt_build]
+    A --> A1[agent.orchestrator]
+    A1 --> A4[LLM provider span]
+    A1 --> A5[tool: plan]
+    A5 --> A6[agent.planner]
+    A6 --> A7[LLM provider spans]
+    A6 --> A8[tool: submit-plan]
+    A1 --> A9[background spawn metadata]
+
+    Thread --> B[Root: instance-ai.background_subagent]
+    B --> B1[agent.workflow-builder]
+    B1 --> B2[LLM provider spans]
+    B1 --> B3[workspace and workflow tools]
+
+    Thread --> C[Root: instance-ai.orchestrator_resume]
+    C --> C1[agent.orchestrator]
+    C1 --> C2[tool: verify-built-workflow]
+    C1 --> C3[tool: complete-checkpoint]
 ```
 
-The foreground trace is rooted by `instance-ai.message_turn`. The root span is
-created before context compaction, prompt building, and the first LLM call.
+### Root Trace Kinds
 
-Inline sub-agents inherit the active OTel context and remain inside the
-foreground trace.
+`message_turn`
 
-Detached/background sub-agents can be separate OTel trace roots. They are not
-children in the live OTel tree if they run outside the request lifecycle, but
-they must be queryable by the same thread ID and linked to the spawning span via
-metadata.
+- Triggered by a user chat message.
+- Contains active foreground orchestrator work for that message.
+- Ends as `completed`, `failed`, `cancelled`, or `suspended`.
+- If it schedules background tasks and returns, it ends immediately after the
+  scheduling result is persisted and emitted.
 
-## Canonical Trace Shapes
+`orchestrator_resume`
 
-Foreground message turn:
+- Triggered by a tool approval, plan approval, background-task completion,
+  planned checkpoint, replan, or correction handoff.
+- Contains only active continuation work.
+- Does not inherit the duration of the suspended or background operation that
+  caused it.
 
-```text
-instance-ai.message_turn                 chain
-  instance-ai.context_compaction         chain
-  instance-ai.prompt_build               chain
-  instance-ai.orchestrator.stream        chain
-    ai.streamText.doStream               llm
-    ai.toolCall                          tool  credentials
-    ai.streamText.doStream               llm
-    ai.toolCall                          tool  plan
-      instance-ai.subagent.planner.stream chain
-        ai.streamText.doStream           llm
-        ai.toolCall                      tool  templates
-        ai.toolCall                      tool  credentials
-        ai.streamText.doStream           llm
-        ai.toolCall                      tool  submit-plan
-          instance-ai.hitl.suspend       chain/tool
-```
+`background_subagent`
 
-Resume after HITL, same process:
+- Triggered when a background task actually starts executing, not when the
+  orchestrator merely requests it.
+- Used by workflow builder, data-table manager, researcher, and detached
+  delegate workers.
+- Linked to the spawning activation by metadata, not by OTel parentage.
 
-```text
-instance-ai.message_turn
-  instance-ai.orchestrator.resume
-    ai.toolCall                          tool  submit-plan:resume
-    ai.streamText.doStream               llm
-```
+`internal_operation`
 
-Resume after HITL, different process:
+- Used for optional internal LLM calls that are not part of a user-visible
+  agent activation, such as title generation.
+- Hidden from normal debugging views by tags/metadata unless explicitly enabled.
 
-```text
-instance-ai.message_turn.resume          chain
-  instance-ai.orchestrator.resume        chain
-    ai.toolCall                          tool  submit-plan:resume
-    ai.streamText.doStream               llm
-```
+### Agent Activation Spans
 
-The resumed root must include `resumed_from_trace_id`,
-`resumed_from_span_id`, `message_group_id`, and `pending_tool_call_id`.
+Agent activation spans describe the Instance AI actor that is running. They are
+product spans, not model spans.
 
-Detached workflow builder:
+Recommended names:
 
-```text
-instance-ai.subagent.workflow-builder    chain
-  ai.streamText.doStream                 llm
-  ai.toolCall                            tool  credentials
-  ai.toolCall                            tool  workspace_write_file
-  ai.toolCall                            tool  workspace_execute_command
-  ai.toolCall                            tool  workspace_str_replace_file
-  ai.toolCall                            tool  submit-workflow
-  ai.toolCall                            tool  executions
-```
+- `instance-ai.agent.orchestrator`
+- `instance-ai.agent.planner`
+- `instance-ai.agent.workflow-builder`
+- `instance-ai.agent.data-table-manager`
+- `instance-ai.agent.web-researcher`
+- `instance-ai.agent.delegate`
+- `instance-ai.agent.credential-setup-browser-agent`
 
-## RunTree Policy
+Each agent activation span must include:
 
-RunTree is not part of the target live trace hierarchy.
+- agent role and agent id,
+- model id,
+- execution mode,
+- max iteration budget,
+- assigned tool manifest,
+- memory/checkpoint summary,
+- prompt section summary.
 
-Rules:
+The full system prompt can be recorded only when recording policy allows it.
+The compact tool manifest should always be safe to record after schema
+redaction.
 
-- Do not create RunTree spans for `message_turn`, `orchestrator`,
-  `subagent:*`, `tool:*`, or `hitl:*` in normal execution.
-- Do not set OTel span parents to RunTree run IDs.
-- Do not set RunTree trace IDs on native OTel spans to merge two ingestion
-  models.
-- Do not use RunTree ordering as replay input.
-- Do not wrap local tools in RunTree just to make them visible in LangSmith.
+### Native AI SDK Spans
 
-Allowed compatibility uses:
-
-- A temporary feedback adapter may query or derive the OTel product root run ID
-  for `Client.createFeedback`.
-- A temporary migration flag may emit both systems for comparison, but the
-  dual-emission mode must be disabled by default and must not be the production
-  target.
-
-## Feedback Anchoring
-
-Instance AI still needs stable IDs for feedback and persisted snapshots.
-
-Target snapshot fields:
-
-- `traceId`: OTel trace ID
-- `spanId`: OTel span ID for `instance-ai.message_turn`
-- `langsmithTraceId`: LangSmith trace UUID, if known or derivable
-- `langsmithRunId`: LangSmith run UUID for the product root, if known or
-  derivable
-- `threadId`
-- `messageId`
-- `messageGroupId`
-- `agentRunId`
-
-Implementation options, in preferred order:
-
-1. Create the OTel product root with explicit LangSmith IDs using supported
-   LangSmith OTel attributes, then persist those IDs.
-2. Persist the OTel trace/span IDs and derive the LangSmith IDs using the same
-   conversion LangSmith applies during OTel ingestion.
-3. Persist OTel IDs and resolve the LangSmith run by metadata lookup before
-   creating feedback.
-
-The migration must prove feedback against a live OTel-only trace before RunTree
-feedback anchoring is removed.
-
-## Metadata Contract
-
-Every Instance AI OTel root and child span must include the stable thread
-metadata needed for LangSmith thread views and debugging.
-
-Required metadata attributes:
-
-- `langsmith.metadata.thread_id`
-- `thread_id`
-- `conversation_id`
-- `message_id`
-- `message_group_id`
-- `run_id`
-- `agent_role`
-- `execution_mode`
-- `instance_ai.trace_version = otel-v2`
-
-Optional metadata attributes:
-
-- `workflow_id`
-- `project_id`
-- `user_id`, when safe to store
-- `agent_id`
-- `task_id`
-- `planned_task_id`
-- `work_item_id`
-
-Detached sub-agent metadata:
-
-- `spawned_by_trace_id`
-- `spawned_by_span_id`
-- `spawned_by_run_id`, if available
-- `spawned_by_agent_role`
-- `spawned_by_tool_call_id`
-- `subagent_role`
-- `subagent_task_id`
-
-Resume metadata:
-
-- `resumed_from_trace_id`
-- `resumed_from_span_id`
-- `resumed_from_run_id`, if available
-- `pending_tool_call_id`
-- `resume_action`
-
-The metadata builder belongs in Instance AI. The generic attribute conversion
-and redaction helpers can move to `@n8n/ai-utilities` if they are not
-LangSmith-specific.
-
-## Span Naming
-
-Use stable, searchable names.
-
-Product chain spans:
-
-- `instance-ai.message_turn`
-- `instance-ai.message_turn.resume`
-- `instance-ai.context_compaction`
-- `instance-ai.prompt_build`
-- `instance-ai.orchestrator.stream`
-- `instance-ai.orchestrator.generate`
-- `instance-ai.orchestrator.resume`
-- `instance-ai.subagent.<role>.stream`
-- `instance-ai.subagent.<role>.generate`
-- `instance-ai.background.<kind>`
-
-Product side-effect spans:
-
-- `instance-ai.hitl.suspend`
-- `instance-ai.hitl.resume`
-
-Native AI SDK spans:
+Native model and tool spans should be kept as close as possible to what
+`@n8n/agents` and AI SDK produce:
 
 - `ai.streamText.doStream`
 - `ai.generateText.doGenerate`
 - `ai.toolCall`
 
-The noisy AI SDK wrapper spans such as `ai.streamText` may be filtered from
-LangSmith export as long as provider request spans, tool spans, and product
-root spans remain correctly parented.
+Every LLM request span must include the exact tool definitions sent to the
+model for that request when tools are present. This must not rely on the parent
+agent activation span alone, because LangSmith renders available tool specs from
+the LLM run itself.
 
-For Instance AI, the generic `@n8n/agents` runtime root span around
-generate/stream loops is disabled. Generic agents may still use that span, but
-Instance AI already has explicit product actor spans such as
-`instance-ai.orchestrator.stream` and `instance-ai.subagent.<role>.stream`.
-Disabling the generic wrapper avoids duplicate same-named agent spans while
-preserving native provider and tool telemetry.
+Required on LLM request spans when tools are present:
 
-## Span Kinds and Inputs
+- tool name,
+- tool description,
+- JSON input schema after redaction and size limiting,
+- provider tool kind when applicable, for example custom tool, server tool, or
+  hosted tool,
+- tool choice and parallel-tool-use settings when available,
+- stable manifest reference or schema hash linking back to the agent activation
+  manifest.
 
-Use LangSmith-compatible span attributes:
+Required on all LLM request spans when recording policy allows it:
 
-- Product orchestration spans: `langsmith.span.kind = chain`
-- Local execution spans: `langsmith.span.kind = tool`
-- Native provider spans: emitted by AI SDK and translated as `llm`
-- Root or named spans: `langsmith.trace.name`
-- Thread grouping: `langsmith.metadata.thread_id`
+- system and conversation messages sent to the provider,
+- model/provider identifiers,
+- tool calls emitted by the model,
+- tool results observed by the following turn,
+- finish reason,
+- raw provider response metadata,
+- token/cache usage.
 
-Product span inputs and outputs must be summaries, not raw large payloads.
+The noisy wrapper spans, for example `ai.streamText`, may be filtered in the
+Instance AI LangSmith exporter only if doing so does not hide messages, tool
+definitions, tool calls, response metadata, or token usage.
 
-Examples:
+## Activation and Waiting Semantics
 
-- Context compaction input: message counts, token estimates, compaction mode.
-- Prompt build output: prompt section names and total estimated tokens.
-- Workspace edit output: file path, operation type, replacements, diff summary.
-- Workflow submission output: workflow ID, node count, validation summary.
-- HITL output: pending tool call ID, tool name, safe user decision summary.
+The orchestrator must not look like it ran for 10 minutes because it waited for
+a background builder.
 
-## Native LLM Telemetry
+Expected flow for a complex build:
 
-All normal Instance AI model calls must use native `@n8n/agents` telemetry.
+```text
+Root A: instance-ai.message_turn
+  agent.orchestrator
+    LLM calls plan/build-workflow-with-agent
+    tool call schedules builder task
+  status=suspended_or_waiting_for_background
 
-Required behavior:
+Root B: instance-ai.background_subagent
+  agent.workflow-builder
+    LLM calls
+    workspace/file/submit/verify tools
+  status=completed
 
-- `AgentRuntime` passes `experimental_telemetry` to AI SDK calls.
-- `functionId` is stable and role-specific, for example
-  `instance-ai.orchestrator` or `instance-ai.subagent.planner`.
-- Runtime root spans are created with the same active OTel context as the
-  product span that invoked them.
-- Provider spans show messages, tool schemas, tool choice, finish reason,
-  timing, response metadata, cost, and token usage when recording policy allows
-  it.
-- Local tools executed by `@n8n/agents` emit AI-SDK-compatible `ai.toolCall`
-  spans.
-- The telemetry provider is flushed at turn end, before suspension persistence,
-  and after detached task completion.
+Root C: instance-ai.orchestrator_resume
+  agent.orchestrator
+    consumes <background-task-completed>
+    verifies, checkpoints, or summarizes
+  status=completed
+```
 
-Normal execution must never set
-`experimental_telemetry: { isEnabled: false }` when LangSmith tracing is
-enabled.
+The wait itself lives in task storage and event history, not in an open span.
 
-## Tool Tracing
+Long user waits follow the same rule. A trace can end with
+`status=suspended` and metadata describing the pending tool call. The resumed
+activation starts a new root trace with resume metadata.
 
-Tool tracing has two layers in OTel:
+Browser credential setup follows the detached/resume rule when it opens a
+browser, waits for user action, or can run long. Fast credential discovery and
+validation can remain inline tool work under the orchestrator activation.
 
-1. Model-facing tool calls
+Planned checkpoint follow-ups use `trace_kind=orchestrator_resume` with
+`resume_reason=planned_checkpoint`. A separate root kind is not needed. The
+LangSmith display name may be specialized, for example
+`instance-ai.orchestrator_resume.checkpoint`, as long as the trace kind remains
+`orchestrator_resume`.
 
-   These come from AI SDK telemetry. They describe what the model saw and chose.
+## Thread and Link Metadata
 
-2. n8n-side tool execution
+Every root and child span exported to LangSmith must carry thread metadata so
+filtering, token counting, and cost aggregation include the full thread.
 
-   These are local handler spans. They describe what n8n did.
+Required on every span:
 
-Default local tool execution should use `ai.toolCall` spans with:
+- `thread_id`
+- `conversation_id`
+- `message_group_id`
+- `run_id`
+- `trace_kind`
+- `activation_id`
+- `instance_ai.trace_version`
 
-- `ai.operationId = ai.toolCall`
-- `ai.toolCall.name`
-- `ai.toolCall.id`
-- `ai.toolCall.args`, when input recording is enabled
-- `ai.toolCall.result`, when output recording is enabled
-- `ai.telemetry.metadata.*`
+Required on agent activation spans:
 
-Do not emit duplicate `instance-ai.tool.*` product spans for normal tool
-execution. Add product side-effect spans only for lifecycle events that a normal
-`ai.toolCall` span does not represent, currently HITL suspend/resume.
+- `agent_role`
+- `agent_id`
+- `execution_mode`
+- `model_id`
+- `available_tools`
+- `tool_count`
 
-## Service Proxy Support
+Required on LLM request spans with tools:
 
-Instance AI must support two LangSmith routing modes:
+- `llm.available_tools`: the provider-facing tool definitions sent in this
+  request,
+- `llm.available_tool_names`: compact ordered names for scanning/filtering,
+- `llm.tool_manifest_ref`: reference to the parent agent activation manifest,
+- `llm.tool_schema_hash`: hash of the redacted provider-facing tool definition
+  set.
 
-1. Direct LangSmith credentials.
-2. AI service proxy routing with per-request auth headers.
+Required on detached background roots:
 
-`@n8n/agents` owns generic LangSmith OTLP exporter construction. Instance AI
-owns request-scoped proxy header creation and safe routing decisions.
+- `task_id`
+- `task_kind`
+- `planned_task_id`, when applicable
+- `work_item_id`, when applicable
+- `parent_checkpoint_id`, when applicable
+- `spawned_by_trace_id`
+- `spawned_by_span_id`
+- `spawned_by_activation_id`
+- `spawned_by_agent_role`
+- `spawned_by_tool_call_id`, when available
+- `originating_message_group_id`
 
-Security requirement: if a LangSmith API key is resolved by the engine, user
-controlled endpoint overrides must not be able to redirect that key.
+Required on resume roots:
 
-## Redaction and Recording Policy
+- `resume_reason`
+- `resumed_from_trace_id`, when available
+- `resumed_from_span_id`, when available
+- `resumed_from_activation_id`, when available
+- `pending_tool_call_id`, when applicable
+- `completed_task_id`, when applicable
+- `checkpoint_task_id`, when applicable
 
-Native telemetry can expose system prompts, messages, tool schemas, tool
-arguments, and outputs. This is useful for debugging and risky in production.
+LangSmith-specific copies, for example `langsmith.metadata.thread_id`, are
+created only in the Instance AI LangSmith adapter.
 
-Required policy:
+## Tool Manifest Contract
 
-- Self-hosted deployments remain opt-in for LangSmith tracing.
-- Production recording defaults must be explicitly reviewed.
-- Credentials, API keys, bearer tokens, cookies, decrypted node parameters, and
-  auth headers must never be exported.
-- Workflow JSON and workspace file contents should be summarized by default.
-- Tool schemas may be exported.
-- Tool arguments and tool outputs must pass through redaction before export.
-- Redaction must preserve token usage attributes.
+Every agent activation span should expose a compact assigned-tool manifest.
+This is the primary answer to "which tools were assigned to which agent?"
 
-Recommended defaults:
+Manifest fields:
 
-| Environment | Tracing | Record inputs | Record outputs |
-| --- | --- | --- | --- |
-| Local development | opt-in | true | true |
-| Internal dogfood | enabled by config | true with redaction | true with redaction |
-| Cloud production | controlled rollout | to be decided | to be decided |
-| Self-hosted | opt-in | false by default | false by default |
+- `name`
+- `description`
+- `source`: `domain`, `orchestration`, `mcp`, `local-mcp`, `workspace`,
+  `provider`
+- `category`: `workflow`, `execution`, `credential`, `node`, `data-table`,
+  `workspace`, `research`, `planning`, `browser`, `filesystem`, `other`
+- `input_schema`, redacted and size-limited
+- `approval`: whether the tool can suspend
+- `side_effect`: `none`, `read`, `write`, `execute`, `network`, `browser`
 
-## Trace Replay
+The manifest is recorded once per agent activation. LLM provider spans must also
+show the request-specific tool schemas that were actually sent to the provider.
+This makes the LLM run self-describing in LangSmith while keeping the agent
+activation as the stable debugging surface for assigned tools.
 
-Replay must be independent from LangSmith and independent from OTel span IDs.
+The two copies serve different purposes:
 
-Replay records stable Instance AI events:
+- agent activation manifest: compact inventory of tools assigned to the agent;
+- LLM request tools: exact provider-facing definitions available to that model
+  invocation.
 
-- message turn start/end
-- model request summary
-- model response summary
-- tool call request
-- local tool execution result
-- HITL suspend/resume
-- detached task lifecycle
-- workflow submission and validation summary
+The LLM request tool set can be smaller than the agent manifest if the runtime
+uses dynamic tool filtering. It must never be larger without also updating the
+agent activation manifest or recording a clear reason, for example
+provider-hosted tools injected after agent construction.
 
-Replay may emit replay-tagged OTel traces for debugging, but replay correctness
-must not require LangSmith to be available.
+Schema redaction and size limiting must happen before exporting either copy.
+The redacted schema hash should be stable across the agent manifest and all LLM
+request spans using the same effective tool definitions.
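+
+A minimal sketch of one way to keep that hash stable, assuming a canonical JSON
+serialization with sorted keys and a SHA-256 digest. `MAX_SCHEMA_BYTES`,
+`canonicalJson`, `limitSchema`, and `schemaHash` are hypothetical names used
+for illustration, not existing helpers, and the size limit is a placeholder:
+
+```typescript
+import { createHash } from 'node:crypto';
+
+// Hypothetical limit; the spec does not name a number.
+const MAX_SCHEMA_BYTES = 4096;
+
+/** Serialize with sorted object keys so equal schemas yield equal strings. */
+function canonicalJson(value: unknown): string {
+  if (Array.isArray(value)) return `[${value.map(canonicalJson).join(',')}]`;
+  if (value !== null && typeof value === 'object') {
+    const entries = Object.entries(value as Record<string, unknown>)
+      .filter(([, v]) => v !== undefined)
+      .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
+      .map(([k, v]) => `${JSON.stringify(k)}:${canonicalJson(v)}`);
+    return `{${entries.join(',')}}`;
+  }
+  // JSON.stringify returns undefined for functions and undefined itself.
+  return JSON.stringify(value) ?? 'null';
+}
+
+/** Replace an oversized (already redacted) schema with a small summary. */
+function limitSchema(redactedSchema: unknown): unknown {
+  const bytes = Buffer.byteLength(canonicalJson(redactedSchema), 'utf8');
+  return bytes <= MAX_SCHEMA_BYTES
+    ? redactedSchema
+    : { truncated: true, original_bytes: bytes };
+}
+
+/** Hash the exported form, so both copies report the same value. */
+function schemaHash(exportedSchema: unknown): string {
+  return createHash('sha256').update(canonicalJson(exportedSchema)).digest('hex');
+}
+
+const a = schemaHash(limitSchema({ type: 'object', properties: { id: { type: 'string' } } }));
+const b = schemaHash(limitSchema({ properties: { id: { type: 'string' } }, type: 'object' }));
+console.log(a === b); // true: key order does not affect the hash
+```
+
+Hashing the exported form rather than the raw schema means a size-limited
+schema still hashes identically on the agent manifest and on each LLM request
+span that carries the same effective definition.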
 
-## Package Responsibilities
+## Token and Cost Policy
 
-`@n8n/agents` owns:
+### Source of Truth
 
-- generic telemetry builder APIs
-- LangSmith OTLP tracer/provider construction
-- LangSmith OTel span filtering
-- mapping telemetry into AI SDK `experimental_telemetry`
-- optional runtime root spans around generate/stream loops
-- AI-SDK-compatible local `ai.toolCall` spans
-- provider flush and shutdown hooks
-- generic telemetry integration hooks
+For LangSmith display, leaf LLM spans are the source of token and cost usage.
+Product spans do not duplicate child token counts.
+
+For internal billing/debugging, provider raw usage is preferred when available.
+For Anthropic, raw billing buckets are:
+
+- `usage.input_tokens`
+- `usage.output_tokens`
+- `usage.cache_creation_input_tokens`
+- `usage.cache_read_input_tokens`
+
+### LangSmith Adapter Correction
+
+Token totals are currently inflated when LangSmith sees an AI SDK Anthropic
+span whose `ai.usage.promptTokens` already includes repeated iteration/cache
+accounting and then adds Anthropic cache details from provider metadata on top.
+
+Instance AI should correct this only in its LangSmith export transform:
+
+- for Anthropic spans with raw provider usage, set `ai.usage.promptTokens` and
+  `ai.usage.inputTokens` to raw non-cache `usage.input_tokens`;
+- set `ai.usage.completionTokens` and `ai.usage.outputTokens` to raw
+  `usage.output_tokens`;
+- preserve `ai.response.providerMetadata` so LangSmith can derive cache details
+  once;
+- do not set product-span usage fields;
+- do not change `@n8n/agents` usage normalization for generic consumers.
+
+If raw provider metadata is missing, the adapter should not guess. It should
+leave AI SDK usage intact and mark the span with
+`instance_ai.usage_source=ai_sdk`.
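+
+The correction rules above could be sketched as a pure attribute transform.
+This is an illustration under assumptions, not the adapter's actual code:
+`normalizeAnthropicUsage`, the `SpanAttributes` shape, the way raw usage is
+passed in, and the `provider_raw` marker value are all hypothetical. Only the
+`ai.usage.*` keys and `instance_ai.usage_source=ai_sdk` come from this spec.
+
+```typescript
+type SpanAttributes = Record<string, string | number | boolean | undefined>;
+
+// Raw Anthropic billing buckets as listed under "Source of Truth".
+interface AnthropicRawUsage {
+  input_tokens: number;
+  output_tokens: number;
+  cache_creation_input_tokens?: number;
+  cache_read_input_tokens?: number;
+}
+
+function normalizeAnthropicUsage(
+  attrs: SpanAttributes,
+  raw?: AnthropicRawUsage,
+): SpanAttributes {
+  if (!raw) {
+    // No raw provider usage: do not guess, keep AI SDK numbers and mark the source.
+    return { ...attrs, 'instance_ai.usage_source': 'ai_sdk' };
+  }
+  return {
+    // The spread keeps `ai.response.providerMetadata` untouched, so LangSmith
+    // can derive cache details from it exactly once.
+    ...attrs,
+    'ai.usage.promptTokens': raw.input_tokens,
+    'ai.usage.inputTokens': raw.input_tokens,
+    'ai.usage.completionTokens': raw.output_tokens,
+    'ai.usage.outputTokens': raw.output_tokens,
+    'instance_ai.usage_source': 'provider_raw', // hypothetical marker value
+  };
+}
+
+const corrected = normalizeAnthropicUsage(
+  { 'ai.usage.promptTokens': 5200, 'ai.response.providerMetadata': '{"anthropic":{}}' },
+  { input_tokens: 200, output_tokens: 50, cache_read_input_tokens: 5000 },
+);
+console.log(corrected['ai.usage.promptTokens']); // 200
+```
+
+Keeping the transform free of side effects makes it straightforward to cover
+with the regression tests described in the refactor plan.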
+
+## Package Boundaries
+
+`@n8n/agents` owns generic primitives only:
+
+- accepting a built telemetry provider/tracer,
+- passing telemetry to AI SDK,
+- native `ai.toolCall` spans for local tool execution,
+- provider flush/shutdown hooks,
+- optional generic runtime spans for non-Instance-AI consumers,
+- generic model/tool metadata that is not LangSmith or Instance AI specific.
+
+`@n8n/agents` must not own:
+
+- Instance AI trace kinds,
+- LangSmith thread metadata,
+- Anthropic billing corrections for LangSmith,
+- Instance AI agent role naming,
+- background task linking,
+- Instance AI redaction policy.
 
 `@n8n/instance-ai` owns:
 
-- product OTel trace context creation
-- thread metadata construction
-- product span helpers for message turns, context compaction, prompt building,
-  HITL, background tasks, and workflow build loops
-- feedback snapshot persistence
-- service proxy request metadata and headers
-- detached sub-agent linking metadata
-- disabling generic runtime root spans when product actor spans are already
-  present
-- trace replay events
+- activation/root trace creation,
+- OTel context propagation across orchestrator, sub-agents, tools, and
+  resumptions,
+- LangSmith exporter configuration and transform,
+- thread metadata and root naming,
+- detached task linking,
+- tool manifest construction,
+- recording/redaction policy,
+- feedback/snapshot IDs,
+- trace replay integration.
 
-`@n8n/ai-utilities` may own:
+`@n8n/ai-utilities` may own shared helpers only if they are not LangSmith
+specific:
 
-- shared redaction helpers
-- JSON-safe telemetry value conversion
-- safe payload summary helpers
-- LangSmith-independent metadata utilities
+- safe JSON serialization,
+- schema summarization,
+- redaction primitives,
+- payload size limiting,
+- generic tool manifest helpers.
 
-## Migration Plan
+## Refactor Plan
 
-1. Document and freeze the current hybrid behavior
+### 1. Split tracing into explicit modules
 
-   - [x] Keep examples of a working hybrid trace with native LLM spans.
-   - [x] Keep examples of the failure mode when OTel spans are forced under
-     RunTree parent IDs.
-   - [x] Add a short note in tests or fixtures explaining why RunTree/OTel
-     parent mixing is forbidden.
+Replace the current single large tracing module with focused pieces:
 
-2. Add an OTel product tracing adapter
+- `tracing/trace-context.ts`
+  - Instance AI trace context types.
+  - Activation context AsyncLocalStorage.
+  - No LangSmith imports.
 
-   - [x] Create an Instance AI adapter that starts active OTel spans using the
-     same tracer/provider as native agent telemetry.
-   - [x] Support `withSpan`, `startSpan`, `finishSpan`, `failSpan`, and
-     metadata merging.
-   - [x] Ensure active context propagates into `@n8n/agents` runtime calls.
-   - [x] Ensure spans flush before response close, suspension persistence, and
-     detached task completion.
+- `tracing/product-spans.ts`
+  - Start/finish/fail product OTel spans.
+  - Context propagation helpers.
+  - Snapshot ID derivation.
 
-3. Replace RunTree message turn roots
+- `tracing/tool-manifest.ts`
+  - Tool assignment summarization and schema redaction.
 
-   - [x] Create `instance-ai.message_turn` as an OTel root span.
-   - [x] Persist OTel trace/span IDs in the agent tree snapshot.
-   - [x] Add metadata required by LangSmith thread view.
-   - [x] Remove RunTree creation from the normal foreground path.
+- `tracing/langsmith-adapter.ts`
+  - LangSmith telemetry/exporter construction.
+  - LangSmith attribute mapping.
+  - Anthropic usage normalization for LangSmith.
+  - Wrapper span filtering.
 
-4. Replace RunTree product child spans
+- `tracing/tool-replay.ts`
+  - Trace replay tool recording and replay hooks.
+  - No LangSmith dependencies.
 
-   - [x] Convert `orchestrator`, `context_compaction`, and `prompt_build` to
-     OTel spans.
-   - [x] Convert inline `subagent:*` spans to OTel spans under active context.
-   - [x] Convert HITL suspend/resume spans to OTel spans.
-   - [x] ~~Convert selected side-effect-heavy tools to OTel product spans.~~
-     Replaced by native `ai.toolCall` spans only; duplicate `instance-ai.tool.*`
-     spans are intentionally not emitted.
+### 2. Remove live RunTree tracing
 
-5. Preserve detached/background sub-agent linking
+Delete normal-path `RunTree` usage from Instance AI:
 
-   - [x] Create detached sub-agent roots as separate OTel traces when they run
-     outside the foreground context.
-   - [x] Add spawning metadata: trace ID, span ID, tool call ID, task ID, and
-     agent role.
-   - [x] Confirm thread queries show detached roots alongside foreground turns.
-     Live validation in `instance-ai-tracing-validation` for thread
-     `otel-validation-f97e5f00-589a-49fb-a536-d54d417c30eb` returned 9 runs
-     across 2 traces, including `instance-ai.message_turn` and
-     `instance-ai.subagent.workflow-builder`.
+- no `RunTree` imports in live tracing/runtime files,
+- no `withRunTree` compatibility API,
+- no manual LLM-step RunTree reconstruction,
+- no synthetic LangSmith tool runs,
+- no RunTree parent overrides.
 
-6. Rework feedback anchoring
+Feedback should use OTel-derived LangSmith run IDs or metadata lookup.
 
-   - [x] Choose explicit LangSmith IDs, derived OTel IDs, or metadata lookup.
-   - [x] Prove `Client.createFeedback` works against an OTel-only product root.
-   - [x] Persist the chosen IDs in the snapshot.
-   - [x] Remove RunTree as a feedback dependency.
+### 3. Rework root creation around activations
 
-7. Remove RunTree live tracing
+Foreground:
 
-   - [x] Remove normal-path `RunTree` root creation.
-   - [x] Remove normal-path manual RunTree tool wrappers.
-   - [x] Keep only temporary compatibility code behind an explicit flag, if
-     needed for rollout.
-     The flag is `N8N_INSTANCE_AI_LEGACY_RUNTREE_TRACING=true`; it is disabled
-     by default.
-   - [x] ~~Delete compatibility code after validation.~~ Deferred to
-     post-rollout cleanup after legacy manual stream-hook consumers are proven
-     unused.
+- create `message_turn` root at the start of the user-message activation;
+- create `agent.orchestrator` as a child;
+- end the root before returning from the activation, including when waiting for
+  background work.
 
-8. Decouple replay from tracing
+Resume:
 
-   - [x] Ensure replay records stable Instance AI events, not span IDs.
-   - [x] Ensure replay tests pass with LangSmith disabled.
-   - [x] ~~Optionally emit replay-tagged OTel spans for debugging only.~~ Not
-     implemented; replay remains LangSmith-independent and does not emit debug
-     traces by default.
+- create `orchestrator_resume` root for approvals, checkpoint follow-ups,
+  background completions, and replans;
+- link to the cause via metadata;
+- for checkpoint follow-ups, set `resume_reason=planned_checkpoint` instead of
+  introducing a dedicated trace kind.
 
-9. Add regression coverage
+Background:
 
-   - [x] Unit test metadata construction.
-   - [x] Unit test OTel product span parentage.
-   - [x] Unit test feedback ID persistence.
-   - [x] Unit test redaction preserving token usage.
-   - [x] Local exporter test proving one foreground message turn contains
-     product spans, native provider spans, and local tool spans.
-   - [x] Live LangSmith validation behind explicit credentials.
-     The validation showed native `ai.generateText.doGenerate` spans under the
-     foreground product trace with system/user messages, tool definitions,
-     tool choice, token usage, and a local tool span.
+- do not create a background root before `spawnBackgroundTask()` has accepted
+  the task;
+- create the root inside the background task's execution function when the task
+  starts running;
+- update the managed task/snapshot with trace IDs after root creation;
+- never create phantom roots for duplicate or limit-reached spawn attempts;
+- use the same background root model for long browser credential setup flows.
+
+### 4. Make agent activation wrapping consistent
+
+All model operations should run under an explicit Instance AI agent activation
+span:
+
+- orchestrator foreground/resume/checkpoint,
+- planner,
+- inline delegate,
+- quick inline browser credential checks,
+- detached builder,
+- detached data-table manager,
+- detached researcher,
+- detached delegate,
+- detached browser credential setup flows that open a browser or wait for user
+  action.
+
+Context compaction should be a root-level child under the current
+`message_turn` or `orchestrator_resume` root, before `agent.orchestrator`.
+Compaction prepares the orchestrator input; it is not part of the orchestrator
+agent activation duration.
+
+Title generation should use an internal OTel span. It is exported to LangSmith
+only when `include_internal=true` or when title generation fails.
+
+### 5. Rely on native AI SDK LLM/tool spans
+
+Remove manual LLM step hooks from `resumable-stream-executor` once native spans
+cover the same information.
+
+The stream consumer should still produce Instance AI SSE events and work
+summaries, but it should not be responsible for reconstructing LangSmith LLM
+runs.
+
+### 6. Preserve HITL visibility without long spans
+
+HITL suspensions and resumptions need product side-effect spans:
+
+- `instance-ai.hitl.suspend`
+- `instance-ai.hitl.resume`
+
+They should include:
+
+- pending tool call ID,
+- tool name,
+- approval/input kind,
+- request ID,
+- sanitized decision summary.
+
+The suspended activation root ends after the suspension is persisted. The
+resume activation is a new root linked by metadata.
+
+### 7. Normalize LangSmith usage in Instance AI only
+
+Implement the Anthropic token correction in `tracing/langsmith-adapter.ts`.
+Regression tests should validate that a span with raw Anthropic usage exports
+non-cache input as `promptTokens`/`inputTokens` and leaves cache details to be
+derived from provider metadata.
+
+### 8. Redaction and payload policy
+
+Keep detailed traces useful locally and safe by default:
+
+- credentials, bearer tokens, cookies, API keys, decrypted node parameters, and
+  auth headers are always redacted;
+- workflow JSON, execution data, and workspace file contents are summarized by
+  default;
+- tool schemas are allowed after size limiting;
+- tool inputs/outputs are recorded only according to environment policy;
+- token usage and provider metadata needed for billing must not be removed by
+  redaction.
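+
+One way to reconcile the first and last rules is an explicit allowlist that is
+checked before any key-based redaction. The sketch below assumes flat span
+attributes and a key-name pattern; `SENSITIVE_KEY`, `PRESERVED_PREFIXES`, and
+`redactAttributes` are hypothetical names, and real redaction would also need
+to inspect values and nested payloads.
+
+```typescript
+// A naive key pattern: note that "token" would also match usage attributes.
+const SENSITIVE_KEY = /(authorization|cookie|api[-_]?key|token|secret|password)/i;
+
+// Usage and provider metadata must survive redaction.
+const PRESERVED_PREFIXES = ['ai.usage.', 'ai.response.providerMetadata'];
+
+function redactAttributes(attrs: Record<string, unknown>): Record<string, unknown> {
+  const out: Record<string, unknown> = {};
+  for (const [key, value] of Object.entries(attrs)) {
+    const preserved = PRESERVED_PREFIXES.some((prefix) => key.startsWith(prefix));
+    out[key] = !preserved && SENSITIVE_KEY.test(key) ? '[REDACTED]' : value;
+  }
+  return out;
+}
+
+const safe = redactAttributes({
+  'http.request.header.authorization': 'Bearer abc123',
+  'ai.usage.promptTokens': 200,
+});
+console.log(safe['http.request.header.authorization']); // [REDACTED]
+console.log(safe['ai.usage.promptTokens']); // 200
+```
+
+Without the allowlist, the `token` pattern would strip `ai.usage.promptTokens`,
+which is the failure the "redaction preserves token/provider usage fields" unit
+test is meant to catch.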
+
+### 9. Validate against real LangSmith threads
+
+Before committing the implementation, validate with live LangSmith traces:
+
+- one simple foreground message,
+- one inline planner run with plan approval,
+- one detached workflow builder with orchestrator handoff,
+- one checkpoint follow-up,
+- one HITL suspend/resume,
+- one browser credential setup flow if browser tools are enabled.
+
+Each validation run must inspect at least one LLM span directly and confirm
+LangSmith shows the available tool definitions on that LLM run, not only on the
+parent agent activation span.
 
 ## Acceptance Criteria
 
-- Foreground message turns are visible as OTel root spans named
-  `instance-ai.message_turn`.
-- Orchestrator LLM calls are children of the foreground message turn in the OTel
-  tree.
-- Inline planner work is inside the foreground OTel tree.
-- Detached workflow-builder work is queryable by the same thread ID and linked
-  by spawning metadata.
-- LangSmith shows native provider spans without the noisy `ai.streamText`
-  wrapper span.
-- LangSmith shows system/user/assistant/tool messages, available tools, tool
-  choice, timing, cost, and token usage when recording policy allows it.
-- Product root spans and native spans include `langsmith.metadata.thread_id`.
-- Product root spans can receive feedback without RunTree.
-- HITL suspend/resume remains visible and resumable across process boundaries.
+- `packages/@n8n/instance-ai` live tracing has no `RunTree` dependency.
+- `@n8n/agents` contains no Instance AI-specific LangSmith mapping or Anthropic
+  billing workaround.
+- A simple user message creates one `message_turn` root trace.
+- Inline planner/delegate/browser agents appear as child agent activation spans,
+  not separate thread steps.
+- Detached builder/data-table/research/delegate tasks appear as
+  `background_subagent` root traces with clear linking metadata.
+- The orchestrator activation duration excludes background wait time.
+- Background roots are created only for tasks that actually start.
+- Orchestrator resume/checkpoint work appears as `orchestrator_resume` roots.
+- Each agent activation shows the assigned tool manifest.
+- Native LLM spans show messages, request-specific tool definitions, tool
+  choice, tool calls, finish reason, and provider usage when recording policy
+  allows it.
+- Tool definitions on LLM spans include name, description, redacted JSON input
+  schema, provider tool kind, and a stable manifest/schema hash.
+- LangSmith renders available tools on the LLM node for orchestrator, planner,
+  and workflow-builder model calls.
+- LangSmith token totals for Anthropic threads are in line with Anthropic
+  billing buckets: non-cache input, cache creation, cache read, and output are
+  not double counted.
+- Product spans do not duplicate child LLM token usage.
 - Trace replay works with LangSmith disabled.
-- No normal execution path creates RunTree spans after the migration flag is
-  removed.
+- Feedback can be attached to OTel-only product roots.
 
-## Open Questions
+## Test Plan
 
-1. Which feedback anchoring strategy should we use: explicit LangSmith span IDs,
-   derived OTel IDs, or metadata lookup?
-2. Should inline sub-agent root spans be named as independent
-   `instance-ai.subagent.<role>.stream` spans, or should they stay nested under
-   the parent `ai.toolCall` span only?
-3. What is the production default for recording prompt messages and tool
-   arguments after redaction?
-4. How long do we keep a dual-emission migration flag, if at all?
-5. Should the product OTel span helper live in Instance AI only, or become a
-   generic helper in `@n8n/agents`?
+Unit coverage:
 
-## Implementation Notes
+- trace kind/root metadata construction;
+- OTel parentage for foreground and inline sub-agent spans;
+- detached task root creation only after accepted task start;
+- resume root metadata for approval, background completion, checkpoint, and
+  replan causes;
+- tool manifest generation and schema redaction;
+- LLM request tool metadata generation from the provider-facing tool set;
+- LangSmith adapter mapping for LLM request tools so definitions render on the
+  LLM run, not only as opaque metadata;
+- LangSmith adapter Anthropic usage normalization;
+- redaction preserves token/provider usage fields;
+- trace replay does not import or require LangSmith.
 
-- Prefer OTel active context propagation over explicit parent ID attributes.
-- Do not cross-parent OTel spans to RunTree run IDs.
-- Do not force OTel spans into an existing RunTree trace ID.
-- Keep span names stable and searchable.
-- Keep telemetry flush best-effort and non-blocking for user responses where
-  possible.
-- Keep product span payloads small and redacted.
-- Use tags for coarse filtering: `instance-ai`, `foreground`, `background`,
-  `hitl`, `detached-subagent`, `replay`.
+Integration coverage:
+
+- local OTel exporter test for a foreground orchestrator run;
+- local OTel exporter test for inline planner;
+- local OTel exporter test for detached builder root and handoff resume;
+- stream/HITL test proving spans close before wait and resume starts a new
+  root;
+- background task duplicate/limit test proving no phantom LangSmith roots.
+
+Live validation:
+
+- run a real Anthropic thread and compare LangSmith token/cost display against
+  Anthropic usage buckets;
+- verify LangSmith thread view contains roots with `trace_kind` values that
+  distinguish user turns from background and resume activations;
+- verify agent activation spans expose tool manifests;
+- verify LLM spans expose the exact available tool definitions for at least the
+  orchestrator, planner, and workflow-builder.
+
+## Settled Design Decisions
+
+- Title generation is always instrumented as an internal OTel operation, but it
+  is exported to LangSmith only when `include_internal=true` or when the title
+  operation fails.
+- Context compaction is a root-level preparation span under the current
+  `message_turn` or `orchestrator_resume` root. It runs before
+  `agent.orchestrator` and is not counted as orchestrator activation time.
+- Browser credential setup stays inline only for quick credential checks.
+  Browser flows that open a browser, wait for user action, or can run long use
+  the detached `background_subagent` plus `orchestrator_resume` model.
+- Planned checkpoint follow-ups use `trace_kind=orchestrator_resume` with
+  `resume_reason=planned_checkpoint`. We do not add a dedicated
+  `planned_checkpoint` root kind.