Instance AI Tracing Specs
Status: planning clean Instance AI tracing rewrite
Last updated: 2026-05-06
Decision
Instance AI tracing will be rebuilt around a clean, activation-scoped
OpenTelemetry model owned by @n8n/instance-ai.
LangSmith is an export target for Instance AI traces, not the internal tracing
model. LangSmith-specific attribute names, token workarounds, thread grouping
policy, and trace-shaping rules must stay in packages/@n8n/instance-ai.
packages/@n8n/agents must remain provider/platform neutral, except for
generic telemetry primitives that any product can use.
The target live trace hierarchy has no LangSmith RunTree spans. Native AI SDK
model/tool telemetry and Instance AI product spans must share the same OTel
context tree.
Design Principles
- Activation duration, not logical task lifetime
  A span measures active work. It must not stay open while the orchestrator is waiting for user approval, a background task, or a future scheduler tick.
- One thread, multiple root operations
  A LangSmith thread is a chronological group of root traces linked by thread_id. A complex Instance AI request can therefore contain:
  - a user message activation,
  - one or more detached background task activations,
  - one or more orchestrator resume/checkpoint activations.
- Inline sub-agents are children, detached sub-agents are roots
  A planner or delegate that runs synchronously inside the current orchestrator activation is a child span. A builder, data-table, research, or delegate task that runs after the orchestrator returns is a separate root trace linked back with spawning metadata.
- Leaf LLM spans own token usage
  Product spans must not carry prompt/completion usage that duplicates child LLM spans. Token and cost rollups should come from native AI SDK provider spans, with Instance AI applying LangSmith export fixes only in the Instance AI LangSmith adapter.
- Tool definitions are both agent and LLM request metadata
  We need to know which tools were assigned to which agent. That belongs on the agent activation span as a compact tool manifest. We also need to know which exact tools were available to each model call. That belongs on every LLM span that sends tools to the provider, using the provider-facing tool definitions from that request. Individual tool executions remain native ai.toolCall spans.
- Trace replay is separate from observability
  Replay records deterministic Instance AI events and tool I/O. It must not depend on LangSmith IDs, OTel span IDs, or the shape of the LangSmith UI.
- Do not hide provider semantics
  The LangSmith trace should show native LLM inputs, messages, available tools, tool calls, provider metadata, finish reasons, token usage, and cache usage whenever recording policy allows it.
Non-Goals
- Preserve the Mastra-era trace tree.
- Preserve the current hybrid RunTree plus OTel implementation.
- Keep a compatibility layer that manually reconstructs LLM steps in LangSmith.
- Make @n8n/agents aware of Instance AI thread IDs, agent roles, task IDs, or LangSmith-specific token correction.
- Use LangSmith trace shape as product state.
- Keep orchestration spans open during background waits.
Current Instance AI Execution Inventory
The tracing rewrite must cover every current model operation and non-model execution path below.
| Area | Current code path | Model operation | Target trace shape |
|---|---|---|---|
| Foreground orchestrator | createInstanceAgent() -> streamAgentRun() | Agent.stream() | Child agent activation under message_turn or orchestrator_resume root |
| Context compaction | generateCompactionSummary() | Agent.generate() with no tools | Root-level child under message_turn or orchestrator_resume, before agent.orchestrator; internal root only when run out of band |
| Thread title | generateTitleForRun() | generateTitleFromMessage() | Internal OTel operation behind the internal-tracing gate; when enabled, native LLM telemetry attaches to the internal root |
| Inline planner | plan tool / createPlanWithAgentTool() | Agent.stream() with planner tools | Child agent.planner activation under the current orchestrator activation |
| Inline delegate | delegate tool / createDelegateTool() | createSubAgent().stream() | Child agent.<role> activation under the current orchestrator activation |
| Browser credential setup | browser-credential-setup tool | Agent.stream() plus resume/nudge loops | Quick credential checks stay inline; browser/user-wait flows use detached background_subagent plus orchestrator_resume |
| Detached builder | build-workflow-with-agent / planned build task | Agent.stream() in sandbox or tool mode | Separate background_subagent root linked to the spawn tool call |
| Detached data-table agent | manage-data-tables-with-agent / planned task | Agent.stream() | Separate background_subagent root |
| Detached research agent | research-with-agent / planned task | Agent.stream() | Separate background_subagent root |
| Detached custom delegate | planned delegate task | createSubAgent().stream() | Separate background_subagent root |
| Planned checkpoint | service-created follow-up turn | Orchestrator Agent.stream() | Separate orchestrator_resume root with resume_reason=planned_checkpoint |
| Background completion handoff | service-created follow-up turn | Orchestrator Agent.stream() | Separate orchestrator_resume root with resume_reason=background_task_completed |
| Workflow loop | WorkflowLoopRuntime, verify-built-workflow, report-verification-verdict | Mostly deterministic tools | Tool spans and product state spans only, no LLM span unless orchestrator chooses one |
| Builder memory compaction | compactBuilderMemoryThread() | Currently deterministic storage compaction | Product span only if useful; if it later calls an LLM, trace as an internal child of builder root |
Target Trace Model
flowchart TD
Thread[LangSmith thread_id]
Thread --> A[Root: instance-ai.message_turn]
A --> A2[context_compaction]
A --> A3[prompt_build]
A --> A1[agent.orchestrator]
A1 --> A4[LLM provider span]
A1 --> A5[tool: plan]
A5 --> A6[agent.planner]
A6 --> A7[LLM provider spans]
A6 --> A8[tool: submit-plan]
A1 --> A9[background spawn metadata]
Thread --> B[Root: instance-ai.background_subagent]
B --> B1[agent.workflow-builder]
B1 --> B2[LLM provider spans]
B1 --> B3[workspace and workflow tools]
Thread --> C[Root: instance-ai.orchestrator_resume]
C --> C1[agent.orchestrator]
C1 --> C2[tool: verify-built-workflow]
C1 --> C3[tool: complete-checkpoint]
Root Trace Kinds
message_turn
- Triggered by a user chat message.
- Contains active foreground orchestrator work for that message.
- Ends as completed, failed, cancelled, or suspended.
- If it schedules background tasks and returns, it ends immediately after the scheduling result is persisted and emitted.
orchestrator_resume
- Triggered by a tool approval, plan approval, background-task completion, planned checkpoint, replan, or correction handoff.
- Contains only active continuation work.
- Does not inherit the duration of the suspended or background operation that caused it.
background_subagent
- Triggered when a background task actually starts executing, not when the orchestrator merely requests it.
- Used by workflow builder, data-table manager, researcher, and detached delegate workers.
- Linked to the spawning activation by metadata, not by OTel parentage.
internal_operation
- Used for optional internal LLM calls that are not part of a user-visible agent activation, such as title generation.
- Intentionally gated from normal LangSmith export unless internal tracing is enabled. This keeps background utility calls from adding default thread steps.
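The mapping from triggers to root kinds described above can be sketched as a small lookup. This is a minimal illustration, not the package's actual API: the four trace kind strings come from this spec, while `ActivationTrigger`, `rootKindForTrigger`, and `shouldExportRoot` are hypothetical names introduced here.

```typescript
// The four root kinds named in this spec.
type RootTraceKind =
  | 'message_turn'
  | 'orchestrator_resume'
  | 'background_subagent'
  | 'internal_operation';

// Hypothetical trigger names, derived from the bullet lists above.
type ActivationTrigger =
  | 'user_message'
  | 'tool_approval'
  | 'plan_approval'
  | 'background_task_completed'
  | 'planned_checkpoint'
  | 'replan'
  | 'correction_handoff'
  | 'background_task_started'
  | 'title_generation';

function rootKindForTrigger(trigger: ActivationTrigger): RootTraceKind {
  switch (trigger) {
    case 'user_message':
      return 'message_turn';
    case 'background_task_started':
      // Only an actually started task gets a root, not a mere request.
      return 'background_subagent';
    case 'title_generation':
      return 'internal_operation';
    default:
      // Approvals, completions, checkpoints, replans, and corrections
      // all continue the orchestrator in a fresh root.
      return 'orchestrator_resume';
  }
}

// internal_operation roots are exported only when internal tracing is enabled.
function shouldExportRoot(kind: RootTraceKind, internalTracingEnabled: boolean): boolean {
  return kind !== 'internal_operation' || internalTracingEnabled;
}
```

Keeping the decision in one function makes it easy to unit test that no trigger silently produces a fifth root kind.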
Agent Activation Spans
Agent activation spans describe the Instance AI actor that is running. They are product spans, not model spans.
Recommended names:
- instance-ai.agent.orchestrator
- instance-ai.agent.planner
- instance-ai.agent.workflow-builder
- instance-ai.agent.data-table-manager
- instance-ai.agent.web-researcher
- instance-ai.agent.delegate
- instance-ai.agent.credential-setup-browser-agent
Each agent activation span must include:
- agent role and agent id,
- model id,
- execution mode,
- max iteration budget,
- assigned tool manifest,
- memory/checkpoint summary,
- prompt section summary.
The full system prompt can be recorded only when recording policy allows it. The compact tool manifest should always be safe to record after schema redaction.
Native AI SDK Spans
Native model and tool spans should be kept as close as possible to what
@n8n/agents and AI SDK produce:
- ai.streamText.doStream
- ai.generateText.doGenerate
- ai.toolCall
Every LLM request span must include the exact tool definitions sent to the model for that request when tools are present. This must not rely on the parent agent activation span alone, because LangSmith renders available tool specs from the LLM run itself.
Required on LLM request spans when tools are present:
- tool name,
- tool description,
- JSON input schema after redaction and size limiting,
- provider tool kind when applicable, for example custom tool, server tool, or hosted tool,
- tool choice and parallel-tool-use settings when available,
- stable manifest reference or schema hash linking back to the agent activation manifest.
Required on all LLM request spans when recording policy allows it:
- system and conversation messages sent to the provider,
- model/provider identifiers,
- tool calls emitted by the model,
- tool results observed by the following turn,
- finish reason,
- raw provider response metadata,
- token/cache usage.
The noisy wrapper spans, for example ai.streamText, may be filtered in the
Instance AI LangSmith exporter only if doing so does not hide messages, tool
definitions, tool calls, response metadata, or token usage.
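One way to express the "filter only if nothing is hidden" rule is to check that every protected attribute on the wrapper also appears on a child span. The sketch below is an assumption-heavy illustration: `SpanLike`, `PROTECTED_KEYS`, `WRAPPER_NAMES`, and `canDropWrapperSpan` are hypothetical, the attribute key list is illustrative rather than exhaustive, and treating `ai.generateText` as a wrapper is an assumption beyond the `ai.streamText` example in the text.

```typescript
type SpanAttrs = Record<string, unknown>;

interface SpanLike {
  name: string;
  attributes: SpanAttrs;
}

// Illustrative keys for messages, tool definitions, tool calls,
// response metadata, and token usage. Adjust to the real attribute set.
const PROTECTED_KEYS = [
  'ai.prompt.messages',
  'ai.prompt.tools',
  'ai.response.toolCalls',
  'ai.response.providerMetadata',
  'ai.usage.promptTokens',
  'ai.usage.completionTokens',
];

// Assumed wrapper span names; only ai.streamText is named in the spec.
const WRAPPER_NAMES = new Set(['ai.streamText', 'ai.generateText']);

function canDropWrapperSpan(wrapper: SpanLike, children: SpanLike[]): boolean {
  if (!WRAPPER_NAMES.has(wrapper.name)) return false;
  return PROTECTED_KEYS.every((key) => {
    // A key the wrapper never carried cannot be hidden by dropping it.
    if (!(key in wrapper.attributes)) return true;
    // Otherwise some child must still expose the same information.
    return children.some((child) => key in child.attributes);
  });
}
```

The check is deliberately conservative: a wrapper with no children, or with children that lack any protected key it carries, is kept.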
Activation and Waiting Semantics
The orchestrator must not look like it ran for 10 minutes because it waited for a background builder.
Expected flow for a complex build:
Root A: instance-ai.message_turn
  agent.orchestrator
    LLM calls plan/build-workflow-with-agent
    tool call schedules builder task
  status=suspended_or_waiting_for_background
Root B: instance-ai.background_subagent
  agent.workflow-builder
    LLM calls
    workspace/file/submit/verify tools
  status=completed
Root C: instance-ai.orchestrator_resume
  agent.orchestrator
    consumes <background-task-completed>
    verifies, checkpoints, or summarizes
  status=completed
The wait itself lives in task storage and event history, not in an open span.
Long user waits follow the same rule. A trace can end with
status=suspended and metadata describing the pending tool call. The resumed
activation starts a new root trace with resume metadata.
Browser credential setup follows the detached/resume rule when it opens a browser, waits for user action, or can run long. Fast credential discovery and validation can remain inline tool work under the orchestrator activation.
Planned checkpoint follow-ups use trace_kind=orchestrator_resume with
resume_reason=planned_checkpoint. A separate root kind is not needed. The
LangSmith display name may be specialized, for example
instance-ai.orchestrator_resume.checkpoint, as long as the trace kind remains
orchestrator_resume.
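The suspend-then-resume rule can be illustrated with plain data records. This is a sketch only: `RootTrace`, `suspendRoot`, and `startResumeRoot` are hypothetical helpers, and placing `pending_tool_call_id` on the suspended root is an assumption about how "metadata describing the pending tool call" might be keyed.

```typescript
interface RootTrace {
  traceId: string;
  traceKind: 'message_turn' | 'orchestrator_resume';
  displayName: string;
  status: 'running' | 'completed' | 'failed' | 'cancelled' | 'suspended';
  metadata: Record<string, string>;
}

// Ends the activation as suspended instead of keeping the span open during the wait.
function suspendRoot(root: RootTrace, pendingToolCallId: string): RootTrace {
  return {
    ...root,
    status: 'suspended',
    metadata: { ...root.metadata, pending_tool_call_id: pendingToolCallId },
  };
}

// The resumed activation is a new root; it links back by metadata only.
function startResumeRoot(previous: RootTrace, resumeReason: string, newTraceId: string): RootTrace {
  const base = 'instance-ai.orchestrator_resume';
  return {
    traceId: newTraceId,
    traceKind: 'orchestrator_resume',
    // The display name may be specialized; the trace kind never changes.
    displayName: resumeReason === 'planned_checkpoint' ? `${base}.checkpoint` : base,
    status: 'running',
    metadata: { resume_reason: resumeReason, resumed_from_trace_id: previous.traceId },
  };
}
```

Because the resume root carries its own trace ID and start time, its duration cannot include the time the previous root spent waiting.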
Thread and Link Metadata
Every root and child span exported to LangSmith must carry thread metadata so filtering, token counting, and cost aggregation include the full thread.
Required on every span:
- thread_id
- conversation_id
- message_group_id
- run_id
- trace_kind
- activation_id
- instance_ai.trace_version
Required on agent activation spans:
- agent_role
- agent_id
- execution_mode
- model_id
- available_tools
- tool_count
Required on LLM request spans with tools:
- llm.available_tools: the provider-facing tool definitions sent in this request,
- llm.available_tool_names: compact ordered names for scanning/filtering,
- llm.tool_manifest_ref: reference to the parent agent activation manifest,
- llm.tool_schema_hash: hash of the redacted provider-facing tool definition set.
Required on detached background roots:
- task_id
- task_kind
- planned_task_id, when applicable
- work_item_id, when applicable
- parent_checkpoint_id, when applicable
- spawned_by_trace_id
- spawned_by_span_id
- spawned_by_activation_id
- spawned_by_agent_role
- spawned_by_tool_call_id, when available
- originating_message_group_id
Required on resume roots:
- resume_reason
- resumed_from_trace_id, when available
- resumed_from_span_id, when available
- resumed_from_activation_id, when available
- pending_tool_call_id, when applicable
- completed_task_id, when applicable
- checkpoint_task_id, when applicable
LangSmith-specific copies, for example langsmith.metadata.thread_id, are
created only in the Instance AI LangSmith adapter.
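As a rough sketch of that adapter-only step, the function below copies the neutral metadata keys into `langsmith.metadata.*` attributes. The `ThreadMetadata` type and `toLangSmithAttributes` are hypothetical names; the assumption that every required key gets a LangSmith copy (rather than only `thread_id`) is also not confirmed by this spec.

```typescript
// Required-on-every-span keys from the list above.
type ThreadMetadata = {
  thread_id: string;
  conversation_id: string;
  message_group_id: string;
  run_id: string;
  trace_kind: string;
  activation_id: string;
  'instance_ai.trace_version': string;
};

// Runs only inside the LangSmith adapter: neutral keys are kept,
// and a LangSmith-prefixed copy is added next to each one.
function toLangSmithAttributes(meta: ThreadMetadata): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [key, value] of Object.entries(meta)) {
    out[key] = value;
    out[`langsmith.metadata.${key}`] = value;
  }
  return out;
}
```

Product code would build only `ThreadMetadata`; nothing outside the adapter would need to know the `langsmith.metadata.` prefix.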
Tool Manifest Contract
Every agent activation span should expose a compact assigned-tool manifest. This is the primary answer to "which tools were assigned to which agent?"
Manifest fields:
- name
- description
- source: domain, orchestration, mcp, local-mcp, workspace, provider
- category: workflow, execution, credential, node, data-table, workspace, research, planning, browser, filesystem, other
- input_schema, redacted and size limited
- approval: whether the tool can suspend
- side_effect: none, read, write, execute, network, browser
The manifest is recorded once per agent activation. LLM provider spans must also show the request-specific tool schemas that were actually sent to the provider. This makes the LLM run self-describing in LangSmith while keeping the agent activation as the stable debugging surface for assigned tools.
The two copies serve different purposes:
- agent activation manifest: compact inventory of tools assigned to the agent;
- LLM request tools: exact provider-facing definitions available to that model invocation.
The LLM request tool set can be smaller than the agent manifest if the runtime uses dynamic tool filtering. It must never be larger without also updating the agent activation manifest or recording a clear reason, for example provider hosted tools injected after agent construction.
Schema redaction and size limiting must happen before exporting either copy. The redacted schema hash should be stable across the agent manifest and all LLM request spans using the same effective tool definitions.
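A stable schema hash mainly requires a canonical serialization, so that key order and tool order do not change the result. The sketch below assumes tools are hashed in name order and uses a 32-bit FNV-1a hash purely to stay dependency-free; a real implementation would more likely use a cryptographic digest. `ToolDefinition`, `stableStringify`, `toolSchemaHash`, and `unassignedTools` are all hypothetical names.

```typescript
interface ToolDefinition {
  name: string;
  description: string;
  input_schema: unknown; // assumed to be redacted and size limited already
}

// Serializes with sorted object keys so key order does not affect the hash.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(stableStringify).join(',')}]`;
  if (value !== null && typeof value === 'object') {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`)
      .join(',');
    return `{${body}}`;
  }
  return JSON.stringify(value) ?? 'null';
}

// 32-bit FNV-1a over UTF-16 code units; a stand-in for a real digest.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16).padStart(8, '0');
}

function toolSchemaHash(tools: ToolDefinition[]): string {
  const sorted = [...tools].sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
  return fnv1a(stableStringify(sorted));
}

// Names sent to the provider that the agent manifest does not list.
// A non-empty result needs a manifest update or a recorded reason.
function unassignedTools(manifestNames: string[], requestNames: string[]): string[] {
  const assigned = new Set(manifestNames);
  return requestNames.filter((name) => !assigned.has(name));
}
```

With this shape, the agent manifest and each LLM request span can compute the hash independently and still agree whenever the effective tool definitions match.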
Token and Cost Policy
Source of Truth
For LangSmith display, leaf LLM spans are the source of token and cost usage. Product spans do not duplicate child token counts.
For internal billing/debugging, provider raw usage is preferred when available. For Anthropic, raw billing buckets are:
- usage.input_tokens
- usage.output_tokens
- usage.cache_creation_input_tokens
- usage.cache_read_input_tokens
LangSmith Adapter Correction
The current inflation pattern happens when LangSmith sees an AI SDK Anthropic
span where ai.usage.promptTokens already includes repeated iteration/cache
accounting, then LangSmith adds Anthropic cache details from provider metadata.
Instance AI should correct this only in its LangSmith export transform:
- for Anthropic spans with raw provider usage, set ai.usage.promptTokens and ai.usage.inputTokens to raw non-cache usage.input_tokens;
- set ai.usage.completionTokens and ai.usage.outputTokens to raw usage.output_tokens;
- preserve ai.response.providerMetadata so LangSmith can derive cache details once;
- do not set product-span usage fields;
- do not change @n8n/agents usage normalization for generic consumers.
If raw provider metadata is missing, the adapter should not guess. It should
leave AI SDK usage intact and mark the span with
instance_ai.usage_source=ai_sdk.
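The correction rules above could look roughly like the following transform over span attributes. This is a sketch under stated assumptions: `normalizeAnthropicUsage` and `readRawUsage` are hypothetical, and the location of raw usage inside the provider metadata (`anthropic.usage`, possibly JSON-encoded as a string attribute) is assumed rather than confirmed by this spec.

```typescript
type Attrs = Record<string, unknown>;

interface AnthropicRawUsage {
  input_tokens: number;
  output_tokens: number;
}

// Assumption: raw usage lives at providerMetadata.anthropic.usage and the
// attribute may arrive either as an object or as a JSON string.
function readRawUsage(providerMetadata: unknown): AnthropicRawUsage | undefined {
  let parsed: unknown = providerMetadata;
  if (typeof parsed === 'string') {
    try {
      parsed = JSON.parse(parsed);
    } catch {
      return undefined;
    }
  }
  const usage = (parsed as { anthropic?: { usage?: Partial<AnthropicRawUsage> } } | null)
    ?.anthropic?.usage;
  const input = usage?.input_tokens;
  const output = usage?.output_tokens;
  if (typeof input !== 'number' || typeof output !== 'number') return undefined;
  return { input_tokens: input, output_tokens: output };
}

function normalizeAnthropicUsage(attrs: Attrs): Attrs {
  const raw = readRawUsage(attrs['ai.response.providerMetadata']);
  if (!raw) {
    // Do not guess: keep AI SDK usage and record where it came from.
    return { ...attrs, 'instance_ai.usage_source': 'ai_sdk' };
  }
  return {
    ...attrs, // ai.response.providerMetadata is preserved untouched
    'ai.usage.promptTokens': raw.input_tokens,
    'ai.usage.inputTokens': raw.input_tokens,
    'ai.usage.completionTokens': raw.output_tokens,
    'ai.usage.outputTokens': raw.output_tokens,
  };
}
```

The function returns a new attribute map instead of mutating its input, which keeps the original AI SDK values available to any non-LangSmith exporter.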
Package Boundaries
@n8n/agents owns generic primitives only:
- accepting a built telemetry provider/tracer,
- passing telemetry to AI SDK,
- native ai.toolCall spans for local tool execution,
- provider flush/shutdown hooks,
- optional generic runtime spans for non-Instance-AI consumers,
- generic model/tool metadata that is not LangSmith or Instance AI specific.
@n8n/agents must not own:
- Instance AI trace kinds,
- LangSmith thread metadata,
- Anthropic billing corrections for LangSmith,
- Instance AI agent role naming,
- background task linking,
- Instance AI redaction policy.
@n8n/instance-ai owns:
- activation/root trace creation,
- OTel context propagation across orchestrator, sub-agents, tools, and resumptions,
- LangSmith exporter configuration and transform,
- thread metadata and root naming,
- detached task linking,
- tool manifest construction,
- recording/redaction policy,
- feedback/snapshot IDs,
- trace replay integration.
@n8n/ai-utilities may own shared helpers only if they are not LangSmith
specific:
- safe JSON serialization,
- schema summarization,
- redaction primitives,
- payload size limiting,
- generic tool manifest helpers.
Refactor Plan
1. Split tracing into explicit modules
Replace the current single large tracing module with focused pieces:
- tracing/trace-context.ts
  - Instance AI trace context types.
  - Activation context AsyncLocalStorage.
  - No LangSmith imports.
- tracing/product-spans.ts
  - Start/finish/fail product OTel spans.
  - Context propagation helpers.
  - Snapshot ID derivation.
- tracing/tool-manifest.ts
  - Tool assignment summarization and schema redaction.
- tracing/langsmith-adapter.ts
  - LangSmith telemetry/exporter construction.
  - LangSmith attribute mapping.
  - Anthropic usage normalization for LangSmith.
  - Wrapper span filtering.
- tracing/tool-replay.ts
  - Trace replay tool recording and replay hooks.
  - No LangSmith dependencies.
2. Remove live RunTree tracing
Delete normal-path RunTree usage from Instance AI:
- no RunTree imports in live tracing/runtime files,
- no withRunTree compatibility API,
- no manual LLM-step RunTree reconstruction,
- no synthetic LangSmith tool runs,
- no RunTree parent overrides.
Feedback should use OTel-derived LangSmith run IDs or metadata lookup.
3. Rework root creation around activations
Foreground:
- create message_turn root at the start of the user-message activation;
- create agent.orchestrator as a child;
- end the root before returning from the activation, including when waiting for background work.
Resume:
- create orchestrator_resume root for approvals, checkpoint follow-ups, background completions, and replans;
- link to the cause via metadata;
- for checkpoint follow-ups, set resume_reason=planned_checkpoint instead of introducing a dedicated trace kind.
Background:
- do not create a background root before spawnBackgroundTask() has accepted the task;
- create the root inside the background task's execution function when the task starts running;
- update the managed task/snapshot with trace IDs after root creation;
- never create phantom roots for duplicate or limit-reached spawn attempts;
- use the same background root model for long browser credential setup flows.
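The ordering in the background rules can be sketched with a toy in-memory scheduler. Everything here is a hypothetical stand-in, including the `BackgroundTaskSketch` class and its `spawn`/`start` methods; it does not reflect the real `spawnBackgroundTask()` signature. The point is only where root creation sits: inside the task's run function, after acceptance.

```typescript
interface RootRecord {
  traceKind: 'background_subagent';
  taskId: string;
  spawnedByTraceId: string;
}

interface TaskRecord {
  taskId: string;
  traceId?: string;
  run: () => void;
}

type SpawnResult = 'accepted' | 'duplicate' | 'limit_reached';

class BackgroundTaskSketch {
  readonly roots: RootRecord[] = [];
  private readonly tasks = new Map<string, TaskRecord>();
  private readonly maxTasks: number;

  constructor(maxTasks: number) {
    this.maxTasks = maxTasks;
  }

  spawn(taskId: string, spawnedByTraceId: string): SpawnResult {
    // Rejected attempts return before any tracing code runs: no phantom roots.
    if (this.tasks.has(taskId)) return 'duplicate';
    if (this.tasks.size >= this.maxTasks) return 'limit_reached';
    const task: TaskRecord = {
      taskId,
      run: () => {
        // The root is created only when the task actually starts executing.
        this.roots.push({ traceKind: 'background_subagent', taskId, spawnedByTraceId });
        // The managed task is updated with the trace ID after root creation.
        task.traceId = `trace-${this.roots.length}`;
      },
    };
    this.tasks.set(taskId, task);
    return 'accepted';
  }

  start(taskId: string): void {
    this.tasks.get(taskId)?.run();
  }

  traceIdOf(taskId: string): string | undefined {
    return this.tasks.get(taskId)?.traceId;
  }
}
```

An accepted but not yet started task has no root and no trace ID, which is the state a duplicate/limit regression test would assert on.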
4. Make agent activation wrapping consistent
All model operations should run under an explicit Instance AI agent activation span:
- orchestrator foreground/resume/checkpoint,
- planner,
- inline delegate,
- quick inline browser credential checks,
- detached builder,
- detached data-table manager,
- detached researcher,
- detached delegate,
- detached browser credential setup flows that open a browser or wait for user action.
Context compaction should be a root-level child under the current
message_turn or orchestrator_resume root, before agent.orchestrator.
Compaction prepares the orchestrator input; it is not part of the orchestrator
agent activation duration.
Title generation should use an internal OTel span when internal tracing is
enabled. In the default path it is intentionally gated rather than accidentally
exported as an orphan LLM trace. Instance AI currently treats
N8N_INSTANCE_AI_TRACE_INTERNAL=true or
N8N_INSTANCE_AI_TRACE_INCLUDE_INTERNAL=true as the include_internal gate.
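A minimal sketch of that gate, assuming the flags are compared against the literal string `true`. The two environment variable names come from this spec; `isInternalTracingEnabled`, `titleGenerationTracePlan`, and the choice to pass the environment in as a plain record are illustrative. Whether the real check also accepts values such as `1` or `TRUE` is not stated here.

```typescript
type Env = Record<string, string | undefined>;

const INTERNAL_TRACE_FLAGS = [
  'N8N_INSTANCE_AI_TRACE_INTERNAL',
  'N8N_INSTANCE_AI_TRACE_INCLUDE_INTERNAL',
];

// Either flag set to "true" opens the include_internal gate.
function isInternalTracingEnabled(env: Env): boolean {
  return INTERNAL_TRACE_FLAGS.some((flag) => env[flag] === 'true');
}

// Title generation is traced under an internal_operation root only when the
// gate is open; otherwise it stays deliberately untraced.
function titleGenerationTracePlan(env: Env): { traced: boolean; rootKind?: 'internal_operation' } {
  return isInternalTracingEnabled(env)
    ? { traced: true, rootKind: 'internal_operation' }
    : { traced: false };
}
```

Passing the environment as an argument keeps the gate easy to unit test without mutating process-wide state.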
5. Rely on native AI SDK LLM/tool spans
Remove manual LLM step hooks from resumable-stream-executor once native spans
cover the same information.
The stream consumer should still produce Instance AI SSE events and work summaries, but it should not be responsible for reconstructing LangSmith LLM runs.
6. Preserve HITL visibility without long spans
HITL suspensions and resumptions need product side-effect spans:
- instance-ai.hitl.suspend
- instance-ai.hitl.resume
They should include:
- pending tool call ID,
- tool name,
- approval/input kind,
- request ID,
- sanitized decision summary.
The suspended activation root ends after the suspension is persisted. The resume activation is a new root linked by metadata.
7. Normalize LangSmith usage in Instance AI only
Implement the Anthropic token correction in tracing/langsmith-adapter.ts.
Regression tests should validate that a span with raw Anthropic usage exports
non-cache input as promptTokens/inputTokens and lets cache details remain
provider-derived.
8. Redaction and payload policy
Keep detailed traces useful locally and safe by default:
- credentials, bearer tokens, cookies, API keys, decrypted node parameters, and auth headers are always redacted;
- workflow JSON, execution data, and workspace file contents are summarized by default;
- tool schemas are allowed after size limiting;
- tool inputs/outputs are recorded only according to environment policy;
- token usage and provider metadata needed for billing must not be removed by redaction.
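The tension between "always redact tokens" and "never remove token usage" can be handled by checking a usage allowlist before the sensitive-key pattern. The sketch below is illustrative only: `redact`, `isUsageKey`, and the key patterns are hypothetical and far from a complete policy (it does not cover summarizing workflow JSON or size limiting).

```typescript
// Illustrative pattern for keys whose values are always redacted.
const SENSITIVE_KEY = /(authorization|cookie|api[-_]?key|token|password|secret|credential)/i;

// Usage fields are checked first so that names like input_tokens are not
// caught by the "token" pattern above.
function isUsageKey(key: string): boolean {
  return key === 'usage' || key.startsWith('ai.usage.') || /_tokens$/.test(key);
}

function redact(value: unknown, key = ''): unknown {
  if (isUsageKey(key)) return value; // billing data survives redaction
  if (SENSITIVE_KEY.test(key)) return '[REDACTED]';
  if (Array.isArray(value)) return value.map((item) => redact(item));
  if (value !== null && typeof value === 'object') {
    const out: Record<string, unknown> = {};
    for (const [k, v] of Object.entries(value as Record<string, unknown>)) {
      out[k] = redact(v, k);
    }
    return out;
  }
  return value;
}
```

The order of the first two checks is the design choice that matters: swapping them would silently strip the token counts the billing comparison depends on.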
9. Validate against real LangSmith threads
Before committing the implementation, validate with live LangSmith traces:
- one simple foreground message,
- one inline planner run with plan approval,
- one detached workflow builder with orchestrator handoff,
- one checkpoint follow-up,
- one HITL suspend/resume,
- one browser credential setup flow if browser tools are enabled.
Each validation run must inspect at least one LLM span directly and confirm LangSmith shows the available tool definitions on that LLM run, not only on the parent agent activation span.
Acceptance Criteria
- packages/@n8n/instance-ai live tracing has no RunTree dependency.
- @n8n/agents contains no Instance AI-specific LangSmith mapping or Anthropic billing workaround.
- A simple user message creates one message_turn root trace.
- Inline planner/delegate/browser agents appear as child agent activation spans, not separate thread steps.
- Detached builder/data-table/research/delegate tasks appear as background_subagent root traces with clear linking metadata.
- The orchestrator activation duration excludes background wait time.
- Background roots are created only for tasks that actually start.
- Orchestrator resume/checkpoint work appears as orchestrator_resume roots.
- Each agent activation shows the assigned tool manifest.
- Native LLM spans show messages, request-specific tool definitions, tool choice, tool calls, finish reason, and provider usage when recording policy allows it.
- Tool definitions on LLM spans include name, description, redacted JSON input schema, provider tool kind, and a stable manifest/schema hash.
- LangSmith renders available tools on the LLM node for orchestrator, planner, and workflow-builder model calls.
- LangSmith token totals for Anthropic threads are in line with Anthropic billing buckets: non-cache input, cache creation, cache read, and output are not double counted.
- Product spans do not duplicate child LLM token usage.
- Trace replay works with LangSmith disabled.
- Feedback can be attached to OTel-only product roots.
Test Plan
Unit coverage:
- trace kind/root metadata construction;
- OTel parentage for foreground and inline sub-agent spans;
- detached task root creation only after accepted task start;
- resume root metadata for approval, background completion, checkpoint, and replan causes;
- tool manifest generation and schema redaction;
- LLM request tool metadata generation from the provider-facing tool set;
- LangSmith adapter mapping for LLM request tools so definitions render on the LLM run, not only as opaque metadata;
- LangSmith adapter Anthropic usage normalization;
- redaction preserves token/provider usage fields;
- trace replay does not import or require LangSmith.
Integration coverage:
- local OTel exporter test for a foreground orchestrator run;
- local OTel exporter test for inline planner;
- local OTel exporter test for detached builder root and handoff resume;
- stream/HITL test proving spans close before wait and resume starts a new root;
- background task duplicate/limit test proving no phantom LangSmith roots.
Live validation:
- run a real Anthropic thread and compare LangSmith token/cost display against Anthropic usage buckets;
- verify LangSmith thread view contains roots with trace_kind values that distinguish user turns from background and resume activations;
- verify agent activation spans expose tool manifests;
- verify LLM spans expose the exact available tool definitions for at least the orchestrator, planner, and workflow-builder.
Settled Design Decisions
- Title generation is an internal OTel operation behind the internal-tracing gate. When N8N_INSTANCE_AI_TRACE_INTERNAL=true or N8N_INSTANCE_AI_TRACE_INCLUDE_INTERNAL=true, Instance AI creates an internal_operation root and attaches native title-generation LLM telemetry. When the gate is off, the title call remains deliberately untraced to avoid accidental orphan roots and default thread noise.
- Context compaction is a root-level preparation span under the current message_turn or orchestrator_resume root. It runs before agent.orchestrator and is not counted as orchestrator activation time.
- Browser credential setup stays inline only for quick credential checks. Browser flows that open a browser, wait for user action, or can run long use the detached background_subagent plus orchestrator_resume model.
- Planned checkpoint follow-ups use trace_kind=orchestrator_resume with resume_reason=planned_checkpoint. We do not add a dedicated planned_checkpoint root kind.