github-mirror/n8n

Fork 0

mirror of https://github.com/n8n-io/n8n.git synced 2026-06-01 01:07:04 +02:00

Oleg Ivaniv b9092e4642

docs(instance-ai): record tracing scope decision

2026-05-05 13:07:26 +02:00

19 KiB

Raw Blame History

Instance AI Native Agents Rewrite

Status: Draft Last updated: 2026-05-05

Goal

Rewrite @n8n/instance-ai to run exclusively on the native n8n agents framework in packages/@n8n/agents.

This is a full runtime rewrite, not an incremental compatibility migration. Mastra must be removed from Instance AI entirely. Existing Instance AI runtime data does not need to be preserved. The migration may introduce fresh tables, drop obsolete Instance AI tables, or truncate existing Instance AI data if that produces a cleaner native agents persistence model.

The external product contract should remain stable: the chat API, SSE protocol, frontend reducer, task UI, HITL confirmations, and domain behavior should keep working unless a deliberate @n8n/api-types change is called out in this spec.

Hard Requirements

No @mastra/* imports in packages/@n8n/instance-ai.
No @mastra/* imports in packages/cli/src/modules/instance-ai.
Remove Mastra dependencies from packages/@n8n/instance-ai/package.json.
Orchestrator agents and sub-agents use @n8n/agents Agent.
Tools are rewritten to native @n8n/agents Tool definitions.
HITL uses native tool suspend() / resume() and CheckpointStore.
Streams use native StreamResult and StreamChunk.
Memory uses native BuiltMemory plus n8n TypeORM-backed storage.
Runtime checkpoints use a native TypeORM-backed CheckpointStore.
Workspace integration uses @n8n/agents Workspace, BaseFilesystem, and BaseSandbox, with concrete providers owned by Instance AI or moved into @n8n/agents if they become generally useful.
Model config uses @n8n/agents ModelConfig and AI SDK v6-compatible model instances.

Non-Goals

Preserve old Mastra chat messages, checkpoints, workflow snapshots, or observational memory.
Keep a Mastra-shaped tool compatibility adapter.
Recreate Mastra ToolSearchProcessor in the first rewrite.
Recreate Mastra observational memory in the first rewrite.
Migrate old semantic recall embeddings.

Target Architecture

graph TB
    UI[Frontend Chat UI] --> API[Instance AI REST API]
    UI --> SSE[SSE Event Stream]

    API --> Service[InstanceAiService]
    Service --> Factory[Native Agent Factory]
    Factory --> Orchestrator["@n8n/agents Agent: orchestrator"]

    Orchestrator --> Tools[Native Tool Registry]
    Orchestrator --> Memory[TypeORM BuiltMemory]
    Orchestrator --> Checkpoints[TypeORM CheckpointStore]
    Orchestrator --> Events[Native StreamChunk Mapper]

    Tools --> Domain[n8n Domain Services]
    Tools --> HITL[Native suspend/resume]
    Tools --> SubAgents[Native Sub-Agents]
    Tools --> Workspace[Native Workspace]
    Tools --> MCP[Native MCP Client or native dynamic tools]

    SubAgents --> Memoryless[Stateless Runs]
    SubAgents --> Events

    Events --> Bus[Instance AI Event Bus]
    Bus --> SSE

Instance AI remains the product orchestration package. @n8n/agents becomes the execution engine.

Cutover Strategy

The merged implementation should have one runtime path: native agents.

Development can use temporary local branches or intermediate commits, but the final branch should not contain:

a Mastra/native runtime feature flag
a Mastra fallback path
a Mastra-shaped tool adapter
storage abstractions whose only purpose is preserving old Mastra data

Because Instance AI runtime data may be reset, the cutover is a schema and code replacement. The migration should leave the database in a native-agents-ready state and the application should create fresh threads, messages, checkpoints, and run snapshots after upgrade.

Component Mapping

Current Instance AI area	Native agents target	Notes
`agent/instance-agent.ts`	`new Agent('n8n-instance-ai')`	Replace Mastra agent, Mastra registration, MCP client wiring, and tool search.
`agent/sub-agent-factory.ts`	Native `Agent`	Keep Instance AI `agentId` as product metadata outside the native agent name.
`agent/register-with-mastra.ts`	Delete	Native checkpoints replace Mastra workflow snapshot registration.
`tools/*`	Native `Tool` definitions	Rewrite directly with `new Tool(name)`. Shared helpers are allowed only for product concepts, not Mastra-shaped configs.
Tool HITL	`Tool.suspend(schema).resume(schema)`	Replace `ctx.agent.suspend` with native `ctx.suspend`; replace `ctx.agent.resumeData` with native `ctx.resumeData`.
Tool output shaping	`Tool.toModelOutput()` and `Tool.toMessage()`	Preserve truncation and custom message behavior where needed.
`tools/orchestration/delegate.tool.ts`	Native sub-agent streaming	`maxSteps` becomes `maxIterations`; no Mastra registration.
`tools/orchestration/build-workflow-agent.tool.ts`	Native builder agent	Builder sub-runs use native stream and HITL resume.
`runtime/stream-runner.ts`	Native stream runner	Use `BuiltAgent.stream()` and `BuiltAgent.resume('stream', ...)`.
`runtime/resumable-stream-executor.ts`	Native stream executor	Read `ReadableStream<StreamChunk>`, accumulate text deltas, publish canonical Instance AI events.
`stream/map-chunk.ts`	Native chunk mapper	Map native `StreamChunk` to `InstanceAiEvent`.
`memory/memory-config.ts`	Native `Memory` builder	Use `BuiltMemory`, `lastMessages`, optional semantic recall, no working memory initially.
`storage/*`	Native storage abstractions	Replace Mastra-memory-backed metadata storage with TypeORM-backed services.
`workspace/*`	Native `Workspace` providers	Rewrite local, Daytona, and n8n-sandbox adapters against agents workspace interfaces.
`mcp/*`	Native `McpClient` or dynamic native tools	Keep schema sanitization only where still required by providers.
`compaction/*`	Native `Agent.generate()`	Keep operational summarization, but do not depend on observational memory.
`tracing/*`	Native telemetry plus Instance AI spans	Prefer `@n8n/agents` telemetry and events; keep product-level LangSmith spans if they provide missing detail.

Native Type Surface

Replace Mastra-facing types in src/types.ts with native types.

import type {
	BuiltMemory,
	BuiltTool,
	CheckpointStore,
	ModelConfig,
	Workspace,
} from '@n8n/agents';

export type InstanceAiToolRegistry = Record<string, BuiltTool>;

export interface InstanceAiMemoryConfig {
	memory: BuiltMemory;
	checkpointStore: CheckpointStore;
	embedderModel?: string;
	lastMessages?: number;
	semanticRecallTopK?: number;
	threadTtlDays?: number;
}

OrchestrationContext should use:

modelId: ModelConfig
domainTools: InstanceAiToolRegistry
mcpTools?: InstanceAiToolRegistry
workspace?: Workspace
no storage: MastraCompositeStore
no Mastra Memory
no Mastra ToolsInput

Sub-agents are stateless. If they need conversation context, pass explicit text or a small ThreadMessageReader service, not a memory instance that persists sub-agent turns.

Tool Rewrite Rules

All tools should be native tools.

const askUserTool = new Tool('ask-user')
	.description('Ask the user for information needed to continue.')
	.input(askUserInputSchema)
	.output(askUserOutputSchema)
	.suspend(askUserSuspendSchema)
	.resume(askUserResumeSchema)
	.handler(async (input, ctx) => {
		if (ctx.resumeData === undefined) {
			return await ctx.suspend(buildAskUserPayload(input));
		}

		return normalizeAskUserResume(ctx.resumeData);
	});

Rules:

Keep existing tool names unless there is a deliberate prompt and event migration.
Keep Zod schemas as the source of truth.
Use ctx.suspend() for all confirmations, questions, plan reviews, credential flows, domain access decisions, and resource decisions.
Use ctx.resumeData for resumed tool execution.
Use toModelOutput() for large or sensitive tool results.
Use toMessage() only when Instance AI needs a custom stored message.
Do not add a helper that accepts Mastra-style { id, inputSchema, execute } configs.

Stream And Event Mapping

Instance AI keeps its canonical event protocol. Only the runtime chunk source changes.

Native `StreamChunk`	Instance AI event
`text-delta`	`text-delta`
`reasoning-delta`	`reasoning-delta`
`message` with `tool-call` content	`tool-call`
`message` with `tool-result` content	`tool-result` or `tool-error`
`tool-call-suspended`	`confirmation-request`
`finish`	Usage accounting and run finalization
`error`	`error` and failed run finalization

The public Instance AI runId and native agents runId are different concepts:

Instance AI runId: product-level run identifier used by SSE and UI state.
Native agentRunId: native agents checkpoint/resume identifier.

Rename all internal mastraRunId fields to agentRunId.

Persistence Model

Data loss for existing Instance AI runtime data is acceptable. The rewrite should use fresh native agents persistence and avoid translating old Mastra records.

Tables To Keep Or Recreate

Table	Target use
`instance_ai_threads`	Native `BuiltMemory` threads. Keep `id`, `resourceId`, `title`, `metadata`, timestamps.
`instance_ai_messages`	Native `AgentDbMessage` persistence. Recreate or alter as needed.
`instance_ai_resources`	Optional native working memory storage. Working memory remains disabled initially.
`instance_ai_run_snapshots`	UI agent tree snapshots. Recreate empty if the migration resets all Instance AI data.
`instance_ai_iteration_logs`	Workflow-loop attempt history. Recreate empty if the migration resets all Instance AI data.
`ai_builder_temporary_workflow`	Keep behavior, but account for its FK to `instance_ai_threads` during destructive migrations.

Tables To Remove Or Stop Using

Table	Replacement
`instance_ai_workflow_snapshots`	New native `instance_ai_checkpoints` table.
`instance_ai_observational_memory`	No replacement in the first rewrite. Keep compaction instead.

New Tables

`instance_ai_checkpoints`

Backs @n8n/agents CheckpointStore.

Column	Type	Notes
`key`	varchar primary	Opaque key passed by native agents.
`runId`	varchar indexed	Native `agentRunId` when available.
`threadId`	uuid nullable indexed	Useful for cleanup and debugging.
`resourceId`	varchar nullable indexed	User/resource owner.
`state`	text or json	Serialized `SerializableAgentState`.
timestamps	standard	Required for TTL cleanup.

load(key) must be atomic enough that two concurrent resume attempts cannot both receive and execute the same checkpoint. Prefer transaction plus row lock where supported, followed by delete.

`instance_ai_embeddings`

Optional table for native semantic recall. This can be skipped in the first implementation if semantic recall is disabled until a vector strategy is chosen.

Column	Type	Notes
`id`	varchar primary	Embedding row id.
`messageId`	varchar indexed	Associated message.
`threadId`	uuid nullable indexed	Thread scope.
`resourceId`	varchar nullable indexed	Resource scope.
`scope`	varchar	`thread` or `resource`.
`model`	varchar	Embedder model id.
`text`	text	Source text used for embedding.
`vector`	text or blob	JSON vector unless a DB-native vector type is introduced.

Migration Strategy

Add a new common DB migration for the native agents rewrite.

The migration may reset Instance AI data. It must not affect non-Instance-AI tables.

Recommended approach:

Delete rows from ai_builder_temporary_workflow to release thread FKs.
Delete rows from existing Instance AI runtime tables.
Drop obsolete Instance AI tables:
- instance_ai_workflow_snapshots
- instance_ai_observational_memory
Recreate or alter instance_ai_messages for native AgentDbMessage storage.
Create instance_ai_checkpoints.
Optionally create instance_ai_embeddings.
Keep or recreate instance_ai_run_snapshots and instance_ai_iteration_logs empty.

No backfill is required. Old chats, suspended runs, observations, and semantic recall state may be lost.

Backend Storage Adapters

Implement these in packages/cli/src/modules/instance-ai/storage/:

TypeORMAgentMemory implements BuiltMemory
TypeORMAgentCheckpointStore implements CheckpointStore
Optional TypeORMAgentEmbeddingStore if semantic recall ships in the first rewrite

Delete or retire:

TypeORMCompositeStore
TypeORMWorkflowsStorage
Mastra memory storage inheritance

Thread metadata remains the storage location for:

task state
planned task graph
workflow loop state
compaction metadata

Use a small native interface for metadata access instead of passing memory as a Mastra object:

export interface ThreadMetadataStore {
	get(threadId: string): Promise<Record<string, unknown>>;
	patch(threadId: string, patch: Record<string, unknown>): Promise<void>;
}

Workspace Rewrite

@n8n/agents provides workspace interfaces and base classes, not the concrete Instance AI providers.

Rewrite:

Daytona filesystem and sandbox provider
local filesystem and sandbox provider
n8n-sandbox filesystem and sandbox provider
builder sandbox factory
workspace snapshot restore flow

The first rewrite may keep concrete providers in @n8n/instance-ai. Move them to @n8n/agents only if they become generic SDK functionality.

MCP Rewrite

External MCP servers should use native McpClient when possible.

Local gateway tools and dynamically discovered tools should become native BuiltTools. JSON Schema input from MCP should continue to be converted to Zod before being registered as native tools.

Tracing

Use native agents telemetry and lifecycle events as the default integration point.

Decision for this rewrite: keep the existing Instance AI LangSmith integration as product-level tracing. @n8n/agents exposes generic OTel/LangSmith telemetry for model and tool spans, but Instance AI still needs trace structure that is specific to the product surface:

service-proxy auth and direct/proxy client selection
feedback anchoring to persisted run snapshots
deterministic trace replay for e2e recording
detached sub-agent roots linked to the parent product trace
background task and workflow build-loop spans
ad-hoc child spans under the active Instance AI run tree

Revisit native agents telemetry when it can accept or inherit these product trace anchors without creating duplicate LangSmith traces.

Keep Instance AI product-level LangSmith spans only where native telemetry does not expose enough structure for:

product run
orchestrator turn
sub-agent run
tool execution
background task
workflow build loop

Testing Plan

Unit tests:

native tool definitions for HITL tools
native stream chunk to Instance AI event mapping
native resume flow with checkpoint persistence
TypeORM BuiltMemory
TypeORM CheckpointStore
thread metadata storage
service run state registry using agentRunId

Integration tests:

orchestrator run streams text and tool events
HITL suspend, confirmation, resume
delegated sub-agent run
planned task approval and execution
builder agent background task
workspace filesystem and sandbox operations

Package checks:

pushd packages/@n8n/agents
pnpm test
pnpm typecheck
popd

pushd packages/@n8n/instance-ai
pnpm test
pnpm typecheck
popd

pushd packages/cli
pnpm test instance-ai
pnpm typecheck
popd

Run broader checks before PR:

pnpm test:affected
pnpm typecheck

Implementation Milestones

Native type surface and dependency removal plan
- Replace Mastra imports in public Instance AI types.
- Define native tool registry, memory config, checkpoint config, and orchestration context.
- Update docs and prompts that refer to Mastra runtime behavior.
Native persistence
- Add native TypeORM entities and migration.
- Implement BuiltMemory.
- Implement CheckpointStore.
- Replace thread metadata storage that currently depends on Mastra memory.
Native tools
- Rewrite domain tools directly with Tool.
- Rewrite HITL tools with native suspend/resume.
- Rewrite dynamic local gateway and MCP-discovered tools as native tools.
Native agent factories and runtime
- Rewrite orchestrator and sub-agent factories.
- Rewrite stream runner, resume flow, and event mapper.
- Rename mastraRunId to agentRunId.
Native workspace, MCP, and tracing
- Rewrite workspace providers.
- Replace Mastra MCP usage.
- Evaluate tracing against native telemetry/events and keep product-level spans where native telemetry lacks Instance AI anchors.
Cleanup and verification
- Delete retired Mastra storage and registration files.
- Remove Mastra dependencies.
- Run targeted tests and typechecks.
- Verify no @mastra/* imports remain.

Implementation TODO

Replace Mastra types in packages/@n8n/instance-ai/src/types.ts.
Remove Mastra dependencies from packages/@n8n/instance-ai/package.json.
Rewrite orchestrator factory with native Agent.
Rewrite sub-agent factory with native Agent.
Delete Mastra registration logic.
Rewrite all tools to native Tool.
Rewrite orchestration tools for native streaming and resume.
Rewrite stream runner and stream executor for native StreamChunk.
Rename internal mastraRunId to agentRunId.
Implement native chunk to Instance AI event mapper.
Implement TypeORM BuiltMemory.
Implement TypeORM CheckpointStore.
Add fresh DB migration for native Instance AI persistence.
Remove TypeORMCompositeStore from the active runtime path.
Remove TypeORMWorkflowsStorage from the active runtime path.
Remove observational memory runtime usage.
Keep or replace compaction as the operational long-context mechanism.
Rewrite workspace providers against native agents workspace interfaces.
Replace Mastra MCP usage with native MCP or native dynamic tools.
Decide LangSmith tracing scope against native telemetry/events.
Update docs that mention Mastra runtime behavior.
Add tests for the native runtime path.
Verify no @mastra/* imports remain.

Current remaining implementation gaps:

None known in the native agents rewrite scope.

Acceptance Criteria

rg "@mastra/" packages/@n8n/instance-ai packages/cli/src/modules/instance-ai returns no active runtime imports.
packages/@n8n/instance-ai/package.json has no Mastra dependencies.
A new Instance AI thread can run from user message to streamed final answer.
HITL confirmation suspends and resumes through native CheckpointStore.
Delegated sub-agents run on native Agent.
Planned tasks and workflow loop still persist thread-scoped state.
Existing Instance AI frontend can consume the SSE stream without reducer changes.
Old Instance AI data may be absent after migration, but non-Instance-AI data is untouched.

19 KiB Raw Blame History