n8n/packages/@n8n/instance-ai/SPECS.md
2026-05-05 13:07:26 +02:00

508 lines
19 KiB
Markdown

# Instance AI Native Agents Rewrite
Status: Draft
Last updated: 2026-05-05
## Goal
Rewrite `@n8n/instance-ai` to run exclusively on the native n8n agents
framework in `packages/@n8n/agents`.
This is a full runtime rewrite, not an incremental compatibility migration.
Mastra must be removed from Instance AI entirely. Existing Instance AI runtime
data does not need to be preserved. The migration may introduce fresh tables,
drop obsolete Instance AI tables, or truncate existing Instance AI data if that
produces a cleaner native agents persistence model.
The external product contract should remain stable: the chat API, SSE protocol,
frontend reducer, task UI, HITL confirmations, and domain behavior should keep
working unless a deliberate `@n8n/api-types` change is called out in this spec.
## Hard Requirements
- No `@mastra/*` imports in `packages/@n8n/instance-ai`.
- No `@mastra/*` imports in `packages/cli/src/modules/instance-ai`.
- Remove Mastra dependencies from `packages/@n8n/instance-ai/package.json`.
- Orchestrator agents and sub-agents use `@n8n/agents` `Agent`.
- Tools are rewritten to native `@n8n/agents` `Tool` definitions.
- HITL uses native tool `suspend()` / `resume()` and `CheckpointStore`.
- Streams use native `StreamResult` and `StreamChunk`.
- Memory uses native `BuiltMemory` plus n8n TypeORM-backed storage.
- Runtime checkpoints use a native TypeORM-backed `CheckpointStore`.
- Workspace integration uses `@n8n/agents` `Workspace`, `BaseFilesystem`, and
`BaseSandbox`, with concrete providers owned by Instance AI or moved into
`@n8n/agents` if they become generally useful.
- Model config uses `@n8n/agents` `ModelConfig` and AI SDK v6-compatible model
instances.
## Non-Goals
- Preserve old Mastra chat messages, checkpoints, workflow snapshots, or
observational memory.
- Keep a Mastra-shaped tool compatibility adapter.
- Recreate Mastra `ToolSearchProcessor` in the first rewrite.
- Recreate Mastra observational memory in the first rewrite.
- Migrate old semantic recall embeddings.
## Target Architecture
```mermaid
graph TB
UI[Frontend Chat UI] --> API[Instance AI REST API]
UI --> SSE[SSE Event Stream]
API --> Service[InstanceAiService]
Service --> Factory[Native Agent Factory]
Factory --> Orchestrator["@n8n/agents Agent: orchestrator"]
Orchestrator --> Tools[Native Tool Registry]
Orchestrator --> Memory[TypeORM BuiltMemory]
Orchestrator --> Checkpoints[TypeORM CheckpointStore]
Orchestrator --> Events[Native StreamChunk Mapper]
Tools --> Domain[n8n Domain Services]
Tools --> HITL[Native suspend/resume]
Tools --> SubAgents[Native Sub-Agents]
Tools --> Workspace[Native Workspace]
Tools --> MCP[Native MCP Client or native dynamic tools]
SubAgents --> Memoryless[Stateless Runs]
SubAgents --> Events
Events --> Bus[Instance AI Event Bus]
Bus --> SSE
```
Instance AI remains the product orchestration package. `@n8n/agents` becomes
the execution engine.
## Cutover Strategy
The merged implementation should have one runtime path: native agents.
Development can use temporary local branches or intermediate commits, but the
final branch should not contain:
- a Mastra/native runtime feature flag
- a Mastra fallback path
- a Mastra-shaped tool adapter
- storage abstractions whose only purpose is preserving old Mastra data
Because Instance AI runtime data may be reset, the cutover is a schema and code
replacement. The migration should leave the database in a native-agents-ready
state and the application should create fresh threads, messages, checkpoints,
and run snapshots after upgrade.
## Component Mapping
| Current Instance AI area | Native agents target | Notes |
| --- | --- | --- |
| `agent/instance-agent.ts` | `new Agent('n8n-instance-ai')` | Replace Mastra agent, Mastra registration, MCP client wiring, and tool search. |
| `agent/sub-agent-factory.ts` | Native `Agent` | Keep Instance AI `agentId` as product metadata outside the native agent name. |
| `agent/register-with-mastra.ts` | Delete | Native checkpoints replace Mastra workflow snapshot registration. |
| `tools/*` | Native `Tool` definitions | Rewrite directly with `new Tool(name)`. Shared helpers are allowed only for product concepts, not Mastra-shaped configs. |
| Tool HITL | `Tool.suspend(schema).resume(schema)` | Replace `ctx.agent.suspend` with native `ctx.suspend`; replace `ctx.agent.resumeData` with native `ctx.resumeData`. |
| Tool output shaping | `Tool.toModelOutput()` and `Tool.toMessage()` | Preserve truncation and custom message behavior where needed. |
| `tools/orchestration/delegate.tool.ts` | Native sub-agent streaming | `maxSteps` becomes `maxIterations`; no Mastra registration. |
| `tools/orchestration/build-workflow-agent.tool.ts` | Native builder agent | Builder sub-runs use native stream and HITL resume. |
| `runtime/stream-runner.ts` | Native stream runner | Use `BuiltAgent.stream()` and `BuiltAgent.resume('stream', ...)`. |
| `runtime/resumable-stream-executor.ts` | Native stream executor | Read `ReadableStream<StreamChunk>`, accumulate text deltas, publish canonical Instance AI events. |
| `stream/map-chunk.ts` | Native chunk mapper | Map native `StreamChunk` to `InstanceAiEvent`. |
| `memory/memory-config.ts` | Native `Memory` builder | Use `BuiltMemory`, `lastMessages`, optional semantic recall, no working memory initially. |
| `storage/*` | Native storage abstractions | Replace Mastra-memory-backed metadata storage with TypeORM-backed services. |
| `workspace/*` | Native `Workspace` providers | Rewrite local, Daytona, and n8n-sandbox adapters against agents workspace interfaces. |
| `mcp/*` | Native `McpClient` or dynamic native tools | Keep schema sanitization only where still required by providers. |
| `compaction/*` | Native `Agent.generate()` | Keep operational summarization, but do not depend on observational memory. |
| `tracing/*` | Native telemetry plus Instance AI spans | Prefer `@n8n/agents` telemetry and events; keep product-level LangSmith spans if they provide missing detail. |
## Native Type Surface
Replace Mastra-facing types in `src/types.ts` with native types.
```ts
import type {
BuiltMemory,
BuiltTool,
CheckpointStore,
ModelConfig,
Workspace,
} from '@n8n/agents';
export type InstanceAiToolRegistry = Record<string, BuiltTool>;
export interface InstanceAiMemoryConfig {
memory: BuiltMemory;
checkpointStore: CheckpointStore;
embedderModel?: string;
lastMessages?: number;
semanticRecallTopK?: number;
threadTtlDays?: number;
}
```
`OrchestrationContext` should use:
- `modelId: ModelConfig`
- `domainTools: InstanceAiToolRegistry`
- `mcpTools?: InstanceAiToolRegistry`
- `workspace?: Workspace`
- no `storage: MastraCompositeStore`
- no Mastra `Memory`
- no Mastra `ToolsInput`
Sub-agents are stateless. If they need conversation context, pass explicit text
or a small `ThreadMessageReader` service, not a memory instance that persists
sub-agent turns.
## Tool Rewrite Rules
All tools should be native tools.
```ts
const askUserTool = new Tool('ask-user')
.description('Ask the user for information needed to continue.')
.input(askUserInputSchema)
.output(askUserOutputSchema)
.suspend(askUserSuspendSchema)
.resume(askUserResumeSchema)
.handler(async (input, ctx) => {
if (ctx.resumeData === undefined) {
return await ctx.suspend(buildAskUserPayload(input));
}
return normalizeAskUserResume(ctx.resumeData);
});
```
Rules:
- Keep existing tool names unless there is a deliberate prompt and event
migration.
- Keep Zod schemas as the source of truth.
- Use `ctx.suspend()` for all confirmations, questions, plan reviews,
credential flows, domain access decisions, and resource decisions.
- Use `ctx.resumeData` for resumed tool execution.
- Use `toModelOutput()` for large or sensitive tool results.
- Use `toMessage()` only when Instance AI needs a custom stored message.
- Do not add a helper that accepts Mastra-style `{ id, inputSchema, execute }`
configs.
## Stream And Event Mapping
Instance AI keeps its canonical event protocol. Only the runtime chunk source
changes.
| Native `StreamChunk` | Instance AI event |
| --- | --- |
| `text-delta` | `text-delta` |
| `reasoning-delta` | `reasoning-delta` |
| `message` with `tool-call` content | `tool-call` |
| `message` with `tool-result` content | `tool-result` or `tool-error` |
| `tool-call-suspended` | `confirmation-request` |
| `finish` | Usage accounting and run finalization |
| `error` | `error` and failed run finalization |
The public Instance AI `runId` and native agents `runId` are different
concepts:
- Instance AI `runId`: product-level run identifier used by SSE and UI state.
- Native `agentRunId`: native agents checkpoint/resume identifier.
Rename all internal `mastraRunId` fields to `agentRunId`.
## Persistence Model
Data loss for existing Instance AI runtime data is acceptable. The rewrite
should use fresh native agents persistence and avoid translating old Mastra
records.
### Tables To Keep Or Recreate
| Table | Target use |
| --- | --- |
| `instance_ai_threads` | Native `BuiltMemory` threads. Keep `id`, `resourceId`, `title`, `metadata`, timestamps. |
| `instance_ai_messages` | Native `AgentDbMessage` persistence. Recreate or alter as needed. |
| `instance_ai_resources` | Optional native working memory storage. Working memory remains disabled initially. |
| `instance_ai_run_snapshots` | UI agent tree snapshots. Recreate empty if the migration resets all Instance AI data. |
| `instance_ai_iteration_logs` | Workflow-loop attempt history. Recreate empty if the migration resets all Instance AI data. |
| `ai_builder_temporary_workflow` | Keep behavior, but account for its FK to `instance_ai_threads` during destructive migrations. |
### Tables To Remove Or Stop Using
| Table | Replacement |
| --- | --- |
| `instance_ai_workflow_snapshots` | New native `instance_ai_checkpoints` table. |
| `instance_ai_observational_memory` | No replacement in the first rewrite. Keep compaction instead. |
### New Tables
#### `instance_ai_checkpoints`
Backs `@n8n/agents` `CheckpointStore`.
| Column | Type | Notes |
| --- | --- | --- |
| `key` | varchar primary | Opaque key passed by native agents. |
| `runId` | varchar indexed | Native `agentRunId` when available. |
| `threadId` | uuid nullable indexed | Useful for cleanup and debugging. |
| `resourceId` | varchar nullable indexed | User/resource owner. |
| `state` | text or json | Serialized `SerializableAgentState`. |
| timestamps | standard | Required for TTL cleanup. |
`load(key)` must be atomic enough that two concurrent resume attempts cannot
both receive and execute the same checkpoint. Prefer transaction plus row lock
where supported, followed by delete.
#### `instance_ai_embeddings`
Optional table for native semantic recall. This can be skipped in the first
implementation if semantic recall is disabled until a vector strategy is chosen.
| Column | Type | Notes |
| --- | --- | --- |
| `id` | varchar primary | Embedding row id. |
| `messageId` | varchar indexed | Associated message. |
| `threadId` | uuid nullable indexed | Thread scope. |
| `resourceId` | varchar nullable indexed | Resource scope. |
| `scope` | varchar | `thread` or `resource`. |
| `model` | varchar | Embedder model id. |
| `text` | text | Source text used for embedding. |
| `vector` | text or blob | JSON vector unless a DB-native vector type is introduced. |
## Migration Strategy
Add a new common DB migration for the native agents rewrite.
The migration may reset Instance AI data. It must not affect non-Instance-AI
tables.
Recommended approach:
1. Delete rows from `ai_builder_temporary_workflow` to release thread FKs.
2. Delete rows from existing Instance AI runtime tables.
3. Drop obsolete Instance AI tables:
- `instance_ai_workflow_snapshots`
- `instance_ai_observational_memory`
4. Recreate or alter `instance_ai_messages` for native `AgentDbMessage`
storage.
5. Create `instance_ai_checkpoints`.
6. Optionally create `instance_ai_embeddings`.
7. Keep or recreate `instance_ai_run_snapshots` and `instance_ai_iteration_logs`
empty.
No backfill is required. Old chats, suspended runs, observations, and semantic
recall state may be lost.
## Backend Storage Adapters
Implement these in `packages/cli/src/modules/instance-ai/storage/`:
- `TypeORMAgentMemory implements BuiltMemory`
- `TypeORMAgentCheckpointStore implements CheckpointStore`
- Optional `TypeORMAgentEmbeddingStore` if semantic recall ships in the first
rewrite
Delete or retire:
- `TypeORMCompositeStore`
- `TypeORMWorkflowsStorage`
- Mastra memory storage inheritance
Thread metadata remains the storage location for:
- task state
- planned task graph
- workflow loop state
- compaction metadata
Use a small native interface for metadata access instead of passing memory as a
Mastra object:
```ts
export interface ThreadMetadataStore {
get(threadId: string): Promise<Record<string, unknown>>;
patch(threadId: string, patch: Record<string, unknown>): Promise<void>;
}
```
## Workspace Rewrite
`@n8n/agents` provides workspace interfaces and base classes, not the concrete
Instance AI providers.
Rewrite:
- Daytona filesystem and sandbox provider
- local filesystem and sandbox provider
- n8n-sandbox filesystem and sandbox provider
- builder sandbox factory
- workspace snapshot restore flow
The first rewrite may keep concrete providers in `@n8n/instance-ai`. Move them
to `@n8n/agents` only if they become generic SDK functionality.
## MCP Rewrite
External MCP servers should use native `McpClient` when possible.
Local gateway tools and dynamically discovered tools should become native
`BuiltTool`s. JSON Schema input from MCP should continue to be converted to Zod
before being registered as native tools.
## Tracing
Use native agents telemetry and lifecycle events as the default integration
point.
Decision for this rewrite: keep the existing Instance AI LangSmith integration
as product-level tracing. `@n8n/agents` exposes generic OTel/LangSmith telemetry
for model and tool spans, but Instance AI still needs trace structure that is
specific to the product surface:
- service-proxy auth and direct/proxy client selection
- feedback anchoring to persisted run snapshots
- deterministic trace replay for e2e recording
- detached sub-agent roots linked to the parent product trace
- background task and workflow build-loop spans
- ad-hoc child spans under the active Instance AI run tree
Revisit native agents telemetry when it can accept or inherit these product
trace anchors without creating duplicate LangSmith traces.
Keep Instance AI product-level LangSmith spans only where native telemetry does
not expose enough structure for:
- product run
- orchestrator turn
- sub-agent run
- tool execution
- background task
- workflow build loop
## Testing Plan
Unit tests:
- native tool definitions for HITL tools
- native stream chunk to Instance AI event mapping
- native resume flow with checkpoint persistence
- TypeORM `BuiltMemory`
- TypeORM `CheckpointStore`
- thread metadata storage
- service run state registry using `agentRunId`
Integration tests:
- orchestrator run streams text and tool events
- HITL suspend, confirmation, resume
- delegated sub-agent run
- planned task approval and execution
- builder agent background task
- workspace filesystem and sandbox operations
Package checks:
```bash
pushd packages/@n8n/agents
pnpm test
pnpm typecheck
popd
pushd packages/@n8n/instance-ai
pnpm test
pnpm typecheck
popd
pushd packages/cli
pnpm test instance-ai
pnpm typecheck
popd
```
Run broader checks before PR:
```bash
pnpm test:affected
pnpm typecheck
```
## Implementation Milestones
1. Native type surface and dependency removal plan
- Replace Mastra imports in public Instance AI types.
- Define native tool registry, memory config, checkpoint config, and
orchestration context.
- Update docs and prompts that refer to Mastra runtime behavior.
2. Native persistence
- Add native TypeORM entities and migration.
- Implement `BuiltMemory`.
- Implement `CheckpointStore`.
- Replace thread metadata storage that currently depends on Mastra memory.
3. Native tools
- Rewrite domain tools directly with `Tool`.
- Rewrite HITL tools with native suspend/resume.
- Rewrite dynamic local gateway and MCP-discovered tools as native tools.
4. Native agent factories and runtime
- Rewrite orchestrator and sub-agent factories.
- Rewrite stream runner, resume flow, and event mapper.
- Rename `mastraRunId` to `agentRunId`.
5. Native workspace, MCP, and tracing
- Rewrite workspace providers.
- Replace Mastra MCP usage.
- Evaluate tracing against native telemetry/events and keep product-level
spans where native telemetry lacks Instance AI anchors.
6. Cleanup and verification
- Delete retired Mastra storage and registration files.
- Remove Mastra dependencies.
- Run targeted tests and typechecks.
- Verify no `@mastra/*` imports remain.
## Implementation TODO
- [x] Replace Mastra types in `packages/@n8n/instance-ai/src/types.ts`.
- [x] Remove Mastra dependencies from `packages/@n8n/instance-ai/package.json`.
- [x] Rewrite orchestrator factory with native `Agent`.
- [x] Rewrite sub-agent factory with native `Agent`.
- [x] Delete Mastra registration logic.
- [x] Rewrite all tools to native `Tool`.
- [x] Rewrite orchestration tools for native streaming and resume.
- [x] Rewrite stream runner and stream executor for native `StreamChunk`.
- [x] Rename internal `mastraRunId` to `agentRunId`.
- [x] Implement native chunk to Instance AI event mapper.
- [x] Implement TypeORM `BuiltMemory`.
- [x] Implement TypeORM `CheckpointStore`.
- [x] Add fresh DB migration for native Instance AI persistence.
- [x] Remove `TypeORMCompositeStore` from the active runtime path.
- [x] Remove `TypeORMWorkflowsStorage` from the active runtime path.
- [x] Remove observational memory runtime usage.
- [x] Keep or replace compaction as the operational long-context mechanism.
- [x] Rewrite workspace providers against native agents workspace interfaces.
- [x] Replace Mastra MCP usage with native MCP or native dynamic tools.
- [x] Decide LangSmith tracing scope against native telemetry/events.
- [x] Update docs that mention Mastra runtime behavior.
- [x] Add tests for the native runtime path.
- [x] Verify no `@mastra/*` imports remain.
Current remaining implementation gaps:
- None known in the native agents rewrite scope.
## Acceptance Criteria
- `rg "@mastra/" packages/@n8n/instance-ai packages/cli/src/modules/instance-ai`
returns no active runtime imports.
- `packages/@n8n/instance-ai/package.json` has no Mastra dependencies.
- A new Instance AI thread can run from user message to streamed final answer.
- HITL confirmation suspends and resumes through native `CheckpointStore`.
- Delegated sub-agents run on native `Agent`.
- Planned tasks and workflow loop still persist thread-scoped state.
- Existing Instance AI frontend can consume the SSE stream without reducer
changes.
- Old Instance AI data may be absent after migration, but non-Instance-AI data
is untouched.