n8n/packages/@n8n/instance-ai/docs/memory.md
2026-04-09 11:55:11 +00:00

5.4 KiB
Raw Permalink Blame History

Memory System

Overview

The memory system serves two purposes:

  • Operational context management — observational memory that compresses the agent's operational history during long autonomous loops to prevent context degradation (thread-scoped)
  • Conversation history — recent messages and semantic recall for the current thread (thread-scoped)

Sub-agents are stateless — context is passed via the briefing only.

Tiers

Tier 1: Storage Backend

The persistence layer. Stores all messages, observational memory, plan state, event history, and vector embeddings.

Backend When Used Connection
PostgreSQL n8n is configured with postgresdb Built from n8n's DB config
LibSQL/SQLite All other cases (default) file:instance-ai-memory.db

The storage backend is selected automatically based on n8n's database configuration — no separate config needed.

Tier 2: Recent Messages

A sliding window of the most recent N messages in the conversation, sent as context to the LLM on every request.

  • Default: 20 messages
  • Config: N8N_INSTANCE_AI_LAST_MESSAGES

Tier 3: Observational Memory

Automatic context compression for long-running autonomous loops. Two background agents manage the orchestrator's context size:

  • Observer — when message tokens exceed a threshold (default: 30K), compresses old messages into dense observations
  • Reflector — when observations exceed their threshold (default: 40K), condenses observations into higher-level patterns
Context window layout during autonomous loop:

┌──────────────────────────────────────────┐
│ Observation Block (≤40K tokens)          │  ← compressed history
│ "Built wf-123 with Schedule→HTTP→Slack.  │     (append-only, cacheable)
│  Exec failed: 401 on HTTP node.          │
│  Debugger identified missing API key.    │
│  Rebuilt workflow, re-executed, passed."  │
├──────────────────────────────────────────┤
│ Raw Message Block (≤30K tokens)          │  ← recent tool calls & results
│ [current step's tool calls and results]  │     (rotated as new messages arrive)
└──────────────────────────────────────────┘

Why this matters for the autonomous loop:

  • Tool-heavy workloads (workflow definitions, execution results, node descriptions) get 540x compression — a 50-step loop that would blow out the context window stays manageable
  • The observation block is append-only until reflection runs, enabling high prompt cache hit rates (410x cost reduction)
  • Async buffering pre-computes observations in the background — no user-visible pause when the threshold is hit
  • Uses a secondary LLM (default: google/gemini-2.5-flash) for compression — cheap and has a 1M token context window for the Reflector

Observational memory is thread-scoped — it tracks the operational history of the current task.

Tier 4: Semantic Recall (Optional)

Vector-based retrieval of relevant past messages. When enabled, the system embeds each message and retrieves semantically similar past messages to include as context.

  • Requires: N8N_INSTANCE_AI_EMBEDDER_MODEL to be set
  • Config: N8N_INSTANCE_AI_SEMANTIC_RECALL_TOP_K (default: 5)
  • Message range: 2 messages before and 1 after each match

Disabled by default. When the embedder model is not set, only tiers 13 are active.

Tier 5: Plan Storage

The plan tool stores execution plans in thread-scoped storage. Plans are structured data (goal, current phase, iteration count, step statuses) that persist across reconnects within a conversation. See the tools documentation for the plan tool schema.

Scoping Model

All memory is thread-scoped (isolated per conversation):

  • Recent messages — the sliding window of N messages
  • Observational memory — compressed operational history
  • Semantic recall — vector retrieval of relevant past messages
  • Plan — the current execution plan

Sub-agent memory

Sub-agents are fully stateless — context is passed via the briefing and conversationContext fields in the delegate and build-workflow-with-agent tools.

Past failed attempts are tracked via the IterationLog (stored in thread metadata) and appended to sub-agent briefings on retry, providing cross-attempt context without persistent memory.

Cross-user isolation

Each user's memory is fully independent. The agent cannot see other users' conversations or semantic history.

Configuration

Variable Type Default Description
N8N_INSTANCE_AI_LAST_MESSAGES number 20 Recent message window
N8N_INSTANCE_AI_EMBEDDER_MODEL string '' Embedder model (empty = disabled)
N8N_INSTANCE_AI_SEMANTIC_RECALL_TOP_K number 5 Number of semantic matches
N8N_INSTANCE_AI_OBSERVER_MODEL string google/gemini-2.5-flash LLM for Observer/Reflector
N8N_INSTANCE_AI_OBSERVER_MESSAGE_TOKENS number 30000 Observer trigger threshold
N8N_INSTANCE_AI_REFLECTOR_OBSERVATION_TOKENS number 40000 Reflector trigger threshold