# Memory System ## Overview The memory system serves two distinct purposes: - **Long-term user knowledge** — working memory that persists the agent's understanding of the user, their preferences, and instance knowledge across all conversations (user-scoped) - **Operational context management** — observational memory that compresses the agent's operational history during long autonomous loops to prevent context degradation (thread-scoped) - **Conversation history** — recent messages and semantic recall for the current thread (thread-scoped) Sub-agents currently have working memory **disabled** (`workingMemoryEnabled: false`). They are stateless — context is passed via the briefing only. ## Tiers ### Tier 1: Storage Backend The persistence layer. Stores all messages, working memory state, observational memory, plan state, event history, and vector embeddings. | Backend | When Used | Connection | |---------|-----------|------------| | PostgreSQL | n8n is configured with `postgresdb` | Built from n8n's DB config | | LibSQL/SQLite | All other cases (default) | `file:instance-ai-memory.db` | The storage backend is selected automatically based on n8n's database configuration — no separate config needed. ### Tier 2: Recent Messages A sliding window of the most recent N messages in the conversation, sent as context to the LLM on every request. - **Default**: 20 messages - **Config**: `N8N_INSTANCE_AI_LAST_MESSAGES` ### Tier 3: Working Memory A structured markdown template that the agent can update during conversation. It persists information the agent learns about the user and their instance across messages. Working memory is **user-scoped** — it carries across threads. ```markdown # User Context - **Name**: - **Role**: - **Organization**: # Workflow Preferences - **Preferred trigger types**: - **Common integrations used**: - **Workflow naming conventions**: - **Error handling patterns**: # Current Goals - **Active project/task**: - **Known issues being debugged**: - **Pending workflow changes**: # Instance Knowledge - **Frequently used credentials**: - **Key workflow IDs and names**: - **Custom node types available**: ``` The agent fills this in over time as it learns about the user. Working memory is included in every request, giving the agent persistent context beyond the recent message window. ### Tier 4: Observational Memory Automatic context compression for long-running autonomous loops. Two background agents manage the orchestrator's context size: - **Observer** — when message tokens exceed a threshold (default: 30K), compresses old messages into dense observations - **Reflector** — when observations exceed their threshold (default: 40K), condenses observations into higher-level patterns ``` Context window layout during autonomous loop: ┌──────────────────────────────────────────┐ │ Observation Block (≤40K tokens) │ ← compressed history │ "Built wf-123 with Schedule→HTTP→Slack. │ (append-only, cacheable) │ Exec failed: 401 on HTTP node. │ │ Debugger identified missing API key. │ │ Rebuilt workflow, re-executed, passed." │ ├──────────────────────────────────────────┤ │ Raw Message Block (≤30K tokens) │ ← recent tool calls & results │ [current step's tool calls and results] │ (rotated as new messages arrive) └──────────────────────────────────────────┘ ``` **Why this matters for the autonomous loop**: - Tool-heavy workloads (workflow definitions, execution results, node descriptions) get **5–40x compression** — a 50-step loop that would blow out the context window stays manageable - The observation block is **append-only** until reflection runs, enabling high prompt cache hit rates (4–10x cost reduction) - **Async buffering** pre-computes observations in the background — no user-visible pause when the threshold is hit - Uses a secondary LLM (default: `google/gemini-2.5-flash`) for compression — cheap and has a 1M token context window for the Reflector Observational memory is **thread-scoped** — it tracks the operational history of the current task, not long-term user knowledge (that's working memory's job). ### Tier 5: Semantic Recall (Optional) Vector-based retrieval of relevant past messages. When enabled, the system embeds each message and retrieves semantically similar past messages to include as context. - **Requires**: `N8N_INSTANCE_AI_EMBEDDER_MODEL` to be set - **Config**: `N8N_INSTANCE_AI_SEMANTIC_RECALL_TOP_K` (default: 5) - **Message range**: 2 messages before and 1 after each match Disabled by default. When the embedder model is not set, only tiers 1–4 are active. ### Tier 6: Plan Storage The `plan` tool stores execution plans in thread-scoped storage. Plans are structured data (goal, current phase, iteration count, step statuses) that persist across reconnects within a conversation. See the [tools](./tools.md) documentation for the plan tool schema. ## Scoping Model Memory is scoped to two dimensions: ```typescript agent.stream(message, { memory: { resource: userId, // User-level — working memory lives here thread: threadId, // Thread-level — messages, observations, plan live here }, }); ``` ### What's user-scoped (persists across threads) - **Working memory** — the agent's accumulated understanding of the user (preferences, frequently used workflows, instance knowledge) ### What's thread-scoped (isolated per conversation) - **Recent messages** — the sliding window of N messages - **Observational memory** — compressed operational history - **Semantic recall** — vector retrieval of relevant past messages - **Plan** — the current execution plan ### Sub-agent memory Sub-agents currently have working memory **disabled**. They are fully stateless — context is passed via the briefing and `conversationContext` fields in the `delegate` and `build-workflow-with-agent` tools. Past failed attempts are tracked via the `IterationLog` (stored in thread metadata) and appended to sub-agent briefings on retry, providing cross-attempt context without persistent memory. ### Cross-user isolation Each user's memory is fully independent. The agent cannot see other users' conversations, working memory, or semantic history. ## Memory vs. Observational Memory These serve different purposes and both are active simultaneously: | Aspect | Working Memory | Observational Memory | |--------|---------------|---------------------| | **Scope** | User-scoped | Thread-scoped | | **Content** | User preferences, instance knowledge | Compressed operational history | | **Lifecycle** | Persists forever, across all threads | Lives with the conversation | | **Updated by** | Agent (explicit writes) | Background Observer/Reflector (automatic) | | **Example** | "User prefers Slack, uses cred-1" | "Built wf-123, exec failed, fixed HTTP auth" | ## Configuration | Variable | Type | Default | Description | |----------|------|---------|-------------| | `N8N_INSTANCE_AI_LAST_MESSAGES` | number | 20 | Recent message window | | `N8N_INSTANCE_AI_EMBEDDER_MODEL` | string | `''` | Embedder model (empty = disabled) | | `N8N_INSTANCE_AI_SEMANTIC_RECALL_TOP_K` | number | 5 | Number of semantic matches | | `N8N_INSTANCE_AI_OBSERVER_MODEL` | string | `google/gemini-2.5-flash` | LLM for Observer/Reflector | | `N8N_INSTANCE_AI_OBSERVER_MESSAGE_TOKENS` | number | 30000 | Observer trigger threshold | | `N8N_INSTANCE_AI_REFLECTOR_OBSERVATION_TOKENS` | number | 40000 | Reflector trigger threshold |