project-nomad

mirror of https://github.com/Crosstalk-Solutions/project-nomad.git synced 2026-05-23 04:45:06 +02:00

History

Chris Sherwood ffa70a54bc feat(chat): confirm-on-switch + one-chat-model-at-a-time enforcement

Surfaces NOMAD's previously silent model-stacking behavior and enforces a "one chat model in VRAM at a time" invariant (the embedding model is always exempt). Addresses Chris's NOMAD3 testing observation that switching the dropdown in the chat header was invisibly slow on low-VRAM hardware because the prior model was never unloaded: Ollama would either evict it under memory pressure or load the new one on CPU after the runner choked.

Three integration points all funnel through one new helper:

- User changes the model dropdown in an active chat session → confirm modal "Switch to {newModel}? Switching to {newModel} will start a new chat. Your current conversation stays available in the sidebar." On confirm, fire `keep_alive: 0` against the previous chat model, clear the active session, and set the new selection. Cancel snaps the visible dropdown back to the previous value (no popup state leaks into `selectedModel`).
- User clicks a session in the sidebar → no popup (system-initiated). Restore the session's stored model into the dropdown and fire `unloadChatModels(targetModel)` so anything that isn't the target gets the unload hint.
- Chat page first mount → page-load normalization. Anything stacked from a prior session gets the unload hint, with the currently selected model as the target to preserve. Guarded by a ref so it fires only once per page lifetime; gated on `selectedModel` being populated.

Backend surface is a single new helper and a single new route:

- `OllamaService.unloadAllChatModelsExcept(targetModel: string | null)` → queries `/api/ps`, filters out (a) the embedding model name (hardcoded `nomic-embed-text:v1.5` to avoid the RagService circular import) and (b) `targetModel`, then fires `POST /api/generate` with an empty prompt + `keep_alive: 0` in parallel against everything else. Returns the names that were hinted. Best-effort: network or Ollama errors are logged and swallowed so callers don't fail on housekeeping.
- `POST /api/ollama/unload-chat-models` → thin wrapper validating `{ targetModel?: string | null }`.

Why `keep_alive: 0` is safe against in-flight inference: per Ollama's scheduler semantics, the hint sets the post-completion eviction timer to zero; the runner is not terminated. If Session A is mid-response on gemma when Session B fires the unload, gemma stays resident until A's request completes, then evicts. The user-visible worst case is the race where A's longer-running request re-extends the timer back to the default and the unload is no-op'd; the next transition (or page reload) gets another chance, and Ollama's own LRU eviction catches up under memory pressure regardless. Robust in-flight tracking is deferred to a follow-up if we see stale state in the wild.

Base `rc`: v1.40.0 will inherit everything from rc.6 via the backmerge. Frontend tests are deferred to a follow-up PR; the existing inertia tsconfig errors predate this change and are unrelated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>		2026-05-20 10:16:00 -07:00
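The helper's query-filter-hint flow can be sketched as standalone TypeScript. This is a minimal sketch, not the code in NOMAD's `OllamaService`: it assumes Ollama listens on its default `http://localhost:11434`, reads model names from the `name` field of each entry in the `/api/ps` response's `models` array, and takes a hypothetical injectable `fetchImpl` parameter (typed by a hypothetical `FetchLike` alias) so it can be exercised without a live server.

```typescript
// Minimal sketch of "unload every chat model except the target" via Ollama's HTTP API.
// Assumptions (not from NOMAD's source): the default base URL, the FetchLike type, and
// the injectable fetchImpl parameter, which exists only so the function can be tested offline.

// Hardcoded, as the commit describes, instead of importing it from the RAG service.
const EMBEDDING_MODEL = "nomic-embed-text:v1.5";

type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// The subset of the /api/ps response this sketch relies on.
interface PsResponse {
  models?: Array<{ name: string }>;
}

async function unloadAllChatModelsExcept(
  targetModel: string | null,
  baseUrl: string = "http://localhost:11434",
  fetchImpl: FetchLike = (globalThis as any).fetch,
): Promise<string[]> {
  try {
    // 1. Ask Ollama which models are currently loaded.
    const ps = await fetchImpl(`${baseUrl}/api/ps`);
    if (!ps.ok) throw new Error(`/api/ps returned HTTP ${ps.status}`);
    const loaded = ((await ps.json()) as PsResponse).models ?? [];

    // 2. Skip the embedding model and the target; everything else gets the hint.
    const candidates = loaded
      .map((m) => m.name)
      .filter((name) => name !== EMBEDDING_MODEL && name !== targetModel);

    // 3. Fire the keep_alive: 0 hints in parallel. allSettled means one failed
    //    request does not discard the outcome of the others.
    const results = await Promise.allSettled(
      candidates.map((model) =>
        fetchImpl(`${baseUrl}/api/generate`, {
          method: "POST",
          headers: { "Content-Type": "application/json" },
          body: JSON.stringify({ model, prompt: "", keep_alive: 0 }),
        }),
      ),
    );

    // 4. Report only the models whose hint request succeeded.
    return candidates.filter((name, i) => {
      const r = results[i];
      if (r.status === "rejected") {
        console.warn(`unload hint for ${name} failed`, r.reason);
        return false;
      }
      return r.value.ok;
    });
  } catch (err) {
    // Best-effort housekeeping: log and swallow so callers never fail on it.
    console.warn("unloadAllChatModelsExcept failed", err);
    return [];
  }
}
```

Using `Promise.allSettled` rather than `Promise.all` fits the best-effort intent: a single refused request is logged instead of hiding which other models were hinted. Treating only 2xx responses as "hinted" is this sketch's own interpretation of the commit's "names that were hinted".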
env.ts	feat: gzip compression by default for all registered routes	2026-04-03 14:26:50 -07:00
kernel.ts	feat: gzip compression by default for all registered routes	2026-04-03 14:26:50 -07:00
routes.ts	feat(chat): confirm-on-switch + one-chat-model-at-a-time enforcement	2026-05-20 10:16:00 -07:00
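The `POST /api/ollama/unload-chat-models` route that this commit registers in `routes.ts` is described only as a thin wrapper validating `{ targetModel?: string | null }`. A minimal, framework-agnostic sketch of that validation follows. `parseUnloadChatModelsBody` and `UnloadChatModelsBody` are hypothetical names, and the sketch does not reflect whichever validator library NOMAD actually uses.

```typescript
// Hypothetical, framework-agnostic validator for the body of
// POST /api/ollama/unload-chat-models. It only illustrates the accepted shape;
// NOMAD's real route may validate differently.

type UnloadChatModelsBody = { targetModel?: string | null };

// Normalizes a parsed JSON body: a missing or null targetModel both become null,
// which the helper as described treats as "preserve no chat model".
function parseUnloadChatModelsBody(input: unknown): Required<UnloadChatModelsBody> {
  if (input === null || typeof input !== "object" || Array.isArray(input)) {
    throw new Error("request body must be a JSON object");
  }
  const { targetModel } = input as Record<string, unknown>;
  if (targetModel === undefined || targetModel === null) {
    return { targetModel: null };
  }
  if (typeof targetModel !== "string") {
    throw new Error("targetModel must be a string or null");
  }
  return { targetModel };
}
```

Collapsing `undefined` into `null` gives the helper a single "no target" value, so a client that omits the field behaves the same as one that sends `null` explicitly.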