project-nomad/admin
Chris Sherwood ffa70a54bc feat(chat): confirm-on-switch + one-chat-model-at-a-time enforcement
Surfaces NOMAD's previously-silent model-stacking behavior and enforces a
"one chat model in VRAM at a time" invariant (the embedding model is
always exempt). Addresses Chris's NOMAD3 testing observation that
switching the dropdown in the chat header was invisibly slow on low-VRAM
hardware because the prior model was never unloaded — Ollama would
either evict it under memory pressure or load the new one on CPU after
the runner choked.

Three integration points all funnel through one new helper:

- **User changes the model dropdown** in an active chat session →
  confirm modal "Switch to {newModel}? Switching to {newModel} will
  start a new chat. Your current conversation stays available in the
  sidebar." On confirm, fire `keep_alive: 0` against the previous chat
  model, clear active session, set the new selection. Cancel snaps the
  visible dropdown back to the previous value (no popup state leaks
  into `selectedModel`).

- **User clicks a session in the sidebar** → no popup (system-initiated).
  Restore the session's stored model into the dropdown and fire
  `unloadChatModels(targetModel)` so anything that isn't the target
  gets the unload hint.

- **Chat page first mount** → page-load normalization. Anything stacked
  from a prior session gets the unload hint with the current selected
  model as the target-to-preserve. Guarded by a ref so it only fires
  once per page lifetime; gated on `selectedModel` being populated.

Backend surface is a single new helper and a single new route:

  `OllamaService.unloadAllChatModelsExcept(targetModel: string | null)`
  → queries `/api/ps`, filters out (a) the embedding model name
  (hardcoded `nomic-embed-text:v1.5` to avoid the RagService circular
  import) and (b) `targetModel`, fires `POST /api/generate` with empty
  prompt + `keep_alive: 0` in parallel against everything else.
  Returns the names that were hinted. Best-effort: network or Ollama
  errors are logged and swallowed so callers don't fail on housekeeping.

  `POST /api/ollama/unload-chat-models` → thin wrapper validating
  `{ targetModel?: string | null }`.

Why `keep_alive: 0` is safe against in-flight inference: per Ollama's
scheduler semantics, the hint sets the post-completion eviction timer
to zero — the runner is not terminated. If Session A is mid-response
on gemma when Session B fires the unload, gemma stays resident until
A's request completes, then evicts. The user-visible worst case is the
race where A's longer-running request re-extends the timer back to the
default and the unload is no-op'd; the next transition (or page reload)
gets another chance, and Ollama's own LRU catches up under memory
pressure regardless. Robust in-flight tracking deferred to a follow-up
if we see stale-state in the wild.

Base `rc`: v1.40.0 will inherit everything from rc.6 via the backmerge.
Frontend tests deferred to a follow-up PR; existing inertia tsconfig
errors are pre-existing and unrelated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 10:16:00 -07:00
..
app feat(chat): confirm-on-switch + one-chat-model-at-a-time enforcement 2026-05-20 10:16:00 -07:00
bin feat: curated content system overhaul 2026-02-11 15:44:46 -08:00
commands feat(Maps): regional map downloads via go-pmtiles extract (#780) 2026-05-20 10:16:00 -07:00
config fix: cache docker list requests, aiAssistantName fetching, and ensure inertia used properly 2026-04-03 14:26:50 -07:00
constants feat(KB): Always/Manual ingest policy toggle (RFC #883 §1/§4) (#894) 2026-05-20 10:16:00 -07:00
database fix(KB): align chunks_per_mb column type with TS contract 2026-05-20 10:16:00 -07:00
docs docs: update release notes 2026-05-20 10:16:00 -07:00
inertia feat(chat): confirm-on-switch + one-chat-model-at-a-time enforcement 2026-05-20 10:16:00 -07:00
providers feat(GPU): auto-remediate nomad_ollama passthrough loss on admin boot (#755) 2026-05-20 10:16:00 -07:00
public feat: switch all PNG images to WEBP (#575) 2026-04-03 14:26:50 -07:00
resources feat(Maps): regional map downloads via go-pmtiles extract (#780) 2026-05-20 10:16:00 -07:00
start feat(chat): confirm-on-switch + one-chat-model-at-a-time enforcement 2026-05-20 10:16:00 -07:00
tests feat(KB): per-file ingest action + state indicator on Stored Files (RFC #883 §5) 2026-05-20 10:16:00 -07:00
types feat(KB): per-file ingest action + state indicator on Stored Files (RFC #883 §5) 2026-05-20 10:16:00 -07:00
util feat: display model download progress 2026-02-06 16:22:23 -08:00
views feat: initial commit 2025-06-29 15:51:08 -07:00
.editorconfig feat: initial commit 2025-06-29 15:51:08 -07:00
.env.example feat: Add Windows Docker Desktop support for local development 2026-01-19 10:29:24 -08:00
ace.js feat: initial commit 2025-06-29 15:51:08 -07:00
adonisrc.ts feat(GPU): auto-remediate nomad_ollama passthrough loss on admin boot (#755) 2026-05-20 10:16:00 -07:00
eslint.config.js feat: openwebui+ollama and zim management 2025-07-09 09:08:21 -07:00
package-lock.json build(deps): bump picomatch in /admin 2026-05-20 10:16:00 -07:00
package.json chore(deps): pin all deps to exact versions 2026-05-20 10:16:00 -07:00
tailwind.config.ts feat: initial commit 2025-06-29 15:51:08 -07:00
tsconfig.json feat: initial commit 2025-06-29 15:51:08 -07:00
vite.config.ts fix(Maps): ensure proper parsing of hostnames (#640) 2026-04-03 14:26:50 -07:00