project-nomad

mirror of https://github.com/Crosstalk-Solutions/project-nomad.git synced 2026-06-01 00:56:50 +02:00

Author	SHA1	Message	Date
jakeaturner	989a401f28	fix(AI): improve remote Ollama url validation to prevent SSRF vulnerability	2026-05-20 10:16:00 -07:00
Chris Sherwood	ffa70a54bc	feat(chat): confirm-on-switch + one-chat-model-at-a-time enforcement Surfaces NOMAD's previously silent model-stacking behavior and enforces a "one chat model in VRAM at a time" invariant (the embedding model is always exempt). Addresses Chris's NOMAD3 testing observation that switching the dropdown in the chat header was invisibly slow on low-VRAM hardware because the prior model was never unloaded: Ollama would either evict it under memory pressure or load the new one on CPU after the runner choked. Three integration points all funnel through one new helper: - User changes the model dropdown in an active chat session → confirm modal "Switch to {newModel}? Switching to {newModel} will start a new chat. Your current conversation stays available in the sidebar." On confirm, fire `keep_alive: 0` against the previous chat model, clear active session, set the new selection. Cancel snaps the visible dropdown back to the previous value (no popup state leaks into `selectedModel`). - User clicks a session in the sidebar → no popup (system-initiated). Restore the session's stored model into the dropdown and fire `unloadChatModels(targetModel)` so anything that isn't the target gets the unload hint. - Chat page first mount → page-load normalization. Anything stacked from a prior session gets the unload hint with the current selected model as the target-to-preserve. Guarded by a ref so it only fires once per page lifetime; gated on `selectedModel` being populated. Backend surface is a single new helper and a single new route: `OllamaService.unloadAllChatModelsExcept(targetModel: string | null)` → queries `/api/ps`, filters out (a) the embedding model name (hardcoded `nomic-embed-text:v1.5` to avoid the RagService circular import) and (b) `targetModel`, fires `POST /api/generate` with empty prompt + `keep_alive: 0` in parallel against everything else. Returns the names that were hinted. Best-effort: network or Ollama errors are logged and swallowed so callers don't fail on housekeeping. `POST /api/ollama/unload-chat-models` → thin wrapper validating `{ targetModel?: string | null }`. Why `keep_alive: 0` is safe against in-flight inference: per Ollama's scheduler semantics, the hint sets the post-completion eviction timer to zero; the runner is not terminated. If Session A is mid-response on gemma when Session B fires the unload, gemma stays resident until A's request completes, then evicts. The user-visible worst case is the race where A's longer-running request re-extends the timer back to the default and the unload is no-op'd; the next transition (or page reload) gets another chance, and Ollama's own LRU catches up under memory pressure regardless. Robust in-flight tracking deferred to a follow-up if we see stale-state in the wild. Base `rc`: v1.40.0 will inherit everything from rc.6 via the backmerge. Frontend tests deferred to a follow-up PR; existing inertia tsconfig errors are pre-existing and unrelated. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 10:16:00 -07:00
Chris Sherwood	43645e4bbc	fix(AI): rewrite RAG query on first follow-up (off-by-one in skip-rewrite threshold) The short-conversation skip in `rewriteQueryWithContext` used `userMessages.length <= 2`, which short-circuits both the very first turn AND the first follow-up. The follow-up is the moment the rewriter matters most: it's where pronouns and shorthand ("the bars", "how long does it last?") need to be resolved against earlier turns before the embedding search runs. With the rewriter skipped, RAG queries against the raw last message, scores nothing above the 0.3 threshold, and no context gets injected for that turn. The visible symptom is the assistant treating the first follow-up in any chat as a brand-new question, e.g. "great - they threw up 2 of the bars it looks like" answered as if it were a recipe-bars question, with no carry-forward of the prior chocolate-poisoning context. Threshold lowered to `< 2`: skip only when there's exactly one user message (nothing to rewrite from). From the first follow-up onward the rewriter runs, as originally intended before commit `96e5027`. Validated against `mistral-nemo:12b` on NOMAD3 by hot-patching the compiled controller and replaying the dog-chocolate scenario. Post-patch response correctly threads "3 Hershey's bars" from turn 1 into turn 2's answer; pre-patch (per reporter's screenshot) pivoted to peanut butter bar recipes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 10:16:00 -07:00
chriscrosstalk	6646b3480b	fix(AI): stop local nomad_ollama container when remote Ollama is configured (#744) When users set a remote Ollama URL via AI Settings, the local nomad_ollama container continued running and competed with the remote host for port 11434 and GPU access. Now configureRemote stops the local container on set and restores it on clear (if still present). Container and its models volume are preserved so the local install can be re-enabled later. Closes #662 Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 14:26:28 -07:00
Henry Estela	6510f42184	fix(AI): qwen2.5 loading on every chat message (#649) Use the currently loaded model for chat title generation and query rewrite.	2026-04-21 14:26:28 -07:00
Henry Estela	69c15b8b1e	feat(AI): enable remote AI chat host	2026-04-03 14:26:50 -07:00
Jake Turner	96e5027055	feat(AI Assistant): performance improvements and smarter RAG context usage	2026-03-11 14:08:09 -07:00
Jake Turner	460756f581	feat(AI Assistant): improved state management and performance	2026-03-11 14:08:09 -07:00
Jake Turner	db69428193	fix(AI): allow force refresh of models list	2026-03-11 14:08:09 -07:00
Jake Turner	00bd864831	fix(AI): improved perf via rewrite and streaming logic	2026-03-03 20:51:38 -08:00
Jake Turner	6874a2824f	feat(Models): paginate available models endpoint	2026-03-03 20:51:38 -08:00
Jake Turner	98b65c421c	feat(AI): thinking and response streaming	2026-02-18 21:22:53 -08:00
Jake Turner	4747863702	feat(AI Assistant): allow manual scan and resync KB	2026-02-09 15:16:18 -08:00
Jake Turner	276bdcd0b2	feat(AI Assistant): query rewriting for enhanced context retrieval	2026-02-08 16:19:27 -08:00
Jake Turner	d4cbc0c2d5	feat(AI): add fuzzy search to models list	2026-02-04 16:45:12 -08:00
Jake Turner	d1f40663d3	feat(RAG): initial beta with preprocessing, embedding, semantic retrieval, and ctx passage	2026-02-01 23:59:21 +00:00
Jake Turner	243f749090	feat: [wip] native AI chat interface	2026-01-31 20:39:49 -08:00
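
Note on 43645e4bbc: the off-by-one described in that commit can be illustrated with a minimal TypeScript sketch. The `ChatMessage` shape and the two helper names below are hypothetical, introduced only for illustration; the actual `rewriteQueryWithContext` code is not shown in this log and may be structured differently. Only the two comparisons (`<= 2` and `< 2`) come from the commit message.

```typescript
// Hypothetical message shape; the real NOMAD types may differ.
type ChatMessage = { role: 'user' | 'assistant' | 'system'; content: string }

// Counts user turns, the quantity the commit message compares against.
function countUserMessages(messages: ChatMessage[]): number {
  return messages.filter((m) => m.role === 'user').length
}

// Pre-fix condition from the commit message: `userMessages.length <= 2`.
// This skips the rewriter on the first turn AND on the first follow-up.
function shouldSkipRewriteBefore(messages: ChatMessage[]): boolean {
  return countUserMessages(messages) <= 2
}

// Post-fix condition: `< 2`. Skips only when there is a single user
// message, so the first follow-up is rewritten against earlier turns.
function shouldSkipRewriteAfter(messages: ChatMessage[]): boolean {
  return countUserMessages(messages) < 2
}

// Illustrative conversations (content paraphrased, not real data).
const firstTurn: ChatMessage[] = [{ role: 'user', content: 'first question' }]
const firstFollowUp: ChatMessage[] = [
  ...firstTurn,
  { role: 'assistant', content: 'first answer' },
  { role: 'user', content: 'how long does it last?' },
]
```

With the old condition both conversations skip the rewriter; with the new one only `firstTurn` does, which matches the behavior the commit describes.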

17 Commits
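
Note on ffa70a54bc: the unload helper described in that commit might look roughly like the TypeScript sketch below. This is not the project's code. It assumes that Ollama's `/api/ps` response carries a `models` array whose entries have a `name` field; the `baseUrl` and `fetchFn` parameters, the `FetchLike` type, and the `selectModelsToUnload` helper are hypothetical and exist only to keep the example self-contained. The embedding model name, the empty prompt, and `keep_alive: 0` are taken from the commit message.

```typescript
// Embedding model name as hardcoded per the commit message.
const EMBEDDING_MODEL = 'nomic-embed-text:v1.5'

// Hypothetical minimal fetch signature so the sketch can run with a stub.
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string }
) => Promise<{ ok: boolean; json(): Promise<unknown> }>

// Pure filter: everything loaded except the embedding model and the target.
function selectModelsToUnload(loaded: string[], targetModel: string | null): string[] {
  return loaded.filter((name) => name !== EMBEDDING_MODEL && name !== targetModel)
}

// Best-effort unload hint: errors are logged and swallowed, never thrown.
async function unloadAllChatModelsExcept(
  baseUrl: string,
  targetModel: string | null,
  fetchFn: FetchLike
): Promise<string[]> {
  try {
    const res = await fetchFn(`${baseUrl}/api/ps`)
    if (!res.ok) return []
    // Assumed response shape: { models: [{ name: string, ... }] }
    const body = (await res.json()) as { models?: { name: string }[] }
    const loaded = (body.models ?? []).map((m) => m.name)
    const toUnload = selectModelsToUnload(loaded, targetModel)
    // Fire the hints in parallel; a failed hint must not fail the others.
    await Promise.all(
      toUnload.map((model) =>
        fetchFn(`${baseUrl}/api/generate`, {
          method: 'POST',
          headers: { 'Content-Type': 'application/json' },
          body: JSON.stringify({ model, prompt: '', keep_alive: 0 }),
        }).catch((err) => console.warn(`unload hint failed for ${model}`, err))
      )
    )
    return toUnload
  } catch (err) {
    console.warn('unload-chat-models housekeeping failed', err)
    return []
  }
}
```

Keeping the filter as a separate pure function makes the exemption rule (embedding model and target are never hinted) easy to check without a running Ollama instance.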