The map_markers backend has accepted a `notes` column since PR #770 and
the popup display path was wired up to render it (commit 6328256), but
the placement UI never got an input. Result: notes are stored,
displayed when present, and impossible to actually enter via the UI.
Add a notes textarea below the name input in the placement popup,
thread the value through `addMarker` and `createMapMarker`, and trim +
null-coalesce on save. Notes display in the marker popup on click is
unchanged and now actually reachable.
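
A minimal sketch of the trim + null-coalesce step on save. `normalizeNotes` is a hypothetical helper name, and storing whitespace-only input as null is an assumption about the intended behavior:

```typescript
// Hypothetical helper: normalize the textarea value before it reaches the API.
function normalizeNotes(raw: string | null | undefined): string | null {
  const trimmed = (raw ?? "").trim();
  // Assumption: whitespace-only input is stored as null, not as an empty string.
  return trimmed.length > 0 ? trimmed : null;
}
```

With this shape the request body can carry `notes: normalizeNotes(markerNotes)` and the backend never sees padded or empty strings.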
- admin/inertia/lib/api.ts: extend createMapMarker request type with
optional notes
- admin/inertia/hooks/useMapMarkers.ts: addMarker accepts and forwards
notes (response already populated notes into local state, so no
display-side change needed)
- admin/inertia/components/maps/MapComponent.tsx: markerNotes state,
textarea after name input, threaded into handleSaveMarker
Edit-mode for existing markers (so users can backfill notes on
already-placed pins) is intentionally out of scope here - selected-marker
popup is still read-only. That's a follow-up PR if there's demand.
Replace literal string matching with ipaddr.js parsing
so equivalent encodings of 169.254.169.254
(::ffff:169.254.169.254, ::ffff:a9fe:a9fe, fully-expanded forms)
and fd00:ec2::254 are all rejected.
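
To illustrate why parsing beats string matching, here is a dependency-free sketch. It does not use ipaddr.js; the small hand-rolled parser below stands in for it, and every function name is hypothetical:

```typescript
// Parse a dotted-quad IPv4 literal into four octets, or null if malformed.
function parseIPv4(text: string): number[] | null {
  const parts = text.split(".");
  if (parts.length !== 4) return null;
  const octets = parts.map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  return octets.every((o) => o >= 0 && o <= 255) ? octets : null;
}

// Expand an IPv6 literal into eight 16-bit groups, or null if malformed.
function parseIPv6(input: string): number[] | null {
  let text = input.toLowerCase();
  const lastColon = text.lastIndexOf(":");
  const tail = text.slice(lastColon + 1);
  if (tail.includes(".")) {
    // Rewrite an embedded dotted quad as two hex groups.
    const v4 = parseIPv4(tail);
    if (v4 === null) return null;
    const hi = ((v4[0] << 8) | v4[1]).toString(16);
    const lo = ((v4[2] << 8) | v4[3]).toString(16);
    text = text.slice(0, lastColon + 1) + hi + ":" + lo;
  }
  const halves = text.split("::");
  if (halves.length > 2) return null;
  const toGroups = (s: string): string[] => (s === "" ? [] : s.split(":"));
  const head = toGroups(halves[0]);
  const rest = halves.length === 2 ? toGroups(halves[1]) : [];
  const missing = 8 - head.length - rest.length;
  if (halves.length === 1 ? missing !== 0 : missing < 1) return null;
  const pad = halves.length === 2 ? missing : 0;
  const groups = [...head, ...new Array<string>(pad).fill("0"), ...rest];
  if (!groups.every((g) => /^[0-9a-f]{1,4}$/.test(g))) return null;
  return groups.map((g) => parseInt(g, 16));
}

const METADATA_V4 = [169, 254, 169, 254];
const METADATA_V6 = [0xfd00, 0x0ec2, 0, 0, 0, 0, 0, 0x0254];

function sameNumbers(a: number[], b: number[]): boolean {
  return a.length === b.length && a.every((n, i) => n === b[i]);
}

// True when the host is any encoding of 169.254.169.254 or fd00:ec2::254.
function isMetadataAddress(host: string): boolean {
  const bare = host.replace(/^\[|\]$/g, "");
  if (!bare.includes(":")) {
    const v4 = parseIPv4(bare);
    return v4 !== null && sameNumbers(v4, METADATA_V4);
  }
  const groups = parseIPv6(bare);
  if (groups === null) return false;
  if (sameNumbers(groups, METADATA_V6)) return true;
  // IPv4-mapped form: five zero groups, then ffff, then the IPv4 bits.
  const isMapped = groups.slice(0, 5).every((g) => g === 0) && groups[5] === 0xffff;
  if (!isMapped) return false;
  const embedded = [groups[6] >> 8, groups[6] & 0xff, groups[7] >> 8, groups[7] & 0xff];
  return sameNumbers(embedded, METADATA_V4);
}
```

Comparing parsed groups rather than strings is what makes the compressed, mapped, and fully-expanded spellings collapse to one check.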
foreignKey/localKey were swapped on the ChatMessage → ChatSession
relation. Per Lucid's belongsTo contract, foreignKey is the column
on the child model and localKey is on the parent — so this must be
{ foreignKey: 'session_id', localKey: 'id' }, mirroring the inverse
hasMany on ChatSession.
The relation is not currently preloaded anywhere in the codebase, so
no runtime behavior changes today; this closes a latent bug that
would have broken any future preload('session') call.
RunDownloadJob's onComplete handler was unconditionally firing
EmbedFileJob.dispatch after every ZIM download, gated only by "is
Ollama installed?". The rag.defaultIngestPolicy KV setting was never
consulted, so users who explicitly set Auto-index to Manual still got
every newly-downloaded ZIM auto-embedded.
RagService.scanAndSync already handles Manual correctly by recording
pending_decision rows instead of dispatching (rag_service.ts:1587-1638
via decideScanAction). The post-download path skipped that gate.
Mirror the same check at the dispatch site: read the policy KV; if
Manual, firstOrCreate a pending_decision row in kb_ingest_state so the
per-file Index affordance from PR #909 surfaces the file the same way
scan-time-discovered Manual files do. firstOrCreate (not create) so a
re-download doesn't demote an existing indexed/failed row — the user
can explicitly re-index from the KB panel if they want fresh content.
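
A rough sketch of the gate described above, using an in-memory map in place of the kb_ingest_state table. The function names are hypothetical, and treating every policy other than Manual as "dispatch" is an assumption:

```typescript
type IngestState = "pending_decision" | "indexed" | "failed";
type IngestPolicy = "Always" | "Manual" | null;

// In-memory stand-in for the kb_ingest_state table, keyed by file path.
const ingestRows = new Map<string, IngestState>();

// firstOrCreate semantics: return the existing row untouched, or insert the default.
function firstOrCreateState(filePath: string, initial: IngestState): IngestState {
  const existing = ingestRows.get(filePath);
  if (existing !== undefined) return existing;
  ingestRows.set(filePath, initial);
  return initial;
}

// Returns true when the caller should dispatch an embed job for the file.
function shouldDispatchEmbed(policy: IngestPolicy, filePath: string): boolean {
  if (policy === "Manual") {
    // Record the file for the per-file Index action; never demote an existing row.
    firstOrCreateState(filePath, "pending_decision");
    return false;
  }
  // Assumption: any non-Manual policy keeps the previous auto-embed behavior.
  return true;
}
```

The re-download case is the reason for firstOrCreate: a file already marked indexed keeps that state instead of dropping back to pending_decision.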
Reproduced on NOMAD3 before the fix: with rag.defaultIngestPolicy='Manual',
every ZIM downloaded today via Content Explorer (agriculture-essential +
computing-essential, ~62 MB across 7 files) wrote kb_ingest_state
rows with state='indexed' instead of pending_decision. Real bug,
not a hot-patch artifact.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Since commit 36b6d8e moved tier-installation tracking from a client-side
persistence model to a server-side derive-from-disk model, the card
display only ever updates once every file in a tier is fully on disk.
A user who picks Standard sees a blank card for the duration of the
download (often hours for large tiers like Wikibooks). Worse, if some
files finish before others, the card briefly shows a lower tier (e.g.
Essential) before promoting to the selected tier on completion, which
reads as "the system didn't accept my pick."
Backend: compute a sibling `downloadingTierSlug` by unioning installed
resource IDs with the IDs from active RunDownloadJob queue entries
(waiting + active + delayed, failed deliberately excluded), then
resolving the highest tier whose every resource is in that union. Set
only when it differs from `installedTierSlug` — no point reporting
"downloading Standard" when Standard is already fully installed.
Frontend: unify the prominent corner badge logic in CategoryCard to a
single `badgeTier` derived from selectedTier > downloadingTier >
installedTier. Spinner + "(downloading)" suffix when in flight,
checkmark for installed/selected. The pill row and lime border follow
the same source.
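
The backend resolution and the badge precedence above can be sketched roughly as follows. The `Tier` shape, the function names, and the assumption that each tier lists its inherited resources are illustrative, not the actual service code:

```typescript
interface Tier {
  slug: string;
  // Assumption: every resource ID the tier needs, inherited tiers included.
  resourceIds: string[];
}

// Highest tier (tiers ordered lowest to highest) whose resources are all in `have`.
function highestSatisfiedTier(tiers: Tier[], have: Set<string>): string | null {
  let best: string | null = null;
  for (const tier of tiers) {
    if (tier.resourceIds.every((id) => have.has(id))) best = tier.slug;
  }
  return best;
}

function resolveTierSlugs(tiers: Tier[], installedIds: string[], queuedIds: string[]) {
  const installedTierSlug = highestSatisfiedTier(tiers, new Set(installedIds));
  // Union of what is on disk and what waiting/active/delayed jobs will deliver.
  const candidate = highestSatisfiedTier(tiers, new Set([...installedIds, ...queuedIds]));
  // Only report a downloading tier when it differs from the installed one.
  const downloadingTierSlug = candidate !== installedTierSlug ? candidate : null;
  return { installedTierSlug, downloadingTierSlug };
}

type Badge = { tier: string; variant: "spinner" | "checkmark" } | null;

// Precedence: selectedTier > downloadingTier > installedTier.
function deriveBadge(
  selectedTier: string | null,
  downloadingTier: string | null,
  installedTier: string | null
): Badge {
  if (selectedTier !== null) return { tier: selectedTier, variant: "checkmark" };
  if (downloadingTier !== null) return { tier: downloadingTier, variant: "spinner" };
  if (installedTier !== null) return { tier: installedTier, variant: "checkmark" };
  return null;
}
```

Because the union includes queued jobs, the card can show the picked tier immediately rather than flashing a lower tier while files trickle in.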
Verified on NOMAD3: backend correctly resolves the downloading tier
from in-flight BullMQ jobs; CategoryCard shows the spinner badge
immediately on Submit and switches to the checkmark variant when
downloads complete.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two related fixes surfaced by armandoescalante in #915 when clicking a
Content Explorer category card (e.g. Medicine) on v1.32.0-rc.6:
1. TierSelectionModal placed a useMemo for freeBytes *after* the
`if (!category) return null` early return (introduced in PR #901's
guardrail integration). When `category` transitioned from null to
non-null on first open, React saw a different hook count between
renders and crashed the entire component tree with "Rendered more
hooks than during the previous render", blanking the modal. Moved
the freeBytes useMemo above the early return so hook order is
constant.
2. `IconLibrary` was used as the icon prop on the Manage Custom
Libraries button in remote-explorer.tsx but never registered in
the DynamicIcon allowlist at admin/inertia/lib/icons.ts. Added it
to both the import block and the icons map so the warning stops
firing and the icon renders.
Closes #915.
Lift the hardcoded 'nomic-embed-text:v1.5' string out of both
RagService and OllamaService into a shared EMBEDDING_MODEL_NAME
constant in constants/ollama.ts. The duplicate in OllamaService
existed only to dodge a circular import with RagService; the
constants module has no service imports, so a shared constant
eliminates both the duplication and the drift risk called out
in the inline "keep in sync" comment.
Surfaces NOMAD's previously-silent model-stacking behavior and enforces a
"one chat model in VRAM at a time" invariant (the embedding model is
always exempt). Addresses Chris's NOMAD3 testing observation that
switching the dropdown in the chat header was invisibly slow on low-VRAM
hardware because the prior model was never unloaded — Ollama would
either evict it under memory pressure or load the new one on CPU after
the runner choked.
Three integration points all funnel through one new helper:
- **User changes the model dropdown** in an active chat session →
confirm modal "Switch to {newModel}? Switching to {newModel} will
start a new chat. Your current conversation stays available in the
sidebar." On confirm, fire `keep_alive: 0` against the previous chat
model, clear active session, set the new selection. Cancel snaps the
visible dropdown back to the previous value (no popup state leaks
into `selectedModel`).
- **User clicks a session in the sidebar** → no popup (system-initiated).
Restore the session's stored model into the dropdown and fire
`unloadChatModels(targetModel)` so anything that isn't the target
gets the unload hint.
- **Chat page first mount** → page-load normalization. Anything stacked
from a prior session gets the unload hint with the current selected
model as the target-to-preserve. Guarded by a ref so it only fires
once per page lifetime; gated on `selectedModel` being populated.
Backend surface is a single new helper and a single new route:
`OllamaService.unloadAllChatModelsExcept(targetModel: string | null)`
→ queries `/api/ps`, filters out (a) the embedding model name
(hardcoded `nomic-embed-text:v1.5` to avoid the RagService circular
import) and (b) `targetModel`, fires `POST /api/generate` with empty
prompt + `keep_alive: 0` in parallel against everything else.
Returns the names that were hinted. Best-effort: network or Ollama
errors are logged and swallowed so callers don't fail on housekeeping.
`POST /api/ollama/unload-chat-models` → thin wrapper validating
`{ targetModel?: string | null }`.
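
A minimal sketch of the helper's shape, with the Ollama calls injected so the selection logic can be exercised on its own. `listRunning` and `sendUnload` are hypothetical stand-ins for the `/api/ps` and `POST /api/generate` requests:

```typescript
interface RunningModel {
  name: string;
}

const EMBEDDING_MODEL_NAME = "nomic-embed-text:v1.5";

// Everything currently loaded except the embedding model and the target.
function selectModelsToUnload(running: RunningModel[], targetModel: string | null): string[] {
  return running
    .map((m) => m.name)
    .filter((name) => name !== EMBEDDING_MODEL_NAME && name !== targetModel);
}

// Body for the unload hint: empty prompt plus keep_alive: 0.
function buildUnloadRequest(model: string) {
  return { model, prompt: "", keep_alive: 0 };
}

async function unloadAllExcept(
  targetModel: string | null,
  listRunning: () => Promise<RunningModel[]>,
  sendUnload: (body: ReturnType<typeof buildUnloadRequest>) => Promise<void>
): Promise<string[]> {
  let names: string[];
  try {
    names = selectModelsToUnload(await listRunning(), targetModel);
  } catch (err) {
    // Best-effort: housekeeping failures are logged and swallowed.
    console.warn("could not list running models", err);
    return [];
  }
  await Promise.all(
    names.map(async (name) => {
      try {
        await sendUnload(buildUnloadRequest(name));
      } catch (err) {
        console.warn(`unload hint failed for ${name}`, err);
      }
    })
  );
  return names;
}
```

Passing `null` as the target unloads every chat model, which matches the `string | null` parameter in the signature above.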
Why `keep_alive: 0` is safe against in-flight inference: per Ollama's
scheduler semantics, the hint sets the post-completion eviction timer
to zero — the runner is not terminated. If Session A is mid-response
on gemma when Session B fires the unload, gemma stays resident until
A's request completes, then evicts. The user-visible worst case is the
race where A's longer-running request re-extends the timer back to the
default and the unload is no-op'd; the next transition (or page reload)
gets another chance, and Ollama's own LRU catches up under memory
pressure regardless. Robust in-flight tracking deferred to a follow-up
if we see stale-state in the wild.
Base `rc`: v1.40.0 will inherit everything from rc.6 via the backmerge.
Frontend tests deferred to a follow-up PR; existing inertia tsconfig
errors are pre-existing and unrelated.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Return a discriminated `EmbedSingleFileResult` from `RagService.embedSingleFile`
with `code: 'not_found' | 'inflight' | 'delete_failed' | 'dispatch_failed'` on
failure. `RagController.embedFile` now maps those codes to the correct status
instead of collapsing every failure to 409:
- not_found → 404
- inflight → 409
- delete_failed → 500
- dispatch_failed → 500
The `code` is also included in the JSON body so clients can branch without
string-matching `error`.
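
The mapping could look roughly like this. The failure codes and statuses come from the list above; the success branch and the `toHttpResponse` name are assumptions for illustration:

```typescript
type EmbedFailureCode = "not_found" | "inflight" | "delete_failed" | "dispatch_failed";

type EmbedSingleFileResult =
  | { ok: true }
  | { ok: false; code: EmbedFailureCode; error: string };

const STATUS_BY_CODE: Record<EmbedFailureCode, number> = {
  not_found: 404,
  inflight: 409,
  delete_failed: 500,
  dispatch_failed: 500,
};

function toHttpResponse(result: EmbedSingleFileResult): {
  status: number;
  body: Record<string, unknown>;
} {
  // Assumption: success returns 200 with a simple flag.
  if (result.ok) return { status: 200, body: { success: true } };
  // The code travels in the body so clients can branch on it directly.
  return {
    status: STATUS_BY_CODE[result.code],
    body: { error: result.error, code: result.code },
  };
}
```

Typing the table as `Record<EmbedFailureCode, number>` means adding a new code without a status fails to compile.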
Closes the Manual-mode UX dead-end: after toggling 'Auto-index new content
for AI?' to Manual, a freshly-downloaded ZIM (or any pending_decision file)
had no UI path to opt in for embedding short of the global Sync Storage /
Re-embed All bulk actions. Per RFC #883 §5, each Stored Files row now
carries a state pill and an adaptive single-button action.
State pill (left of any existing warning chips):
- 'Indexed' — green; row had chunks in Qdrant or state row is 'indexed'
- 'Not Indexed' — neutral; state is pending_decision or browse_only
- 'Failed' — red
- 'Stalled' — amber
- admin_docs collapsed row has no pill ('Managed by NOMAD' carries it)
Adaptive action button (paired with the existing Delete button per row):
- pending_decision → 'Index' (force=false)
- browse_only → 'Index' (force=true)
- failed / stalled → 'Retry' (force=true)
- indexed + warning chip → 'Re-embed' (force=true; confirm modal first)
- indexed healthy / null → no action button (bulk Re-embed All covers it)
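
The adaptive-button table above reduces to a small pure function. This is a sketch with hypothetical names; treating a null state as "no button" even when a warning is present is an assumption:

```typescript
type FileState = "pending_decision" | "indexed" | "browse_only" | "failed" | "stalled" | null;

type RowAction = {
  label: "Index" | "Retry" | "Re-embed";
  force: boolean;
  confirm: boolean; // true when a confirmation modal should come first
} | null;

function rowActionFor(state: FileState, hasWarning: boolean): RowAction {
  switch (state) {
    case "pending_decision":
      return { label: "Index", force: false, confirm: false };
    case "browse_only":
      return { label: "Index", force: true, confirm: false };
    case "failed":
    case "stalled":
      return { label: "Retry", force: true, confirm: false };
    case "indexed":
      // Healthy indexed rows get no button; bulk Re-embed All covers them.
      return hasWarning ? { label: "Re-embed", force: true, confirm: true } : null;
    default:
      return null;
  }
}
```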
Backend: GET /api/rag/files now returns
{ files: Array<{ source, state, chunksEmbedded }> }
instead of a flat string[]. State + chunk-count come from a single
KbIngestState query unioned into the existing Qdrant-derived source list
(no new round trips). New POST /api/rag/files/embed validates the source is
known, refuses if any inflight job already targets the same filePath
(prevents double-click duplicate-chunk hazard), pre-deletes Qdrant points
when force=true, then dispatches via the existing _dispatchEmbedJobsFor
helper used by reembedAll.
Per-file Re-embed (force=true on an already-indexed file) routes through a
StyledModal confirmation since it deletes existing vectors before queueing
a fresh job — same destructive-action weight as Delete's inline confirm but
heavier since it affects search until the rebuild finishes.
Folds in PR #907's blank-screen fix because my new render needs the same
generic restored: `<StyledTable<KbFileGroup>>` and `record.displayName`
(instead of the unresolved `sourceToDisplayName(record.source)` that ships
in rc.5 and ReferenceErrors on modal open). PR #907 also adds title
tooltips on the three bulk-action buttons; those tooltips are NOT included
here — let PR #907 land first or independently for that part.
Multi-select bulk-opt-in deferred per discussion: most Manual-mode users
ingest 1-2 files at a time, the existing global toggle covers the bulk
case, and checkboxes would expand scope past what rc.6 should hold. Will
file a follow-up issue for an 'Index N pending files' single-click button
once this lands.
Tests-in-PR scope was limited to keeping `kb_file_grouping.spec.ts` green
after the StoredFileInfo[] signature change (added asInfos() wrapper).
Dedicated unit tests for embedSingleFile (unknown source / inflight refused
/ force=true delete-then-dispatch) and the new state-pill rendering will
land in a follow-up PR alongside Playwright coverage of the row actions.
Verification path: NOMAD3 currently runs project-nomad-admin:integration-
rc6-preview (PRs #907 + #908 atop rc.5). After this branch is built into a
new integration tag, I'll re-run targeted Playwright UAT on the KB modal
covering: state pill rendering per state, Index click on pending_decision
opts in cleanly, Retry on failed re-dispatches successfully, Re-embed
confirmation modal copy + delete-then-dispatch on the military-medicine
partial-stall row, and Delete flow untouched.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Easy Setup wizard previously bundled AI model selection + the new
ingest-policy radio into Step 3 alongside Wikipedia/ZIM tiers and curated
content. Three problems with that:
1. Predicate divergence: "is AI selected?" was answered three different
ways across Step 3 radio, Step 4 review card, and handleFinish
persistence. Surfaced in @jakeaturner's review of PR #900. The three
predicates disagree in real cases (e.g. Ollama already installed but
user didn't re-select any models -- handleFinish writes the ingest KV
while the review hides the AI summary).
2. Step 3 was overloaded -- ZIM tiers + curated content + AI models +
ingest policy in one screen.
3. No way to opt out of seeing the AI policy radio when AI isn't part of
the user's setup.
This restructure makes step 4 a dedicated, conditional AI step:
  Step 1 (Apps)    -- unchanged (services + remote Ollama toggle/URL)
  Step 2 (Maps)    -- unchanged
  Step 3 (Content) -- Wikipedia + curated tiers only
  Step 4 (AI)      -- NEW, conditional: model picker (or remote notice)
                      + auto-index policy radio. Skipped entirely when
                      AI isn't in the setup.
  Step 5 (Review)  -- summary, reads back step 4's output via the same
                      canonical predicate
Decisions per issue #905 discussion:
- Canonical predicate `isAiInSetup` as a useMemo. Single source consumed
by step indicator, nav skip logic, review summary, and handleFinish.
Both prior divergence cases collapse.
- Step indicator renders dynamically: 4 dots when AI is off (positional
display numbers 1..4), 5 dots when AI is on. WizardStep semantic values
(1=Apps, 2=Maps, 3=Content, 4=AI, 5=Review) stay stable so nav handlers
don't have to translate; the dot's `displayNumber` is decoupled from
its `step` so users see sequential 1..N with no gap.
- handleNext / handleBack are symmetric: 3 -> 5 forward, 5 -> 3 back,
when !isAiInSetup. Same predicate gate.
- Toggling AI capability off in Step 1 after AI step selections were
made fires a confirm dialog ("Turning off AI will discard your AI
model picks, indexing policy, and remote Ollama configuration") and
clears selectedAiModels / ingestPolicy / remoteOllamaEnabled on
confirm. Silent clear when nothing was set.
- Remote Ollama toggle stays in Step 1 alongside the capability card.
Don't fragment "am I using remote AI?" across two steps.
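
The skip logic and the decoupled display number can be sketched as below. `visibleSteps`, `nextStep`, `prevStep`, and `displayNumber` are hypothetical helper names rather than the wizard's actual handlers:

```typescript
type WizardStep = 1 | 2 | 3 | 4 | 5;

// Semantic step values stay stable; step 4 (AI) drops out when AI is off.
function visibleSteps(isAiInSetup: boolean): WizardStep[] {
  return isAiInSetup ? [1, 2, 3, 4, 5] : [1, 2, 3, 5];
}

function nextStep(current: WizardStep, isAiInSetup: boolean): WizardStep {
  return visibleSteps(isAiInSetup).find((s) => s > current) ?? current;
}

function prevStep(current: WizardStep, isAiInSetup: boolean): WizardStep {
  const earlier = visibleSteps(isAiInSetup).filter((s) => s < current);
  return earlier[earlier.length - 1] ?? current;
}

// Sequential 1..N number shown on the dot, independent of the semantic step.
function displayNumber(step: WizardStep, isAiInSetup: boolean): number {
  return visibleSteps(isAiInSetup).indexOf(step) + 1;
}
```

Driving both directions from the same list is one way to keep forward and back navigation symmetric under a single predicate.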
The bundled review summary (renderStep5, was renderStep4) now uses
`isAiInSetup` for the auto-index card visibility instead of the
divergent `(selectedAiModels.length > 0 || remoteOllamaEnabled)`.
Inertia tsconfig is clean for this file (the only outstanding errors are
the 3 KnowledgeBaseModal errors tracked in PR #907 and the ~64
pre-existing errors elsewhere).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Stored Knowledge Base Files render crashed on first open in v1.32.0-rc.5
with `ReferenceError: sourceToDisplayName is not defined`. The table column's
render() called `sourceToDisplayName(record.source)` but the function was
extracted to `lib/kb_file_grouping.ts` in PR #892 and never imported in
KnowledgeBaseModal.tsx. The unhandled error unmounts the entire React tree,
so users see a blank screen ~20s after opening the panel.
Root cause: PR #895 (conditional warnings) rewrote the render() and used
`sourceToDisplayName(record.source)` instead of `record.displayName`, which
KbFileGroup already carries from groupAndSortKbFiles(). PR #895's review
follow-up (cbae48a) compounded this by narrowing the StyledTable generic
from `KbFileGroup` to `{source: string}`, hiding the type drift from tsc.
This restores the post-#892 pattern:
- StyledTable generic back to `KbFileGroup`
- Render uses `record.displayName` (works for both per-file rows and the
collapsed admin-docs row; calling sourceToDisplayName on the synthetic
`__admin_docs_group__` would have rendered that literal as the row name).
Also folds in tooltip copy on the three bulk-action buttons (Reset & Rebuild,
Re-embed All, Sync Storage) so the difference in destructiveness is visible
on hover. Uses native `title` attribute via StyledButton's prop pass-through;
no new component dependency.
Inertia tsconfig catches this regression cleanly (TS2304 + TS2339); the
pre-push hook only runs the backend tsconfig which excludes inertia/**, so
the bug shipped. Tracking the typecheck-coverage gap as a follow-up.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Disable TierSelectionModal Submit while the embed-estimate query is in
flight, so a fast click can't slip past the guardrail with an undefined
estimate.
- Move KbGuardrailModal out of the outer <Transition> and render it as a
Fragment sibling — Headless UI's Transition expects Transition.Child
descendants, not raw conditional siblings.
One-time confirmation step gating bulk indexing actions that would
consume a substantial amount of disk for embedding storage. Fires only
when the user has policy=Always (i.e., the system would auto-index)
AND the estimate trips either:
- GUARDRAIL_ABSOLUTE_BYTES = 50 GB embedding cost, OR
- GUARDRAIL_FREE_DISK_RATIO = 10% of current free disk space
Under policy=Manual the guardrail is silent because the user has
already opted out of automatic ingestion — the files would just queue
as pending_decision either way.
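
A sketch of the decision under the two thresholds above. Reading "50 GB" as decimal gigabytes, the reason labels, and the `shouldShowGuardrail` wrapper are assumptions; the inclusive comparisons follow the boundary cases listed under Tests:

```typescript
// Assumption: 50 GB means 50 * 10^9 bytes.
const GUARDRAIL_ABSOLUTE_BYTES = 50 * 1000 * 1000 * 1000;
const GUARDRAIL_FREE_DISK_RATIO = 0.1;

type GuardrailReason = "absolute" | "free_disk_ratio";

interface GuardrailVerdict {
  tripped: boolean;
  reasons: GuardrailReason[];
}

// Pure decision helper: no I/O, just the two threshold checks.
function evaluateGuardrail(estimateBytes: number, freeBytes: number): GuardrailVerdict {
  const reasons: GuardrailReason[] = [];
  if (estimateBytes >= GUARDRAIL_ABSOLUTE_BYTES) reasons.push("absolute");
  // freeBytes = 0 means "unknown", so the relative check is skipped.
  if (freeBytes > 0 && estimateBytes >= freeBytes * GUARDRAIL_FREE_DISK_RATIO) {
    reasons.push("free_disk_ratio");
  }
  return { tripped: reasons.length > 0, reasons };
}

// The modal only fires when the system would auto-index.
function shouldShowGuardrail(
  policy: "Always" | "Manual" | null,
  estimateBytes: number,
  freeBytes: number
): boolean {
  return policy === "Always" && evaluateGuardrail(estimateBytes, freeBytes).tripped;
}
```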
Pieces
- inertia/lib/kb_guardrail.ts: pure decision helper with two constants
and an evaluateGuardrail() that returns a verdict + reasons. No I/O
on the helper itself so the logic is trivially testable
- inertia/components/KbGuardrailModal.tsx: confirmation dialog. Headless
UI Transition + Dialog, amber 'large operation' header, plain-English
estimate summary, [Cancel] / [Proceed anyway] footer. z-[60] so it
layers above the tier modal underneath instead of replacing it
- inertia/components/TierSelectionModal.tsx integration: handleSubmit
now evaluates the guardrail when policy=Always and embedEstimate is
available; if it trips, we stash the verdict in state and render the
guardrail modal as an overlay. Confirm runs finalizeSubmit (which is
the pre-existing onSelectTier + onClose path); Cancel just closes the
guardrail and leaves the tier modal as-is so the user can change
their tier choice or flip the policy
The disk-free signal comes from the existing useSystemInfo hook +
getPrimaryDiskInfo helper. Passing freeBytes=0 (unknown) skips the
relative-disk check, so the modal still works on hosts whose disk
introspection failed; it then relies only on the absolute 50 GB threshold.
Tests
- 9 cases in tests/unit/kb_guardrail.spec.ts: standard small batch (no
trip), exact absolute threshold trips, over-absolute trips, over 10%
free trips, both-at-once trips with two reasons, freeBytes=0 skip,
freeBytes=0 + over-absolute trip, exact-10% boundary trips, just-
under-both safe. All green.
Stacks on feat/kb-tier-estimate-on-disk (#897) — consumes that PR's
estimate endpoint to compute the verdict input. Auto-rebases to rc
when #897 merges.
Pairs with #894 (policy toggle) and #899 (JIT prompt): together the
three PRs cover the 'how do I avoid surprising the user with auto-
indexing they didn't ask for?' arc.
Out of scope (deferred)
- 6 hr time threshold (RFC §7): needs a per-host chunks-per-second
metric we don't capture yet; would be a follow-up after Phase 4
self-calibration (RFC §15) lands
- Wider integration (KbPolicyPromptBanner 'Index now' button, manual
KB-modal sync): TierSelectionModal is the dominant bulk-decision
surface and the right place to land this first
Adds an inline auto-index policy choice inside the Easy Setup wizard's
existing AI section (Step 3 'Content', alongside AI model selection).
The selection is persisted to KVStore['rag.defaultIngestPolicy'] on
wizard submit — same key #894's KB modal toggle reads/writes — so a
user who completes the wizard never sees the first-chat JIT prompt
(#899); their decision is already recorded.
Default is 'Always' so new users who keep the default get the 'just
works' experience: content downloaded by the wizard becomes searchable
as soon as it finishes embedding, without a follow-up step. Users who
prefer the explicit-opt-in flow can flip to 'Manual' before submitting.
Skipped when the user doesn't select the AI capability — the KV stays
null and the JIT prompt handles the decision later if/when they enable
AI from settings.
UI placement
- Step 3 'Content': new section below AI Models grid (only when AI is
selected), two-button radio matching #894's KB-modal toggle pattern
for visual consistency
- Step 4 'Review': new 'Auto-index Setting' card summarizing the choice
in plain English ('New content will be indexed automatically' vs
'New content will wait for you to opt in') so the user knows what
they're agreeing to before clicking Complete Setup
handleFinish
- New api.updateSetting('rag.defaultIngestPolicy', ingestPolicy) call
runs first, before service installs/downloads, so any content that
finishes embedding during this same wizard run sees the right policy
- Wrapped in its own try/catch so a transient KV write failure doesn't
abort the rest of the wizard
Stacks on feat/kb-policy-toggle (#894) — uses the policy KV mechanism
that PR introduces. Auto-rebases to rc when #894 merges.
Pairs with #899 (JIT prompt): wizard users decide here; non-wizard
users decide at first chat. Together they cover every entry path
to v1.32.0 without double-prompting.
- KbPolicyPromptBanner: add onError toast to maybeLaterMutation so a
failed policy save surfaces to the user instead of looking like a
broken button (banner would otherwise reappear on next chat open
with no explanation).
- KbPolicyPromptBanner: set staleTime: Infinity on the prompt-state
query. For users who already picked a policy (the vast majority),
the result is effectively immutable per session — the mutations
invalidate the key when it actually changes.
When a user opens AI Chat with content available but no global ingest
policy yet recorded, surface a one-time banner above the chat header
asking how they want new content handled:
- 'Index existing content' -> sets rag.defaultIngestPolicy=Always and
triggers a sync so pending_decision files queue immediately
- 'Maybe later' -> sets policy=Manual; existing and future content
waits in pending_decision until the user opts in from the KB modal
After either button is clicked the banner never reappears, because both
write the policy KV (the same one #894 manages via the KB modal toggle).
There is intentionally no 'dismiss without deciding' X — that would just
re-show the banner forever.
Backend
- New GET /api/rag/policy-prompt-state returns
{shouldPrompt, hasContent, totalFiles}
- RagService.getPolicyPromptState() reads KVStore('rag.defaultIngestPolicy')
and counts kb_ingest_state rows; shouldPrompt is true only when policy
is null AND scanner has seen >=1 file (avoids prompting on empty NOMADs)
Frontend
- New KbPolicyPromptBanner component (~120 LOC) handles the two-button
decision flow with optimistic loading state, success/error toasts, and
invalidates kbPolicyPromptState + ingestPolicy + embed-jobs + storedFiles
on success
- Mounted in components/chat/index.tsx as the first child of the main
content column so it sits above the chat title bar without taking space
when shouldPrompt is false (renders nothing)
- Reads aiAssistantName from Inertia page props so banner copy matches
the user's chosen assistant name
Stacks on feat/kb-policy-toggle (#894) because the policy KV mechanism
it writes through is introduced there. Both can land in rc.5; this PR
auto-rebases to rc once #894 merges.
Existing users on first upgrade to v1.32.0 will see this banner on first
chat visit post-upgrade — an explicit opt-in moment for content that was
already on disk. New users see it the first time they have curated
content downloaded.
Closes the 'zero_chunks warning has no row to attach to' gap surfaced by
the 2026-05-14 integration UAT. Before this fix RagService.getStoredFiles
returned only file paths that appeared in Qdrant's payload.source — so
files with 0 embedded chunks (video-only ZIMs, browse_only opt-outs,
ingestions that failed before producing any chunks) silently disappeared
from the KB panel's Stored Files table.
The fix unions the Qdrant scroll result with the disk-backed file paths
recorded in kb_ingest_state. Effect:
- lrnselfreliance_en_all_2025-12.zim (3.97 GB video-only ZIM, 0 chunks)
now appears in the table, picks up its zero_chunks warning chip
- Files in pending_decision under Manual policy show up so the user can
see what's waiting for opt-in
- Files in browse_only / failed states have a row for future per-card
Retry / Re-index actions (forthcoming, blocked on #886)
The state-machine query is wrapped in its own try/catch so a transient
DB error degrades to the Qdrant-only list rather than 500-ing the whole
panel — same defensive posture as the outer try/catch.
Stacks on feat/kb-ingest-state-machine (#888) because the union depends
on the kb_ingest_state table that PR introduces. Will rebase to rc once
#888 merges. Completes the second half of #895's warning surface; the
first half (partial_stall) already worked because those files have at
least some chunks in Qdrant.
useEmbedJobs already polls every 2s while jobs are active (and 30s when
idle) and auto-invalidates Stored Files when the queue drains. The
manual Refresh button was therefore a no-op that only confused users who
clicked it and saw no change, so it is removed. Per-job 'last activity Xs ago' lines remain
as the live-recency indicator.
Stacks on feat/kb-job-status-pill (#893) since the Refresh button only
exists in that branch.
`computeFileWarnings()` previously caught all errors and returned an empty
map, which the frontend rendered as "every file is healthy" — reintroducing
exactly the silent-failure mode this surface exists to expose.
Return `{ ok, warnings }`; flip `ok: false` from the catch. KB modal renders
an inline amber notice under the Stored Files header when `ok === false`,
leaving per-row warning rendering untouched. Transient failures self-heal on
the next 30s poll; no toast spam.
Surfaces two silent failure modes that the prior binary
"any-chunks-in-Qdrant ⇒ embedded" check could not distinguish from
healthy ingestion:
- **Warning A — Zero-chunk file** (file_size > 100 MB, chunks = 0)
Fires on video-only / image-only ZIMs (`lrnselfreliance_en_all`,
TED talks, etc.) that the pipeline completes "successfully" with no
extractable text. AI Assistant literally cannot reference these.
- **Warning B — Partial-embed stall** (chunks < 50% of expected from
the ratio registry). Surfaces the simple_wiki "266 of 600,000 chunks"
case observed during NOMAD1 ingestion testing — previously these
looked identical to fully-completed embeds in the UI.
Both warnings render only when their condition is met (silent by
default; noisy only on real problems).
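
The two conditions can be sketched as a pure function. The input shape and the reading of "100 MB" as 100 MiB are assumptions, as is letting Warning B fire alongside Warning A when a registry estimate exists:

```typescript
// Assumption: "100 MB" is taken as 100 * 1024 * 1024 bytes.
const ZERO_CHUNK_MIN_FILE_BYTES = 100 * 1024 * 1024;
const PARTIAL_STALL_RATIO = 0.5;

type FileWarning = "zero_chunks" | "partial_stall";

interface WarningInputs {
  fileSizeBytes: number;
  chunks: number;
  // null when the ratio registry has no entry for this file.
  expectedChunks: number | null;
}

function decideWarnings(inputs: WarningInputs): FileWarning[] {
  const warnings: FileWarning[] = [];
  // Warning A: a large file that produced no chunks at all.
  if (inputs.fileSizeBytes > ZERO_CHUNK_MIN_FILE_BYTES && inputs.chunks === 0) {
    warnings.push("zero_chunks");
  }
  // Warning B: suppressed on a registry miss or when zero chunks are expected.
  const expected = inputs.expectedChunks;
  if (expected !== null && expected > 0 && inputs.chunks < expected * PARTIAL_STALL_RATIO) {
    warnings.push("partial_stall");
  }
  return warnings;
}
```

For the simple_wiki case, 266 chunks against an expectation of 600,000 is far below the 0.5 ratio, so only the partial-stall warning is emitted.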
Base is `feat/kb-ratio-registry` (#891) because Warning B's "expected
chunks" estimate comes from `KbRatioRegistry.estimateChunks()`. GitHub
fast-forwards to `rc` once #891 merges.
- `app/utils/kb_warning_decision.ts` — pure `decideWarnings(inputs)`
with thresholds (`100 MB`, `0.5×`) as exported constants. 10 unit
tests cover the healthy case, both warnings, the under/at/over
boundary, the registry-miss suppression, and the video-only registry
case (`expectedChunks: 0` correctly skips Warning B).
- `RagService.computeFileWarnings()` — single Qdrant scroll tallies
chunks per source, filesystem walk fills in zero-chunk files,
ratio registry estimates the expectation, decision function emits.
- New endpoint `GET /api/rag/file-warnings` returns
`Record<source, FileWarning[]>` (sources with no warnings are
omitted, so the frontend can `warnings[source] ?? []` for clean
defaults).
- KB modal: warnings render inline under the file name as amber-tinted
pills. Polled every 30s alongside the existing health check.
Out of scope (deferred)
- Warning C: chunks skipped due to length. PR #890 (#881 fix) prevents
the silent drop at the embed boundary, so the underlying condition
shouldn't fire anymore. If we still want to surface "we truncated
N chunks to fit", that needs separate `skipped_count` tracking in
EmbedFileJob — a Phase 2 follow-up.
- Suppressing Warning B during active mid-ingestion. The user can cross-
reference the Processing Queue to know it's in-flight; suppressing
warnings while a job runs would mask real stalls where the job died
mid-batch. Will revisit when per-card status is wired through.
- Use of `kb_ingest_state.chunks_embedded` (#888) as the chunk count
source. This PR uses Qdrant scroll directly so it can land
independently of #888.
- 10 new unit tests on `decideWarnings`, all pass
- Type-check clean
- Hot-patch + browser smoke test deferred until #891 lands (the ratio
registry needs to exist in the DB for `estimateChunks()` to return
non-null estimates — without it, only Warning A fires which is still
useful but Warning B stays dormant)
When a user picks a tier in TierSelectionModal, show how much additional
disk space the AI Assistant will need if the new ZIMs are indexed, plus
a policy-aware footer explaining whether they'll auto-index (Always) or
wait for opt-in (Manual). Estimates consume #891's KbRatioRegistry via a
new POST /api/rag/estimate-batch endpoint.
Backend
- New POST /api/rag/estimate-batch route + RagController.estimateBatch
- VineJS schema accepting array of {filename, sizeBytes}, capped at 500
- KbRatioRegistry.estimateBatch aggregates via the existing prefix-match
lookup, returns {totalChunks, totalBytes, hasUnknown}
- New BYTES_PER_CHUNK_ON_DISK constant (~8 KB: 3 KB vector + ~3 KB chunk
text + ~2 KB payload/index overhead). Tunable; will be replaced by
Phase 4 self-calibration once we have real measurements.
- Controller normalizes incoming filenames via path.basename so callers
that send full paths or URLs still match registry prefixes correctly.
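
A sketch of the aggregation, assuming the registry stores a chunks-per-MB ratio per filename prefix. The table contents, the ratio unit, the helper names, and 8 KB = 8192 bytes are all illustrative assumptions:

```typescript
// Assumption: "~8 KB" taken as 8 * 1024 bytes per embedded chunk on disk.
const BYTES_PER_CHUNK_ON_DISK = 8 * 1024;

interface BatchItem {
  filename: string;
  sizeBytes: number;
}

interface BatchEstimate {
  totalChunks: number;
  totalBytes: number;
  hasUnknown: boolean;
}

// Hypothetical registry rows: [filename prefix, chunks per MB of source file].
const CHUNKS_PER_MB_BY_PREFIX: Array<[string, number]> = [
  ["wikipedia_en_", 40],
  ["lrnselfreliance_en_all", 0], // video-only: known, but yields no chunks
];

// Strip directories or URL path segments so prefixes match regardless of mirror.
function baseName(pathOrUrl: string): string {
  const parts = pathOrUrl.split(/[\\/]/);
  return parts[parts.length - 1] ?? pathOrUrl;
}

function estimateChunksFor(item: BatchItem): number | null {
  const name = baseName(item.filename);
  const hit = CHUNKS_PER_MB_BY_PREFIX.find(([prefix]) => name.startsWith(prefix));
  if (hit === undefined) return null;
  return Math.round((item.sizeBytes / (1024 * 1024)) * hit[1]);
}

function estimateBatch(items: BatchItem[]): BatchEstimate {
  let totalChunks = 0;
  let hasUnknown = false;
  for (const item of items) {
    const chunks = estimateChunksFor(item);
    if (chunks === null) hasUnknown = true;
    else totalChunks += chunks;
  }
  return { totalChunks, totalBytes: totalChunks * BYTES_PER_CHUNK_ON_DISK, hasUnknown };
}
```

A known zero-ratio entry contributes nothing to the total but does not set `hasUnknown`, which keeps "no text to index" distinct from "no data for this file".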
Frontend
- api.estimateEmbeddingBatch() client method
- TierSelectionModal: when localSelectedSlug is set, resolve the tier's
resources (incl. inherited tiers), POST to /estimate-batch, and render
a new info block with the +~X GB figure + ingest-policy copy. Also
fetches rag.defaultIngestPolicy so the same block surfaces whether
indexing will fire automatically or wait for the user.
- resourceFilename() helper extracts the basename from the resource URL
so the registry lookup hits the right prefix regardless of mirror.
Tests
- 4 new cases in tests/unit/kb_ratio_lookup.spec.ts covering the
estimateBatch aggregator: standard sum, unknown-flagging, video-only
ZIM (0 chunks but known, hasUnknown stays false), empty input.
Stacks on feat/kb-ratio-registry (#891) — consumes the registry table
seeded by that PR. Once #891 merges to rc, this PR auto-rebases.
Out of scope for this PR (deferred to follow-ups):
- Per-batch opt-in checkbox (RFC §1's '☑ Also index these for AI') needs
a per-batch policy override path and is a separate PR
- Guardrail modal at 50 GB / 10% free / 6 hr thresholds (RFC §7) is also
separate; this PR is informational, not gating
- Time-to-embed estimate awaits a chunks-per-second metric per host
* feat(KB): per-file ingest state machine (Phase 1 of RFC #883)
Adds a persistent state machine for AI knowledge-base ingestion so the
scanner can distinguish "fully indexed", "user opted out", "failed", and
"stalled" from each other — none of which were derivable from the prior
binary "any chunks in Qdrant ⇒ embedded" check.
## What lands
- New table `kb_ingest_state` keyed by `file_path` with enum state column
(`pending_decision | indexed | browse_only | failed | stalled`).
Independent of `installed_resources` so it covers both curated downloads
and manually-uploaded KB files.
- New KV key `rag.defaultIngestPolicy` (string: `Always | Manual`).
Registered now but not consumed yet — JIT prompt + wizard step land in
Phase 3 of the RFC.
- `EmbedFileJob.handle` writes state on terminal outcomes:
- Success (final batch) → `indexed` + chunks count
- `UnrecoverableError` → `failed` + error message
- Retryable errors are left to BullMQ's existing retry path
- `scanAndSyncStorage` swaps the binary qdrant check for a state-aware
decision tree (see `decideScanAction`). Existing installs auto-backfill
on first scan: files with chunks in Qdrant but no state row become
`indexed`; new files start as `pending_decision`.
- `deleteFileBySource` drops the state row last, so removed files
disappear entirely instead of leaving an orphan that the next scan
would re-dispatch into nothing.
## What does NOT land here
- Ratio registry (separate PR) — needed for partial-stall detection and
cost estimates, but a separable concern.
- #880 follow-up initial-progress anchor (separate tiny PR).
- Phase 2 UI (status pill, per-card actions, conditional warnings).
- Phase 3 policy surfaces (wizard step, JIT prompt, guardrail modal).
- PR #886's bulk-action hookup — `_deletePointsBySource` / Re-embed All
/ Reset & Rebuild would also want to set state, but #886 isn't merged
yet; that wiring goes in a follow-up once #886 lands.
## Target
This is forward work for v1.40.0 (RFC #883). Branching off `rc` because
that's the current latest base and post-GA Jake will sync rc→dev; a
retarget at PR-open time is a fast-forward if requested.
## Tests
- 9 new unit tests for `decideScanAction` covering all five states plus
the no-row / chunks-present / chunks-missing combinations
- Type-check clean
- Smoke-tested end-to-end on NOMAD3 via hot-patch:
- Backfill: 5 ZIMs + 2 KB uploads with existing chunks in Qdrant all
came back `indexed` on first scan
- Pending dispatch: a video-only ZIM with no chunks (`lrnselfreliance`)
came back `pending_decision` and was correctly re-dispatched (Bull
deduped to its historical `:completed` jobId — bgauger's #886 fix
drains that)
- Delete hook: deleting a KB upload via `DELETE /api/rag/files`
removed both the disk file and the state row
* feat(KB): Always/Manual ingest policy toggle (RFC #883 §1/§4)
Activates the `rag.defaultIngestPolicy` KV registered in Phase 1
(#888) so users on a fresh install (or anyone who picks Manual mode)
no longer get every new ZIM auto-dispatched to the embed pipeline.
## Stacks on #888
This PR's base is `feat/kb-ingest-state-machine` (#888). The state
machine has to be in place for the decision function to be policy-aware;
GitHub will fast-forward the base to `rc` once #888 merges.
## Backend changes
- `decideScanAction` now takes a `policy: 'Always' | 'Manual'` argument
(defaults to `Always` for backward compatibility).
- New `ScanAction` kind: `create_pending`. Manual mode records that the
scanner has seen a new file (so the UI can surface a per-card Index
affordance later) without dispatching an EmbedFileJob.
- `scanAndSyncStorage` reads the KV and passes it through. The scan-result
log line now includes the active policy and a `waiting on user` count
for Manual-mode hits.
- `rag.defaultIngestPolicy` added to `SETTINGS_KEYS` so it's reachable
through the existing `GET/PATCH /api/system/settings` surface — no new
endpoint.
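A rough sketch of what a policy-aware decision function could look like, under stated assumptions. Only `create_pending`, the `policy` argument with its `Always` default, and the `pending_decision → skip` rule in Manual mode come from this description; the other action kinds, the input shape, and the handling of `failed` / `stalled` rows are hypothetical placeholders.

```typescript
type IngestState = 'pending_decision' | 'indexed' | 'browse_only' | 'failed' | 'stalled';
type Policy = 'Always' | 'Manual';

// Only `create_pending` is named in the text; the other kinds are illustrative.
type ScanAction =
  | { kind: 'dispatch' }
  | { kind: 'create_pending' }
  | { kind: 'backfill_indexed' }
  | { kind: 'skip' };

// Hypothetical input: the file's state row (null if none) and whether Qdrant has chunks.
interface ScanInput { state: IngestState | null; hasChunks: boolean }

function decideScanAction(input: ScanInput, policy: Policy = 'Always'): ScanAction {
  if (input.state === null) {
    // Existing installs: chunks without a state row are backfilled as indexed.
    if (input.hasChunks) return { kind: 'backfill_indexed' };
    // New file: Manual mode records it and waits for the user.
    return policy === 'Manual' ? { kind: 'create_pending' } : { kind: 'dispatch' };
  }
  if (input.state === 'pending_decision') {
    return policy === 'Manual' ? { kind: 'skip' } : { kind: 'dispatch' };
  }
  // Assumption: every other state is left alone here; the real handling of
  // `failed` and `stalled` rows is not described in this message.
  return { kind: 'skip' };
}
```

Keeping the function pure (state and policy in, action out) is what lets the unit tests cover both modes without touching Qdrant or the KV store.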
## Frontend changes
- New section in the KB panel between "Why upload" and "Processing Queue":
"Auto-index new content for AI? [Always | Manual]" — segmented radio
with copy explaining the 5-10× disk multiplier. Default Always.
- `useQuery('ingestPolicy')` reads the current value; clicking the
inactive option mutates and shows a notification confirming the new
behavior.
## Tests
- 14 unit tests on `decideScanAction` (was 9) — split into Always-mode
cases (preserves Phase 1's contract) and Manual-mode cases
(`create_pending`, `pending_decision → skip`, etc.).
- Type-check clean.
- Hot-patch + browser verification deferred until #888 lands; the state
machine smoke-tested cleanly on NOMAD3 in #888's PR, and this PR's
decision-tree changes are exhaustively unit-tested.
## RFC open question §3 — policy-change re-trigger
Switching Manual → Always doesn't auto-dispatch existing `pending_decision`
rows immediately. The next scan re-evaluates and dispatches them under
the new policy. This matches the RFC's "treat the switch as I've-
thought-about-it" instinct for the guardrail; full guardrail
implementation lands in Phase 3 task 14.
---------
Co-authored-by: Jake Turner <52841588+jakeaturner@users.noreply.github.com>
Each in-flight (or stuck) embedding job gets a colored health pill,
relative-activity timestamp, and chunk counter so users can tell at a
glance whether ingestion is making progress.
## Health states
- **🟢 Active** — last batch < 2 min ago
- **🟡 Slow** — last batch 2-5 min ago (CPU-paced multi-batch ingestion
lives here naturally; not always a problem)
- **🔴 Stalled** — last batch > 5 min ago (likely real problem)
- **⚪ Waiting** — queued, no batch started yet
- **🔴 Failed** — job recorded failed status
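The thresholds above could be expressed as a pure function along these lines. This is a sketch, not the contents of `kb_job_health.ts`: the input shape and lowercase status names are hypothetical, and treating exactly 2 min and exactly 5 min as Slow is an assumption, since the list does not say which side the boundaries fall on.

```typescript
type JobHealth = 'active' | 'slow' | 'stalled' | 'waiting' | 'failed';

// Hypothetical input: timestamps are epoch milliseconds.
interface JobHealthInput { failed: boolean; lastBatchAt: number | null; now: number }

const SLOW_AFTER_MS = 2 * 60 * 1000;
const STALLED_AFTER_MS = 5 * 60 * 1000;

function computeJobHealth({ failed, lastBatchAt, now }: JobHealthInput): JobHealth {
  if (failed) return 'failed';
  if (lastBatchAt === null) return 'waiting'; // queued, no batch started yet
  const idleMs = now - lastBatchAt;
  if (idleMs < SLOW_AFTER_MS) return 'active';
  if (idleMs <= STALLED_AFTER_MS) return 'slow'; // assumption: boundaries count as slow
  return 'stalled';
}
```

Passing `now` in rather than calling `Date.now()` inside keeps the boundary cases testable.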
## What lands
- New backend util `kb_job_health.ts` with pure `computeJobHealth(input)`
decision function. Time-based thresholds (2 min / 5 min) inlined as
constants. 9 unit tests pin the boundaries.
- `EmbedJobWithProgress` gains `lastBatchAt`, `startedAt`, `chunks` —
already set by `EmbedFileJob.handle` on every batch transition, just
not previously surfaced through `listActiveJobs`.
- Frontend `kb_job_health_display.ts` maps each status to a Tailwind
dot color, label, and aria-label so backend and UI stay in sync.
- `ActiveEmbedJobs.tsx` renders the pill, "last activity Xs ago", and
chunk counter above each progress bar. Adds a manual Refresh button
and "Last updated Xs ago" line — the existing 2s/30s auto-poll
cadence in `useEmbedJobs` is left intact.
- Live tick at 5s keeps the relative timestamps current without
re-fetching from the API.
## Not in scope
- Per-card Cancel / Retry / Un-index — separate Phase 2 PR
- Conditional warnings A/B/C — separate Phase 2 PR
- Computing throughput rate (chunks/min) — needs ratio registry consumer
(Phase 2 follow-up); for now the pill answers the "is it stuck?"
question directly without a rate estimate.
Project NOMAD's bundled docs (`/app/docs/*.md` and `README.md`) each
embed as their own KB source — currently rendering as 12+ individual
rows that swamp user-uploaded content in the Stored Files table.
Collapse them into one informational row:
> Project NOMAD documentation · 12 files · Managed by NOMAD
The admin-docs row hides the Delete button (those files would be
re-embedded on the next sync anyway, so deleting is a footgun). User
uploads and ZIMs keep their existing per-row Delete UX.
Also adds deterministic sort: ZIMs → user uploads → admin docs → other,
alphabetical within each bucket. Pure frontend change — `/api/rag/files`
response shape unchanged.
Decision logic extracted to `kb_file_grouping.ts` with 9 unit tests
covering bucket classification, sort order, count noun pluralization,
and empty-input handling.
Switch kb_ratio_registry.chunks_per_mb from DECIMAL(10,2) to UNSIGNED
INTEGER so the value mysql2 returns matches the `number` type declared
on the model. DECIMAL columns deserialize as strings by default, which
would break `=== 0` checks for video-only ZIMs and silently coerce
through arithmetic in Phase 2 consumers.
All seeds are whole numbers and the heuristic's real-world variance
(~±50%) makes sub-integer precision meaningless.
Foundation for the cost estimates and partial-stall detection that
Phase 2 will surface. No consumers yet — this PR just lays the table,
the seed rows, and the lookup helper so subsequent UI work has
estimates available without a per-ZIM benchmark.
## What lands
- New table `kb_ratio_registry` (pattern, chunks_per_mb, sample_count,
notes). Migration creates and seeds heuristic defaults from the RFC
appendix: devdocs (1100/MB), Wikipedia variants (270/MB), iFixit
(50/MB), Stack Exchange Q&A (200/MB), video-only ZIMs (0), plus a
catch-all fallback at 100/MB.
- `KbRatioRegistry` model with static `lookup()` and `estimateChunks()`.
- Pure helper `kb_ratio_lookup.ts` doing longest-prefix-match — a
specific entry (`wikipedia_en_simple_`) overrides a broader one
(`wikipedia_en_`). 9 unit tests covering the lookup boundary.
- `sample_count` starts at 0 (heuristic seed) and is reserved for
Phase 4 self-calibration to increment as observed ZIMs update each row.
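Longest-prefix-match can be sketched roughly as follows. The function name is borrowed from the Tested section below, but its real signature is not shown in this message, so the entry shape and the `null` return for "no match" are assumptions.

```typescript
// Hypothetical entry shape mirroring the pattern / chunks_per_mb columns.
interface RatioEntry { pattern: string; chunksPerMb: number }

// Returns the ratio of the longest pattern that prefixes the filename,
// or null when no pattern matches at all.
function findChunksPerMb(entries: RatioEntry[], filename: string): number | null {
  let best: RatioEntry | null = null;
  for (const entry of entries) {
    if (!filename.startsWith(entry.pattern)) continue;
    if (best === null || entry.pattern.length > best.pattern.length) {
      best = entry;
    }
  }
  return best === null ? null : best.chunksPerMb;
}
```

With entries for both `wikipedia_en_` and `wikipedia_en_simple_`, a file named `wikipedia_en_simple_all_nopic_...` resolves to the second entry because its pattern is longer, regardless of the order the rows are returned in.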
## Not in scope
- Self-calibration on successful ingestion (Phase 4)
- UI consumers — Warning B (partial-embed stall) and the storage budget
meter / time estimates land in Phase 2.
## Tested
- Type-check clean
- 9 unit tests pass for `findChunksPerMb` and `estimateChunkCount`
- Migration applied on NOMAD3 via hot-patch; 9 seed rows verified in DB
The OpenAI-compatible /v1/embeddings fallback path can't pass
`truncate:true` / `num_ctx:8192` to the model, so any chunk that
exceeds the model's loaded context_length (often 2048 for
nomic-embed-text:v1.5) returns a 400 BadRequestError and is silently
dropped from Qdrant. Two CPU-only ingestion runs on NOMAD1 hit this
on dense technical content (medlineplus, arduino.stackexchange) even
after PR #763's num_ctx fix on the native path.
Pre-cap each input string at 4000 chars before either backend call.
That's ~1000-2000 tokens depending on density, comfortably under the
model's 2048 default. The chunker in RagService is sized for
MAX_SAFE_TOKENS=1600 (3200 chars at its conservative 2 chars/token
estimate), so well-formed inputs are never touched; this is purely a
runtime safety net for the edge cases that slip through.
Also stop swallowing the original error in the catch. The bare
`} catch {}` here has masked recurring "input length exceeds context
length" failures for months (#369, #670, #881). Capture and warn-log
the message so future investigations see why we fell back.
Same root cause as #369 and #670, which were closed without an actual
fix to the fallback path.
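A sketch of the two changes (pre-cap, then log instead of swallow), assuming injected `native`, `fallback`, and `warn` callbacks. Those parameter names and the function names are hypothetical; only the 4000-character cap and the warn-on-fallback behavior come from the description above.

```typescript
const MAX_EMBED_INPUT_CHARS = 4000;

// Truncate oversized inputs; inputs already under the cap pass through untouched.
function capEmbeddingInputs(inputs: string[]): string[] {
  return inputs.map((text) =>
    text.length > MAX_EMBED_INPUT_CHARS ? text.slice(0, MAX_EMBED_INPUT_CHARS) : text
  );
}

// `native` and `fallback` stand in for the two backend calls.
async function embedWithFallback(
  inputs: string[],
  native: (capped: string[]) => Promise<number[][]>,
  fallback: (capped: string[]) => Promise<number[][]>,
  warn: (message: string) => void
): Promise<number[][]> {
  const capped = capEmbeddingInputs(inputs); // applied before either backend call
  try {
    return await native(capped);
  } catch (error) {
    // Keep the original failure visible instead of an empty catch block.
    const reason = error instanceof Error ? error.message : String(error);
    warn(`native embed failed, falling back: ${reason}`);
    return fallback(capped);
  }
}
```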
Each continuation batch of a multi-batch ZIM embed runs as a fresh
BullMQ job, so handle() ran the hardcoded `safeUpdateProgress(job, 5)`
even when the file was already 100k articles into a 600k-article ZIM.
The UI gauge briefly dropped to 5% before the per-batch onProgress
callback caught up to the true overall percentage, reading as a
backward jump every time a new batch started.
Compute initialPercent from batchOffset / totalArticles when available,
falling back to 5 for single-batch files (uploaded PDFs, txts) where
totalArticles isn't set. Capped at 99 to leave headroom for the 100%
final-batch marker.
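The anchor computation amounts to something like the sketch below. The function name and the use of `Math.floor` are assumptions; the fallback of 5 and the cap of 99 come from the paragraph above.

```typescript
// Initial progress for a (possibly continuation) batch, in whole percent.
function computeInitialPercent(batchOffset: number, totalArticles?: number): number {
  // Single-batch files (uploaded PDFs, txts) have no article total: keep the old 5%.
  if (!totalArticles || totalArticles <= 0) return 5;
  const percent = Math.floor((batchOffset / totalArticles) * 100);
  // Cap at 99 so the final-batch 100% marker is still a visible step.
  return Math.min(99, Math.max(0, percent));
}
```

For the example in the first paragraph, 100k articles into a 600k-article ZIM yields 16 rather than 5.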
Follow-up to PR #880 (which fixed the 0-100% scaling during a batch
but still had the initial-frame regression).
onWikipediaDownloadComplete was deleting every file whose name starts
with `wikipedia_en_`, treating distinct corpora (simple, medicine,
wikivoyage, climate_change, etc.) as competing versions of the same
selection slot. Whichever wiki finished second silently wiped the
other from disk.
Match by filename stem instead — strip the trailing `_YYYY-MM(-DD).zim`
date suffix and only delete files with the same stem as the new
download. Different release dates of the same variant still get cleaned
up; distinct variants are preserved.
Extracted the predicate to `app/utils/zim_filename.ts` so the boundary
is covered by unit tests (8 cases incl. the #884 repro scenario).
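A minimal sketch of the stem comparison, assuming the date suffix always takes the form `_YYYY-MM.zim` or `_YYYY-MM-DD.zim`. The function names are illustrative and not necessarily those in `app/utils/zim_filename.ts`.

```typescript
// Matches a trailing _YYYY-MM.zim or _YYYY-MM-DD.zim suffix.
const ZIM_DATE_SUFFIX = /_\d{4}-\d{2}(?:-\d{2})?\.zim$/;

function zimStem(filename: string): string {
  return filename.replace(ZIM_DATE_SUFFIX, '');
}

// True only when `candidate` is a different release of the same variant
// as the newly downloaded file.
function isSupersededBy(candidate: string, downloaded: string): boolean {
  return candidate !== downloaded && zimStem(candidate) === zimStem(downloaded);
}
```

A filename without a recognizable date suffix keeps its full name as the stem, so it never matches another file and is never deleted by this predicate.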
Before this change, the Active Downloads / Processing Queue UI showed the
ingestion progress gauge jumping wildly during multi-batch ZIM ingestion
(e.g. 5% → 88% → 27% → 5% → 56% → 36% over ~60 seconds for cooking SE).
Each continuation batch is a separate BullMQ job, and `EmbedFileJob.handle()`
reported `job.progress` in two different reference frames depending on
where it was in the batch lifecycle:
- During-batch (via the onProgress callback): 5% → 95% scaled across
"% through this batch's chunks"
- End-of-batch (just before dispatching the next): overwritten to
`(nextOffset / totalArticles) * 100` — % through the whole file
- Next continuation batch starts with progress = 5% explicitly, then
climbs through the per-batch range again
`listActiveJobs()` returns the latest active BullMQ job's progress. With
GPU-accelerated ingestion completing a batch every ~4 seconds, the UI
saw the jobId rotate constantly and the gauge whipsaw between the two
reference frames.
`totalArticles` was already wired through the EmbedFileJob params shape
and used end-of-batch — but RagService never actually populated it,
so any frame-scaling that depended on it silently fell back to the
per-batch range. Three fixes together:
1. `ZIMExtractionService.extractZIMContent()` now returns
`{ chunks: ZIMContentChunk[]; totalArticles: number }` instead of a
raw chunks array, surfacing `archive.articleCount` to the caller.
Single caller (rag_service) updated to destructure.
2. `RagService.processZimFile()` includes `totalArticles` in its result
so `EmbedFileJob.dispatch()` can propagate it to the continuation
batch (which the existing code already does via
`totalArticles: totalArticles || result.totalArticles`).
3. `EmbedFileJob`'s onProgress callback scales the service-reported
per-batch percent into the overall-file frame when `totalArticles`
is known: `((batchOffset + (percent/100) * ZIM_BATCH_SIZE) /
totalArticles) * 100`. Capped at 99% to leave room for the explicit
100% set at file completion. Falls back to the original 5-95% range
for single-batch files (uploaded PDFs/txts) where totalArticles is
undefined — the gauge then represents % through the only batch,
which is what the UI expects for one-shot files.
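The scaling in step 3 can be sketched as a small pure function. `ZIM_BATCH_SIZE = 50` and the linear 5-95 mapping for the single-batch fallback are assumptions made for the example; the overall-frame formula and the 99% cap are taken from the step above.

```typescript
const ZIM_BATCH_SIZE = 50; // assumption: articles per batch

// Convert "% through this batch" into "% through the whole file".
function scaleBatchPercent(
  batchPercent: number,
  batchOffset: number,
  totalArticles?: number
): number {
  if (!totalArticles || totalArticles <= 0) {
    // Single-batch file: map 0-100 onto the original 5-95 range.
    return 5 + (batchPercent / 100) * 90;
  }
  const articlesDone = batchOffset + (batchPercent / 100) * ZIM_BATCH_SIZE;
  // Cap at 99 to leave room for the explicit 100% at file completion.
  return Math.min(99, (articlesDone / totalArticles) * 100);
}
```

Because every continuation job reports in the same overall-file frame, the value no longer depends on which jobId happens to be active when the UI polls.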
Validated on NOMAD8 (RX 6800, ROCm-accelerated nomic):
- devdocs python (small, ~1500 articles): progress rose monotonically
across continuation jobIds:
1501@30% → 1510@33% → 1514@43% → 1518@52%.
- ifixit (huge, ~100k articles): stays near 3% for the first many
batches at offset 0..3000 — correct, the file is enormous.
- wikipedia_en_medicine (large, ~70k articles): stays near 0-1% for
the first batches — also correct.
- Brief 0-5% blip on continuation handoff (the explicit
`safeUpdateProgress(job, 5)` at batch start, before the first
onProgress callback fires) — visible but quickly resolves to the
overall-frame value. No more 5% ↔ 88% chaos.
Jake noted that `inspect.State.StartedAt` could be missing/malformed,
which would land NaN inside `container.logs({ since, until })`. Add
defensive validation that the parsed timestamp is finite and positive
before using it, with a fallback to the previous tail:500 strategy
(plus a warn log) when it isn't. Happy path is unchanged.
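A sketch of the validation, assuming the result is a set of options later handed to `container.logs(...)`. The `LogWindow` type, the function name, and the injected `warn` callback are hypothetical; the 300-second window is the one used by the startup-window probe this change hardens.

```typescript
type LogWindow = { since: number; until: number } | { tail: number };

const STARTUP_WINDOW_SECONDS = 300;

// Turn inspect.State.StartedAt into a log window, or fall back to tail:500.
function resolveLogWindow(startedAt: unknown, warn: (message: string) => void): LogWindow {
  const ms = typeof startedAt === 'string' ? Date.parse(startedAt) : NaN;
  if (!Number.isFinite(ms) || ms <= 0) {
    // Missing, malformed, or pre-epoch timestamp: never let NaN reach the API call.
    warn(`unusable StartedAt (${String(startedAt)}), falling back to tail:500`);
    return { tail: 500 };
  }
  const since = Math.floor(ms / 1000); // Docker expects UNIX seconds
  return { since, until: since + STARTUP_WINDOW_SECONDS };
}
```

The positivity check matters because a year-0001 timestamp such as `0001-01-01T00:00:00Z` parses to a finite but negative number, so a finiteness check alone would let it through.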
Two related fixes to make the System Information page reliably show real
GPU info instead of misleading lspci BAR0 readings or N/A.
1. Generalize bogus-VRAM detection to AMD.
Same root cause as #835 (NVIDIA showing 32 MB), this time for AMD: lspci
parses the first PCI memory Region (BAR0, typically 1-16 MiB on Navi
cards) as `vram`. On NOMAD8 (Threadripper 3960X + Radeon RX 6800), the
System Information page showed "1 MB" instead of "16 GB". PR #850 fixed
this for NVIDIA by clearing the bogus value and re-running the Ollama
log probe; the check was vendor-gated to NVIDIA only.
`isBogusNvidiaVram` becomes `isBogusDgpuVram` with an `isDiscreteGpuVendor`
helper matching /nvidia|advanced micro devices|amd|ati/i. Same 256-MiB
threshold — no real discrete GPU has less than that, while Intel iGPUs
(which legitimately report small shared-memory VRAM via lspci) are left
untouched. The probe gate condition is similarly renamed.
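A sketch of the check, under assumptions: the signatures are illustrative, VRAM is taken to be in MiB, and the vendor pattern here adds `\b` word boundaries that the description above does not mention. The boundaries are there because an unanchored `ati` alternative would also match the "ati" inside "Corporation", and so would classify "Intel Corporation" as discrete.

```typescript
// Assumption: word boundaries added for the sketch; see the note above.
const DISCRETE_GPU_VENDOR = /\b(nvidia|advanced micro devices|amd|ati)\b/i;
const MIN_PLAUSIBLE_DGPU_VRAM_MIB = 256;

function isDiscreteGpuVendor(vendor: string): boolean {
  return DISCRETE_GPU_VENDOR.test(vendor);
}

// True when a discrete-GPU vendor reports less VRAM than any real card has,
// which indicates lspci parsed BAR0 rather than the actual memory size.
function isBogusDgpuVram(vendor: string, vramMib: number | null): boolean {
  if (vramMib === null) return false; // nothing reported, nothing to clear
  return isDiscreteGpuVendor(vendor) && vramMib < MIN_PLAUSIBLE_DGPU_VRAM_MIB;
}
```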
2. Read Ollama logs from the startup window, not tail:N.
`getOllamaInferenceComputeFromLogs()` was reading the last 500 log lines
and grepping for the "inference compute" line. That line is written once
during Ollama's GPU discovery phase within seconds of startup. Under
active embedding workloads we measured >1000 log lines/min, which pushes
the line past any reasonable tail within minutes — at which point the
probe returns null and the UI flips to "GPU Not Accessible" even though
Ollama is happily using the GPU (size_vram > 0 in /api/ps).
Switch from `tail: 500` to `since: containerStartedAt, until:
containerStartedAt + 300s`. The 5-minute window is bounded regardless of
container uptime and always captures Ollama's GPU discovery output. The
inference-compute line is emitted in the first few seconds of startup, so
5 min is generous headroom.
Validated on NOMAD8 (RX 6800, container uptime ~10 min with sustained
ingestion that generated 6,345 log lines):
Before:
controllers[0]: { model: "Navi 21 ...", vram: 1 }
After (bogus AMD VRAM cleared, log probe stale due to tail:500 churn):
controllers[0]: { model: "Navi 21 ...", vram: null }
gpuHealth: { status: "passthrough_failed" }
-> UI shows "N/A" and the banner from PR #208
After (bogus cleared + log probe reads startup window):
controllers[0]: { model: "AMD Radeon RX 6800", vram: 16384 }
gpuHealth: { status: "ok", hasRocmRuntime: true, ollamaGpuAccessible: true }
-> UI shows "16 GB", no banner
Both branches of the fix behave correctly: NVIDIA path unchanged
(same code, just renamed identifiers), AMD path now triggers the probe
and the probe reliably finds the GPU info regardless of container age.
After an update, container recreate, or docker daemon restart, nomad_ollama's
HostConfig.DeviceRequests still lists the nvidia driver — but the NVIDIA
Container Toolkit binding inside the container is torn. `nvidia-smi` returns
"Failed to initialize NVML: Unknown Error" and Ollama silently falls back to
CPU inference. PR #208 detects this and shows a banner with a "Fix: Reinstall
AI Assistant" button. This change does that click automatically on admin boot.
New provider GpuPassthroughRemediationProvider runs once on web env boot:
1. Skip when KV `ai.autoFixGpuPassthrough = false` (default true).
2. Skip when Docker has no `nvidia` runtime registered (AMD-only and CPU-only
hosts unaffected).
3. Skip when nomad_ollama isn't running.
4. Exec `nvidia-smi --query-gpu=name --format=csv,noheader` inside the
container with an 8-second timeout. If the output matches
"Failed to initialize NVML", "Unknown Error", "TIMEOUT", or contains no
alphabetic characters, treat the passthrough as broken.
5. On broken: call DockerService.forceReinstall('nomad_ollama'). The existing
force-reinstall preserves the Ollama volume + installed models. Stamp
`gpu.autoRemediatedAt` on success.
6. On healthy: log and exit.
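The classification in step 4 could look roughly like this. The function name is hypothetical, and the sketch assumes the exec wrapper has already reduced the result to a single string (with `TIMEOUT` substituted when the 8-second limit is hit).

```typescript
const BROKEN_MARKERS = ['Failed to initialize NVML', 'Unknown Error', 'TIMEOUT'];

// Decide whether `nvidia-smi --query-gpu=name --format=csv,noheader` output
// indicates a torn passthrough binding.
function isPassthroughBroken(output: string): boolean {
  if (BROKEN_MARKERS.some((marker) => output.includes(marker))) return true;
  // A healthy run prints at least one GPU name, which always contains letters.
  return !/[a-z]/i.test(output);
}
```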
AMD passthrough_failed is intentionally not handled — its fix path is HSA
override handling (PR #804) rather than a simple service recreate, and false
positives during AMD startup log parsing would loop a recreate without fixing
anything. Left to a follow-up if it proves to be a recurring AMD issue.
Validated on NOMAD3 (RTX 5060, v1.32.0-rc.3 + this patch hot-applied):
- After admin restart with passthrough healthy: log line
"[GpuPassthroughRemediationProvider] NVIDIA passthrough healthy — no action
needed." Provider exits cleanly without touching the container.
- The broken-state branch hits the existing forceReinstall path, which was
manually invoked earlier in the same session to fix this exact box and
recovered GPU access in ~45s with model volume intact. No new failure mode
is introduced — the auto-trigger removes the user click but the underlying
operation is the same one the banner Fix button already calls.
Closes #755.
Stacks on top of the multi-batch ZIM ingestion fix. After that fix,
multi-batch ZIM ingestion completes correctly — but on installs where
Ollama runs the embedding model on CPU (currently every AMD ROCm
install, since Ollama's ROCm build doesn't accelerate nomic-bert),
the now-correct sustained 100% CPU saturation across all cores can
starve other services hard enough to take the box down. Confirmed
on a Threadripper 3960X + RX 6800 NOMAD: a wikipedia-class ZIM
ingestion pegged 48 threads cleanly enough that sshd lost
banner-exchange responsiveness and the box ultimately required a
power-cycle.
NVIDIA installs aren't affected — nomic-embed-text:v1.5 runs at
100% GPU on RTX 5060 (verified via `ollama ps`).
Detect placement at runtime, pace only when needed:
1. OllamaService.isEmbeddingGpuAccelerated() — queries /api/ps and
returns true if any loaded embedding model reports size_vram > 0.
Fails closed (returns false) if /api/ps is unreachable or no embed
model is loaded yet — over-pacing is safer than crashing.
2. EmbedFileJob.handle() — between batches (hasMoreBatches: true
branch), check placement and `await setTimeout(CPU_BATCH_DELAY_MS)`
when CPU-only. CPU_BATCH_DELAY_MS = 1000 (1s) — enough to give the
OS scheduler a window for sshd/disk-collector/etc., small enough
that total ingestion time isn't meaningfully affected (each batch
is ~60-90s of work).
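A sketch of the placement check as a pure function over an already-fetched `/api/ps` body. Identifying an embedding model by `embed` appearing in its name is an assumption made for the example, as is the `interBatchDelayMs` helper; `size_vram` and the 1000 ms delay come from the steps above.

```typescript
// Subset of the /api/ps response that the check needs.
interface PsModel { name: string; size_vram?: number }
interface PsResponse { models?: PsModel[] }

const CPU_BATCH_DELAY_MS = 1000;

// Pass null when /api/ps was unreachable; the check then fails closed.
function isEmbeddingGpuAccelerated(ps: PsResponse | null): boolean {
  if (!ps || !ps.models) return false;
  // Assumption: an embedding model is recognized by "embed" in its name.
  return ps.models.some((m) => /embed/i.test(m.name) && (m.size_vram ?? 0) > 0);
}

// Delay to insert between batches: none on GPU, 1s on CPU or unknown placement.
function interBatchDelayMs(ps: PsResponse | null): number {
  return isEmbeddingGpuAccelerated(ps) ? 0 : CPU_BATCH_DELAY_MS;
}
```

Failing closed means an unreachable endpoint costs at most one second per batch, which is small next to the 60-90 s each batch already takes.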
GPU-accelerated installs see zero behavior change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Every static call site instantiated a fresh QueueService (24 call sites
across 8 files). QueueService.getQueue() opens a BullMQ Queue per call
when not cached, and each Queue opens two ioredis connections (one for
commands, one blocking). Because every static call constructed a new
QueueService, its internal `queues` cache was never shared, every call
opened a fresh pair, and none were ever closed.
In normal operation this leaked a few connections per API hit. During
multi-batch ZIM ingestion after PR #872 (where EmbedFileJob.handle()
dispatches the next batch every 50 articles), every batch completion
opened two new connections. On NOMAD3 at ~one batch every 4s sustained,
that's ~1800 leaked connections/hour. Redis hit its 10,000-maxclient
ceiling in ~5 hours and the admin container fell into an EPIPE flood
that required a restart to recover.
Fix: collapse QueueService to a true process-wide singleton with a
private constructor and getInstance() accessor. The existing per-queue
Map is now shared across every dispatch / status / cleanup call, so each
queue's underlying connections are opened exactly once for the lifetime
of the process. close() now clears the map so the singleton can be torn
down cleanly if a graceful-shutdown hook is ever wired up.
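The shape of the singleton, sketched with a stand-in queue class so the example runs without BullMQ or Redis. `FakeQueue` and its `opened` counter are hypothetical; in the real service each cached entry would be a BullMQ `Queue` holding its own Redis connections.

```typescript
// Stand-in for a BullMQ Queue; counts how many instances were ever opened.
class FakeQueue {
  static opened = 0;
  readonly name: string;
  constructor(name: string) {
    this.name = name;
    FakeQueue.opened += 1;
  }
  async close(): Promise<void> {}
}

class QueueService {
  private static instance: QueueService | null = null;
  private readonly queues = new Map<string, FakeQueue>();

  // Private constructor: callers cannot create a second, uncached service.
  private constructor() {}

  static getInstance(): QueueService {
    if (QueueService.instance === null) {
      QueueService.instance = new QueueService();
    }
    return QueueService.instance;
  }

  // Each named queue is opened once and then served from the shared map.
  getQueue(name: string): FakeQueue {
    let queue = this.queues.get(name);
    if (!queue) {
      queue = new FakeQueue(name);
      this.queues.set(name, queue);
    }
    return queue;
  }

  // Close every cached queue and clear the map so teardown leaves nothing behind.
  async close(): Promise<void> {
    await Promise.all(Array.from(this.queues.values()).map((q) => q.close()));
    this.queues.clear();
  }
}
```

Call sites switch from `new QueueService()` to `QueueService.getInstance()`, so repeated dispatches reuse the same cached queue instead of opening a new connection pair each time.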
Validated on NOMAD3 (RTX 5060, v1.32.0-rc.4 + this patch hot-applied):
under sustained multi-batch wikipedia_en_simple_all_nopic ingestion,
connected_clients held flat at 21-22 across a 5-minute window. Pre-fix
the same scenario climbed to 10,000+ over hours.