After an update, container recreate, or docker daemon restart, nomad_ollama's
HostConfig.DeviceRequests still lists the nvidia driver — but the NVIDIA
Container Toolkit binding inside the container is torn. `nvidia-smi` returns
"Failed to initialize NVML: Unknown Error" and Ollama silently falls back to
CPU inference. PR #208 detects this and shows a banner with a "Fix: Reinstall
AI Assistant" button. This change does that click automatically on admin boot.
New provider GpuPassthroughRemediationProvider runs once on web env boot:
1. Skip when KV `ai.autoFixGpuPassthrough = false` (default true).
2. Skip when Docker has no `nvidia` runtime registered (AMD-only and CPU-only
hosts unaffected).
3. Skip when nomad_ollama isn't running.
4. Exec `nvidia-smi --query-gpu=name --format=csv,noheader` inside the
container with an 8-second timeout. If the output matches
"Failed to initialize NVML", "Unknown Error", "TIMEOUT", or contains no
alphabetic characters, treat the passthrough as broken.
5. On broken: call DockerService.forceReinstall('nomad_ollama'). The existing
force-reinstall preserves the Ollama volume + installed models. Stamp
`gpu.autoRemediatedAt` on success.
6. On healthy: log and exit.
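The classification in step 4 can be sketched as a pure function. This is a minimal illustration rather than the provider's actual code: `isPassthroughBroken` and `FAILURE_MARKERS` are hypothetical names, and the sketch assumes the caller substitutes the string "TIMEOUT" when the 8-second exec limit is hit.

```typescript
// Markers named in step 4; any of them means the probe did not reach the GPU.
const FAILURE_MARKERS: string[] = [
  "Failed to initialize NVML",
  "Unknown Error",
  "TIMEOUT",
];

// Hypothetical helper: true when the nvidia-smi output indicates a torn binding.
function isPassthroughBroken(output: string): boolean {
  if (FAILURE_MARKERS.some((marker) => output.includes(marker))) {
    return true;
  }
  // A healthy probe prints a GPU name, so output without letters counts as broken.
  return !/[A-Za-z]/.test(output);
}
```

Keeping the check free of Docker calls makes it easy to exercise against captured outputs from affected hosts.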
AMD passthrough_failed is intentionally not handled — its fix path is HSA
override handling (PR #804) rather than a simple service recreate, and false
positives during AMD startup log parsing would loop a recreate without fixing
anything. Left to a follow-up if it proves to be a recurring AMD issue.
Validated on NOMAD3 (RTX 5060, v1.32.0-rc.3 + this patch hot-applied):
- After admin restart with passthrough healthy: log line
"[GpuPassthroughRemediationProvider] NVIDIA passthrough healthy — no action
needed." Provider exits cleanly without touching the container.
- The broken-state branch hits the existing forceReinstall path, which was
manually invoked earlier in the same session to fix this exact box and
recovered GPU access in ~45s with model volume intact. No new failure mode
is introduced — the auto-trigger removes the user click but the underlying
operation is the same one the banner Fix button already calls.
Closes #755.
Stacks on top of the multi-batch ZIM ingestion fix. After that fix,
multi-batch ZIM ingestion completes correctly — but on installs where
Ollama runs the embedding model on CPU (currently every AMD ROCm
install, since Ollama's ROCm build doesn't accelerate nomic-bert),
the now-correct sustained 100% CPU saturation across all cores can
starve other services hard enough to take the box down. Confirmed
on a Threadripper 3960X + RX 6800 NOMAD: a wikipedia-class ZIM
ingestion pegged 48 threads cleanly enough that sshd lost
banner-exchange responsiveness and the box ultimately required a
power-cycle.
NVIDIA installs aren't affected — nomic-embed-text:v1.5 runs at
100% GPU on RTX 5060 (verified via `ollama ps`).
Detect placement at runtime, pace only when needed:
1. OllamaService.isEmbeddingGpuAccelerated() — queries /api/ps and
returns true if any loaded embedding model reports size_vram > 0.
Fails closed (returns false) if /api/ps is unreachable or no embed
model is loaded yet — over-pacing is safer than crashing.
2. EmbedFileJob.handle() — between batches (hasMoreBatches: true
branch), check placement and `await setTimeout(CPU_BATCH_DELAY_MS)`
when CPU-only. CPU_BATCH_DELAY_MS = 1000 (1s) — enough to give the
OS scheduler a window for sshd/disk-collector/etc., small enough
that total ingestion time isn't meaningfully affected (each batch
is ~60-90s of work).
GPU-accelerated installs see zero behavior change.
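A rough sketch of the placement check and the pacing decision follows. The `size_vram` field comes from Ollama's /api/ps response; the name-based `looksLikeEmbeddingModel` heuristic and the `delayBeforeNextBatch` helper are assumptions made for illustration, not the service's real logic.

```typescript
// Subset of the fields Ollama's /api/ps returns for each loaded model.
interface LoadedModel {
  name: string;
  size_vram: number;
}

interface PsResponse {
  models: LoadedModel[];
}

const CPU_BATCH_DELAY_MS = 1000;

// Assumption: embedding models are recognised by name; the real check may differ.
function looksLikeEmbeddingModel(modelName: string): boolean {
  return /embed/i.test(modelName);
}

// `ps` is null when /api/ps could not be reached.
function isEmbeddingGpuAccelerated(ps: PsResponse | null): boolean {
  if (ps === null) return false; // fail closed: pace rather than risk saturation
  return ps.models.some(
    (m) => looksLikeEmbeddingModel(m.name) && m.size_vram > 0
  );
}

// Milliseconds to wait between batches: zero on GPU, one second on CPU.
function delayBeforeNextBatch(ps: PsResponse | null): number {
  return isEmbeddingGpuAccelerated(ps) ? 0 : CPU_BATCH_DELAY_MS;
}
```

An empty `models` array takes the same path as an unreachable endpoint, which matches the fail-closed behavior described above.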
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Every static call site instantiated a fresh QueueService (24 call sites
across 8 files). QueueService.getQueue() opens a BullMQ Queue per call
when not cached, and each Queue opens two ioredis connections (one for
commands, one blocking). Because every static call constructed a new
QueueService, its internal `queues` cache was never shared, every call
opened a fresh pair, and none were ever closed.
In normal operation this leaked a few connections per API hit. During
multi-batch ZIM ingestion after PR #872 (where EmbedFileJob.handle()
dispatches the next batch every 50 articles), every batch completion
opened two new connections. On NOMAD3 at ~one batch every 4s sustained,
that's ~1800 leaked connections/hour. Redis hit its 10,000-maxclient
ceiling in ~5 hours and the admin container fell into an EPIPE flood
that required a restart to recover.
Fix: collapse QueueService to a true process-wide singleton with a
private constructor and getInstance() accessor. The existing per-queue
Map is now shared across every dispatch / status / cleanup call, so each
queue's underlying connections are opened exactly once for the lifetime
of the process. close() now clears the map so the singleton can be torn
down cleanly if a graceful-shutdown hook is ever wired up.
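The shape of the fix can be sketched with a stand-in queue class. `FakeQueue` is hypothetical and only counts constructions so the sharing is observable; the real service wraps BullMQ `Queue` objects and would presumably also close each one before clearing the map.

```typescript
// Stand-in for a BullMQ Queue; it only counts how many times one was opened.
class FakeQueue {
  private static opened = 0;
  readonly queueName: string;

  constructor(queueName: string) {
    this.queueName = queueName;
    FakeQueue.opened += 1;
  }

  static totalOpened(): number {
    return FakeQueue.opened;
  }
}

class QueueService {
  private static instance: QueueService | null = null;
  private readonly queues = new Map<string, FakeQueue>();

  // Private constructor: callers cannot create a second, uncached instance.
  private constructor() {}

  static getInstance(): QueueService {
    if (QueueService.instance === null) {
      QueueService.instance = new QueueService();
    }
    return QueueService.instance;
  }

  // Opens the queue on first use and returns the cached one afterwards.
  getQueue(queueName: string): FakeQueue {
    let queue = this.queues.get(queueName);
    if (queue === undefined) {
      queue = new FakeQueue(queueName);
      this.queues.set(queueName, queue);
    }
    return queue;
  }

  cachedQueueCount(): number {
    return this.queues.size;
  }

  // Clears the cache so the singleton can be torn down.
  close(): void {
    this.queues.clear();
  }
}
```

Because every call site goes through `getInstance()`, repeated dispatches against the same queue name reuse one object instead of opening a new pair of connections each time.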
Validated on NOMAD3 (RTX 5060, v1.32.0-rc.4 + this patch hot-applied):
under sustained multi-batch wikipedia_en_simple_all_nopic ingestion,
connected_clients held flat at 21-22 across a 5-minute window. Pre-fix
the same scenario climbed to 10,000+ over hours.
EmbedFileJob.dispatch() uses a deterministic per-file jobId
(sha256(filePath).slice(0,16)) for every batch. The parent batch's
handle() calls EmbedFileJob.dispatch({ batchOffset }) before returning,
so the parent is still in `active` state and locked when the
continuation tries to enqueue. BullMQ silently returns the locked
parent instead of creating a new job — and in newer BullMQ versions
it does so without throwing, so the existing
`catch (error.message.includes('job already exists'))` branch never
fires. After the parent completes, its entry stays in the `completed`
ZSET (held by `removeOnComplete: { count: 50 }`), continuing to trip
jobId dedupe for any subsequent re-dispatch attempts.
Result: every NOMAD install since 2026-02-08 (feat: zim content
embedding) with a multi-batch ZIM (wikipedia, cooking SE, ifixit,
lrnselfreliance, etc.) has only the first 50 articles indexed in
qdrant. The RAG feature has been silently degraded for ~3 months —
the user sees the file appear in their KB, qdrant accumulates ~50
articles' worth of vectors, and pagination quietly halts. No error
surfaces anywhere.
Fix: dispatch() skips the deterministic jobId for continuation batches
(batchOffset > 0), letting BullMQ auto-generate a unique one so each
batch stacks as an independent queue entry. Initial dispatches keep
the deterministic jobId so re-triggering an install (UI re-click,
sync rescan) remains idempotent. The existing 'job already exists'
branch is now gated on !isContinuation, since by construction
continuation batches will never hit dedupe.
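The jobId decision reduces to a small function. This sketch assumes the caller has already computed the sha256 hex digest of the file path; `EmbedDispatchParams` and `resolveJobId` are illustrative names. Returning `undefined` relies on BullMQ generating its own id when the `jobId` option is omitted.

```typescript
interface EmbedDispatchParams {
  // sha256(filePath) as a hex string, computed by the caller.
  fileDigestHex: string;
  batchOffset?: number;
}

// Returns the jobId to pass to the queue, or undefined to let BullMQ generate one.
function resolveJobId(params: EmbedDispatchParams): string | undefined {
  const isContinuation = (params.batchOffset ?? 0) > 0;
  if (isContinuation) {
    // Continuation batches must not collide with the still-active parent job.
    return undefined;
  }
  // Initial dispatch stays deterministic so re-triggers remain idempotent.
  return params.fileDigestHex.slice(0, 16);
}
```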
Validated on NOMAD8 (RX 6800 / Threadripper 3960X, rc.3 + this patch):
devdocs_en_python (~1,500 chunks across multiple batches) correctly
paginates end-to-end. admin.log shows the expected sequence of
"Dispatched embedding job for file: X (continuation @ offset N)"
followed by "Starting embedding process for: X (batch offset: N)"
for each batch.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The short-conversation skip in `rewriteQueryWithContext` used `userMessages.length <= 2`,
which short-circuits both the very first turn AND the first follow-up. The follow-up is
the moment the rewriter matters most — it's where pronouns and shorthand ("the bars",
"how long does it last?") need to be resolved against earlier turns before the embedding
search runs. With the rewriter skipped, RAG queries against the raw last message, scores
nothing above the 0.3 threshold, and no context gets injected for that turn.
The visible symptom is the assistant treating the first follow-up in any chat as a
brand-new question — e.g. "great - they threw up 2 of the bars it looks like" answered
as if it were a recipe-bars question, with no carry-forward of the prior chocolate-
poisoning context.
Threshold lowered to `< 2`: skip only when there's exactly one user message (nothing to
rewrite from). From the first follow-up onward the rewriter runs, as originally intended
before commit 96e5027.
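The difference between the two conditions is easiest to see side by side. The function names below are illustrative only; the real check lives inline in `rewriteQueryWithContext`.

```typescript
// Old condition: skips the first turn and the first follow-up.
const skipRewriteBefore = (userMessageCount: number): boolean =>
  userMessageCount <= 2;

// New condition: skips only when there is a single user message.
const skipRewriteAfter = (userMessageCount: number): boolean =>
  userMessageCount < 2;
```

With two user messages, the old condition skips the rewriter and the new one runs it; with one message both skip.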
Validated against `mistral-nemo:12b` on NOMAD3 by hot-patching the compiled controller
and replaying the dog-chocolate scenario. Post-patch response correctly threads "3
Hershey's bars" from turn 1 into turn 2's answer; pre-patch (per reporter's screenshot)
pivoted to peanut butter bar recipes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
download.kiwix.org and some of its mirrors don't always set a
Content-Type header on .zim responses. The MIME validator was reading
`headers['content-type'] || ''`, then running each allowlist entry
through `''.includes(...)` which is always false, so every download
from those hosts was rejected with `MIME type is not allowed`.
RFC 7231 §3.1.1.5 says missing Content-Type may be treated as
application/octet-stream by the recipient, and that's already in every
binary-content allowlist we use (ZIM, PMTILES, base assets). Default
the missing case to that and the validator does the right thing.
Strict callers that don't list octet-stream still reject as before.
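A minimal sketch of the defaulting behavior, assuming the validator does a substring match of each allowlist entry against the header value as described above. `ResponseHeaders` and `isMimeAllowed` are hypothetical names.

```typescript
type ResponseHeaders = Record<string, string | undefined>;

// Default named in the commit message for a missing Content-Type.
const DEFAULT_MIME = "application/octet-stream";

function isMimeAllowed(headers: ResponseHeaders, allowlist: string[]): boolean {
  // A missing or empty header falls back to the default instead of "".
  const contentType = headers["content-type"] || DEFAULT_MIME;
  return allowlist.some((allowed) => contentType.includes(allowed));
}
```

A caller whose allowlist omits `application/octet-stream` still rejects header-less responses, which is the strict behavior the commit preserves.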
Closes #855.
PR #804's AMD branch in `updateContainer()` overrode `newImage` to
`ollama/ollama:rocm` and then persisted that literal string to
`service.container_image` (line 1273). Two downstream consequences for
every AMD user who clicked Update on AI Assistant:
1. Apps page (`apps.tsx`) extracts the displayed version from
`container_image` and rendered the literal string "rocm".
2. `ContainerRegistryService.getAvailableUpdates()` parsed `currentTag =
"rocm"`, which isn't semver, so `parseMajorVersion` returned NaN, the
filter didn't reject newer tags by major-version, and `isNewerVersion`
treated any future tag as newer. Result: the same update reappeared
on every check, forever.
Fix: separate "what we run" from "what we persist". `runtimeImage`
holds the tag passed to `docker.pull()` and `createContainer()` (still
`:rocm` for AMD), while `newImage` keeps the semver tag and is the
value written to the DB. Surgical: 3 references renamed plus 1
declaration added.
The install path (`_createContainer`) already had the right shape
(runtime-only override, no DB write of the override), so this PR only
touches `updateContainer`.
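The split between the persisted tag and the runtime tag can be sketched as below. `resolveOllamaImages` and the `GpuType` union are illustrative, not the names used in `updateContainer`.

```typescript
type GpuType = "nvidia" | "amd" | "none";

interface ResolvedImages {
  newImage: string; // semver tag, written to the DB
  runtimeImage: string; // tag passed to pull and container creation
}

function resolveOllamaImages(targetVersion: string, gpuType: GpuType): ResolvedImages {
  const newImage = `ollama/ollama:${targetVersion}`;
  // AMD hosts run the rolling :rocm tag but still record the semver tag.
  const runtimeImage = gpuType === "amd" ? "ollama/ollama:rocm" : newImage;
  return { newImage, runtimeImage };
}
```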
Test plan:
- `npm run typecheck` passes locally.
- Manual repro on NOMAD2 (AMD HX 370 / 890M, rc.2): before fix, DB
shows `container_image = ollama/ollama:rocm` after triggering an
Ollama update via Settings > Apps; Apps page shows version "rocm";
`/api/system/services/check-updates` immediately re-reports the same
update available. After fix, DB shows `container_image =
ollama/ollama:<targetVersion>`; Apps page shows the semver; check-
updates does not re-report the same update.
- nomad_ollama container itself still runs the `:rocm` image
(verified via `docker inspect`).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #804 added pciutils to the admin image so AMD detection could fall
back to lspci. Side effect for NVIDIA: si.graphics() now finds the card
via lspci but reads the BAR0 region size (16-32 MiB on most NVIDIA
cards) as VRAM, since nvidia-smi isn't installed in the admin image to
enrich the result. A GTX 1050 Ti showed 32 MB instead of 4096.
The nvidia-smi-via-Ollama and Ollama log probes already give the right
number, but they only ran when graphics.controllers came back empty.
Extend the trigger so they also run when the only entries are NVIDIA
controllers reporting under 256 MiB (no real dGPU has that little). If
the probes can't reach a value either (Ollama not installed, passthrough
broken), VRAM now falls back to N/A instead of the bogus 32.
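The extended trigger can be expressed as a predicate over the controller list. The interface below mirrors only the fields the sketch needs, and `shouldRunVramProbes` is a hypothetical name; matching the vendor by a case-insensitive "nvidia" substring is an assumption.

```typescript
interface GraphicsController {
  vendor: string;
  model: string;
  vram: number | null; // MiB as reported
}

// No real NVIDIA dGPU reports less than this, per the commit message.
const MIN_PLAUSIBLE_NVIDIA_VRAM_MIB = 256;

function shouldRunVramProbes(controllers: GraphicsController[]): boolean {
  // Original trigger: nothing detected at all.
  if (controllers.length === 0) return true;
  // New trigger: every entry is an NVIDIA card with an implausibly small VRAM value.
  return controllers.every(
    (c) => /nvidia/i.test(c.vendor) && (c.vram ?? 0) < MIN_PLAUSIBLE_NVIDIA_VRAM_MIB
  );
}
```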
Verified locally on an RTX 4070 Ti by simulating the container condition
(lspci available, nvidia-smi unreachable). Before the fix: vram 32, model
"AD104 [GeForce RTX 4070 Ti]". After: vram 12288, model "NVIDIA GeForce
RTX 4070 Ti", from the Ollama inference-compute log line. Also confirmed
the same result inside the actual admin Docker image.
Closes #810.
## Bug A: HSA_OVERRIDE_GFX_VERSION=11.0.0 was unconditional
PR #804 set HSA_OVERRIDE_GFX_VERSION=11.0.0 for any AMD GPU. The inline comment
claimed this was harmless on supported discrete cards (gfx1030 RX 6800, etc.),
which proved empirically false. With the override, Ollama crashes during GPU discovery on
gfx1030 and falls back to CPU silently. Affects every NOMAD user with an
RX 6800 or other RDNA 2 discrete card.
The correct value depends on the gfx version:
- gfx1030, gfx1100, gfx1101, gfx1102: officially supported by ROCm — no override
- gfx1031..gfx1036 (RDNA 2 variants + iGPUs like Rembrandt 680M): 10.3.0
- gfx1103, gfx1150, gfx1151 (Phoenix 780M, Strix 890M, Strix Halo): 11.0.0
### Resolution chain in `_resolveAmdHsaOverride()`
1. KV `ai.amdHsaOverride` — manual override; accepts 'none' to disable, or a
semver-style value to force.
2. Marker file `/app/storage/.nomad-amd-gfx` — written by install_nomad.sh
based on lspci codename. Mapped to override via `_mapGfxToHsaOverride()`.
3. Default: `11.0.0` — preserves prior behavior so existing iGPU users
(780M / 890M, the dominant AMD population today) don't regress on upgrade.
Discrete RDNA 2 users on existing installs can opt out via
`ai.amdHsaOverride='none'` and force-reinstall AI Assistant, OR re-run
install_nomad.sh to refresh the marker file.
The helper is used in both `createContainer` (initial install) and
`updateContainer` (image update) paths, replacing the unconditional push.
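The resolution chain and the gfx mapping can be sketched together as pure functions. The function names echo the helpers named above but this is not their actual code: the KV value and marker contents are passed in as plain strings, and falling back to the default for an unrecognized gfx value in the marker is an assumption.

```typescript
const DEFAULT_HSA_OVERRIDE = "11.0.0";

// Returns the override for a gfx target, null when none is needed,
// or undefined when the target is not in the table.
function mapGfxToHsaOverride(gfx: string): string | null | undefined {
  if (["gfx1030", "gfx1100", "gfx1101", "gfx1102"].includes(gfx)) return null;
  if (/^gfx103[1-6]$/.test(gfx)) return "10.3.0";
  if (["gfx1103", "gfx1150", "gfx1151"].includes(gfx)) return "11.0.0";
  return undefined;
}

// kvValue: contents of ai.amdHsaOverride, or null when unset.
// markerGfx: contents of the .nomad-amd-gfx marker file, or null when missing.
function resolveAmdHsaOverride(
  kvValue: string | null,
  markerGfx: string | null
): string | null {
  // 1. Manual override wins; 'none' disables the env var entirely.
  if (kvValue === "none") return null;
  if (kvValue) return kvValue;
  // 2. Marker file written by install_nomad.sh.
  if (markerGfx) {
    const mapped = mapGfxToHsaOverride(markerGfx.trim());
    if (mapped !== undefined) return mapped;
  }
  // 3. Backward-compatible default.
  return DEFAULT_HSA_OVERRIDE;
}
```

A `null` result means the caller should not set `HSA_OVERRIDE_GFX_VERSION` at all, which is the case for an RX 6800 once its marker file exists.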
## Bug B: BenchmarkService had no AMD discrete detection path
`BenchmarkService.getHardwareInfo()` had three GPU detection fallbacks:
1. `si.graphics()` — empty inside Docker for AMD
2. nvidia-smi — NVIDIA only
3. AMD APU regex from CPU model — integrated only
Result: AMD discrete cards (RX 6800, RX 7900 XTX, etc.) showed up as
"GPU: Not detected" on the leaderboard despite ROCm working. Corrupts
leaderboard data quality for that population.
Fix: after the existing fallbacks, call `SystemService.getSystemInfo()` and
read `graphics.controllers[0].model`. That path already handles AMD via the
marker file + Ollama log probe added in PR #804, so we're reusing existing
plumbing rather than duplicating detection logic.
## install_nomad.sh changes
The existing AMD detection block already runs lspci. Added a codename parse
step that maps Navi 21/22/23/24, Rembrandt, Phoenix1/Phoenix2, Strix/Strix
Point/Strix Halo, and Navi 31/32/33 to gfx versions, then writes
`/opt/project-nomad/storage/.nomad-amd-gfx`. Unknown codenames write nothing
(admin handles missing-marker case via the backward-compat default).
## Validation
Both bugs were originally surfaced and validated empirically on RX 6800 /
gfx1030 / Ubuntu 24.04 + kernel 6.17 + ollama/ollama:rocm during the #810
filing. Validation grid from that report:
| Run | NOMAD Score | tok/s | GPU detected |
|-----------------------------------------------|-------------|-------|-------------------------|
| Pre-fix (Bug A active) | n/a | 0 | yes, but library=cpu |
| HSA_OVERRIDE removed, Bug B unfixed | 73.8 | 221.6 | "Not detected" |
| Both fixes hot-patched (this PR's behavior) | 73.7 | 216.0 | AMD Radeon RX 6800 |
Local checks: `npm run typecheck` clean, `npm run build` clean.
Closes #796.
The maps API has accepted and persisted `notes` on map markers since
PR #770, but the marker popup component still rendered name only and
ignored the field. Now the popup shows a notes block beneath the name
when it's populated, with whitespace preserved and long text wrapped.
Threaded `notes` through the read path:
- `api.listMapMarkers` / `api.createMapMarker` response types
- `MapMarker` interface in `useMapMarkers` and the data.map projection
- `MapComponent`'s selectedMarker popup
The create/update UI is unchanged — users still set notes via the API
or DB directly, matching the issue's stated scope. A marker entry with
empty/whitespace-only notes renders the same as before.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes #826.
1. Heading and subtext now read from `versionInfo` state (which the
Check Again mutation already populates) instead of the server-rendered
`props.system`. Previously the card kept showing "System Up to Date /
Your system is running the latest version!" alongside the new
`Latest Version` row + Start Update button after a successful recheck.
Status icon also switched to `versionInfo` for consistency.
2. The pulling-state heading rendered the lowercase status enum
(`pulling`, `pulled`, ...) and relied on a Tailwind `capitalize` class
for the visible glyph. Screen readers and other accessible-name
consumers got the lowercase value with no transform applied. Replaced
with a `STAGE_LABELS` map so visual + accessible names match.
3. The sidecar (install/sidecar-updater/update-watcher.sh) writes
`complete` for ~5s, then resets the status file to `idle`. The SPA
could miss that window across the admin container restart, leaving
the page parked on its last observed progress percentage indefinitely
while the upgrade was actually finished on disk. A `seenAdvancedStageRef`
now records whether the session ever observed an advanced stage; a
later poll seeing `idle` is treated as the missed completion, and the
page reloads as advertised in step 3 of the on-screen process. Reset
on each Start Update.
4. Toggling Enable Early Access now triggers a recheck on success, so
the eligible-version list updates immediately instead of requiring a
manual Check Again click.
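The missed-completion handling in item 3 behaves like a small state machine. The sketch below models it outside React; `CompletionTracker` is a hypothetical class standing in for the ref, and treating every non-idle stage as "advanced" is an assumption.

```typescript
// Stage strings as written by the sidecar, e.g. "idle", "pulling", "pulled", "complete".
type UpdateStage = string;

class CompletionTracker {
  private seenAdvancedStage = false;

  // Called on each Start Update click.
  reset(): void {
    this.seenAdvancedStage = false;
  }

  // Feed each polled stage; returns true when the page should reload.
  observe(stage: UpdateStage): boolean {
    if (stage === "complete") return true;
    if (stage !== "idle") {
      this.seenAdvancedStage = true;
      return false;
    }
    // Idle after progress means the short "complete" window was missed.
    return this.seenAdvancedStage;
  }
}
```

An `idle` poll before any progress is ignored, so opening the page while no update is running does not trigger a reload.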
Single file touched: admin/inertia/pages/settings/update.tsx.
Typecheck (tsc --noEmit) passes; static UI changes verified in source.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(content): add custom ZIM library sources with pre-seeded mirrors
Users reported slow download speeds from the default Kiwix CDN. This adds
the ability to browse and download ZIM files from alternative Kiwix mirrors
or self-hosted repositories, all through the GUI.
- Add "Custom Libraries" button next to "Browse the Kiwix Library"
- Source dropdown to switch between Default (Kiwix) and custom libraries
- Browsable directory structure with breadcrumb navigation
- 5 pre-seeded official Kiwix mirrors (US, DE, DK, UK, Global CDN)
- Built-in mirrors protected from deletion
- Downloads use existing pipeline (progress, cancel, Kiwix restart)
- Source selection persists across page loads via localStorage
- Scrollable directory browser (600px max) with sticky header
- SSRF protection on all custom library URLs
Closes #576
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(content): recognize Wikipedia downloads from mirror sources
When Wikipedia is downloaded via a custom mirror instead of the default
Kiwix server, the completion callback now matches by filename instead
of exact URL. This ensures the Wikipedia selector correctly shows
"Installed" status and triggers old-version cleanup regardless of
which mirror was used.
Also handles the case where no Wikipedia selection exists yet (file
downloaded before visiting the selector), creating the record
automatically.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(ZIM): use cheerio for custom mirror directory parsing
* fix(ZIM): use URL constructor for more robust joining
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Jake Turner <jturner@cosmistack.com>
* feat: Updated the map to show the coordinates as the user moves the cursor over the map. Changed the cursor to a crosshair to make it easier to place map markers.
* Moved the scale unit control to its own component file for easier maintenance. The coordinate display is now hidden while the cursor is over the on-screen controls or the navigation bar. Added a toggle to turn off the coordinate display if the user doesn't wish to see it. Intentionally left the coordinate display visible over a map marker so that the marker's coordinates can be estimated. I intend to show a marker's coordinates when it is clicked, so this behavior may change.
---------
Co-authored-by: Kenneth Brewer <kennethbrewer3@protonmail.com>
Three UX issues from manual testing of #780 on NOMAD3.
1. Slider was unusable for multi-step zoom changes
`setLoading(true)` fired immediately on every selection or maxzoom change,
which disabled the slider until the request returned. Even with the 400ms
debounce delaying the network call, the UI was locked the whole time.
User couldn't drag through zoom levels to find the right one.
Fix: bump debounce to 1500ms, move `setLoading(true)` inside the setTimeout
so it only flips after the debounce expires. Slider stays interactive
throughout the wait. Slider `disabled` now only ties to `downloading`
(active extract dispatch), not `loading` (preflight in flight). The
existing requestId stale-safe pattern handles concurrent changes.
2. Newly-downloaded maps didn't show in Stored Map Files until manual refresh
`props.maps.regionFiles` is rendered server-side and passed through Inertia
props; without a partial reload it stayed stale until the user navigated
away and back.
Fix: watch `useDownloads({ filetype: 'map' })` count via a ref. When the
count drops (a download finished), trigger `router.reload({ only: ['maps'] })`
to refresh just the maps prop. Existing pattern from elsewhere in the
codebase.
3. Country picker didn't surface already-downloaded countries
When a user re-opened "Choose Countries" after downloading UK, UK appeared
unchecked with no indication it was already on disk.
Fix: pass installed pmtiles filenames into the modal as a prop; parse with
regex `^([a-z]{2})_[\w-]+_z\d+\.pmtiles$` to extract country codes from
single-country extracts (matching MapService.buildRegionSlug's iso2 lowercase
slug pattern). Render an "Installed" badge on those countries with a tooltip
explaining they're re-selectable for redownload at a different zoom.
Group / custom multi-country extracts don't reverse-map cleanly from
filename and are skipped here. Could be a follow-up if useful.
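The filename parsing in item 3 can be sketched as follows. The regex is the one quoted above; `installedCountryCodes` is an illustrative name, and the filenames in the usage line are made-up examples of the iso2 + date + zoom shape.

```typescript
const SINGLE_COUNTRY_FILENAME_RE = /^([a-z]{2})_[\w-]+_z\d+\.pmtiles$/;

// Collects iso2 codes from single-country extract filenames; others are skipped.
function installedCountryCodes(filenames: string[]): Set<string> {
  const codes = new Set<string>();
  for (const filename of filenames) {
    const match = SINGLE_COUNTRY_FILENAME_RE.exec(filename);
    if (match && match[1]) {
      codes.add(match[1]);
    }
  }
  return codes;
}

// Hypothetical filenames for illustration only.
const exampleCodes = installedCountryCodes([
  "gb_2025-06-01_z12.pmtiles",
  "world.pmtiles",
]);
```

Filenames that do not start with a two-letter code followed by an underscore, such as the world basemap, never match and so never produce a badge.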
Files:
admin/inertia/components/CountryPickerModal.tsx
- SINGLE_COUNTRY_FILENAME_RE: iso2 + flexible date + zoom
- installedFilenames prop with default []
- installedCountrySet derivation via useMemo
- "Installed" badge rendering on country list rows
- Debounce: 400ms -> 1500ms; setLoading inside setTimeout
- Slider disabled: only on `downloading`
admin/inertia/pages/settings/maps.tsx
- import useEffect/useRef
- destructure activeMapDownloads from useDownloads
- useEffect on download count drop -> router.reload({ only: ['maps'] })
- pass installedFilenames to CountryPickerModal
All three fixes tested end-to-end on NOMAD3.
* feat(maps): add regional map downloads via go-pmtiles extract
* address Copilot review feedback on PR #780
- auto-refresh preflight on selection/maxzoom change with 400ms debounce and
requestId stale-safety so the confirm button no longer requires a two-step
"Estimate Size" -> "Start Download" dance
- safeUpdateProgress helper replaces fire-and-forget updateProgress().catch()
pattern so cancelled-job errors (code -1) can't surface as unhandled rejections
- gate world basemap source on worldBasemapReady - when ensureWorldBasemap()
fails we already delete world.pmtiles, so emitting the source was producing
404s on every tile request
- verify go-pmtiles binary SHA256 at image build time; upstream doesn't ship a
checksums file so per-arch hashes are pinned as build args with a regenerate
note when bumping PMTILES_VERSION
Content Updates had three UX problems that compounded:
1. No size column, so users had to guess how big an update would be before
clicking Update All. Upstream /api/v1/resources/check-updates doesn't
return size, so CollectionUpdateService now enriches each update with
a Content-Length HEAD request in parallel (5s timeout, non-fatal on
failure — the row just renders an em-dash).
2. Small ZIM updates (1-8 MB) never appeared in Active Downloads. Two
causes, both fixed: handleApply / handleApplyAll didn't invalidate the
download-jobs query after dispatching, and useDownloads idled at 30s
between polls — enough for a fast job to dispatch, download, and get
cleaned up by removeOnComplete before the next refetch.
3. applyUpdate didn't forward title / totalBytes to RunDownloadJob, so
any update that did briefly surface in Active Downloads had no label
and no byte-count progress, just a filename and a percentage. It now
passes both (matching zim_service's dispatch pattern).
Also parallelized applyAllUpdates so dispatching five updates doesn't
serialize five sequential BullMQ round-trips.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(AI): re-enable AMD GPU acceleration for Ollama via ROCm + HSA override
Re-enables AMD GPU support that was disabled in 77f1868 pending validation
of the ROCm image and device discovery. Validation done 2026-04-28 on a
Minisforum UM890 Pro (Ryzen 9 PRO 8945HS + Radeon 780M iGPU) — Ollama
correctly offloaded all model layers to the iGPU when the container was
started with /dev/kfd + /dev/dri passthrough and HSA_OVERRIDE_GFX_VERSION=11.0.0.
On llama3.2:1b, GPU inference ran at 51.83 tok/s vs 33.16 tok/s on CPU
(same hardware, same prompt) — a 1.56x speedup confirmed by Ollama logs
showing "load_tensors: offloaded 17/17 layers to GPU".
Changes
-------
docker_service.ts
- Restore _discoverAMDDevices() (simplified — pass /dev/dri as a directory
entry, mirroring `docker run --device /dev/dri` behavior, instead of the
prior brittle hardcoded card0/renderD128 fallback that broke on systems
where the AMD GPU enumerates as card1+).
- Restore the AMD branch in _createContainer():
- Switches Ollama image to ollama/ollama:rocm
- Mounts /dev/kfd + /dev/dri via Devices
- Sets HSA_OVERRIDE_GFX_VERSION=11.0.0 (required for unsupported-but-RDNA3
iGPUs like gfx1103; harmless on supported discrete cards)
- KV opt-out via ai.amdGpuAcceleration (default on)
- Mirror the AMD branch in updateContainer():
- Lifted GPU detection above docker.pull() so AMD updates pull :rocm
rather than the standard :targetVersion tag (per-version ROCm tags
aren't always published)
- Replaces stale HSA_OVERRIDE in the inspect-captured env on update,
so containers built before this PR pick up the current value
system_service.ts
- New getOllamaInferenceComputeFromLogs() — parses Ollama startup log line
"msg=\"inference compute\" ... library=CUDA|ROCm ..." which Ollama emits
for both NVIDIA and AMD. Catches silent CPU fallback (e.g. NVML death
after update, or HSA_OVERRIDE failure) that the prior nvidia-smi exec
probe couldn't detect.
- gpuHealth refactored to use log parsing as the primary probe for both
vendors, with nvidia-smi exec retained as the NVIDIA-only secondary
path for hardware enrichment when log parsing has no startup line yet.
- AMD path uses gpu.type KV value (persisted by DockerService._detectGPUType)
+ ai.amdGpuAcceleration opt-out to determine hasRocmRuntime.
types/system.ts
- GpuHealthStatus extended additively: hasRocmRuntime + optional gpuVendor.
types/kv_store.ts
- New ai.amdGpuAcceleration boolean (default-on).
settings/models.tsx, settings/system.tsx
- passthrough_failed banner copy now reads vendor from gpuHealth.gpuVendor
("an AMD GPU" vs "an NVIDIA GPU"). Same Fix button hits the same
force-reinstall endpoint, which now configures AMD correctly.
install_nomad.sh
- AMD detection in verify_gpu_setup() upgraded from a strict-positive
"ROCm not currently available" message to "ROCm acceleration will be
configured automatically." Also tightens the lspci match to display
controller classes (avoids false positives from AMD CPU host bridges,
matching the same fix already in DockerService._detectGPUType).
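The log probe described under system_service.ts can be approximated with a couple of regular expressions. This is a sketch, not the real `getOllamaInferenceComputeFromLogs()`: it assumes key=value fields on the line, uses the last matching line as the most recent startup, and the sample line in the usage note is stitched together from fragments quoted in these commit messages.

```typescript
interface InferenceCompute {
  library: string;
  description: string | null;
}

// Finds the most recent "inference compute" line and extracts its fields.
function parseInferenceCompute(logs: string): InferenceCompute | null {
  const matching = logs
    .split("\n")
    .filter((line) => line.includes('msg="inference compute"'));
  const latest = matching[matching.length - 1];
  if (latest === undefined) return null;

  const library = /\blibrary=(\S+)/.exec(latest);
  if (!library || !library[1]) return null;

  const description = /\bdescription="([^"]*)"/.exec(latest);
  return {
    library: library[1],
    description: description && description[1] ? description[1] : null,
  };
}

// CUDA and ROCm mean the GPU is in use; anything else is a CPU fallback.
function isGpuLibrary(library: string): boolean {
  return /^(cuda|rocm)$/i.test(library);
}
```

For a line such as `msg="inference compute" library=ROCm compute=gfx1100 description="AMD Radeon 890M Graphics"`, the sketch yields library `ROCm` and the quoted description; a `library=cpu` line parses but fails `isGpuLibrary`, which is how a silent fallback would surface.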
Auto-remediation
----------------
Issue #755 proposes auto-remediation when gpuHealth.status flips to
passthrough_failed (today the user has to click "Fix: Reinstall AI
Assistant"). When that PR lands, AMD coverage falls out for free since
this PR uses the same passthrough_failed status code via the shared
gpuHealth machinery — #755's guard will need to flip from
hasNvidiaRuntime === true to (hasNvidiaRuntime || hasRocmRuntime).
Closes #124 (AMD GPU support).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(AI): detect AMD GPU presence inside admin container via marker file
The admin container doesn't have lspci installed, and AMD GPUs don't register
a Docker runtime the way NVIDIA does — so DockerService._detectGPUType() and
SystemService.gpuHealth had no way to know an AMD GPU was present.
The previous implementation fell through to lspci, which silently failed inside
the admin container, leaving gpu.type unset and gpuHealth stuck at 'no_gpu'
even on systems with an AMD GPU. (NVIDIA worked because Docker registers the
nvidia runtime, which is reachable via dockerInfo.Runtimes from any container.)
Discovered while testing the AMD acceleration patch on a Minisforum UM890 Pro:
the AMD branch in _createContainer() never fired because _detectGPUType()
returned 'none' even on a host with a working /dev/kfd.
Fix
---
install_nomad.sh writes the host-detected GPU type ('nvidia' | 'amd') to a
marker file in the storage volume the admin container already bind-mounts:
/opt/project-nomad/storage/.nomad-gpu-type → /app/storage/.nomad-gpu-type
DockerService._detectGPUType() reads the marker as a secondary probe (after
the Docker runtime check) — covers AMD detection from inside the container
without requiring lspci or a /dev bind mount.
SystemService falls back to the marker file when KV gpu.type is empty so the
System page reflects AMD presence even before the user installs AI Assistant
for the first time. (Without this, the page would say 'no_gpu' until Ollama
was installed, even on hosts with an AMD GPU detected at install time.)
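The probe order can be sketched as a pure function over the Docker runtime names and the marker file contents. `detectGpuType` here is an illustration rather than the real `_detectGPUType()`; it omits the lspci step and takes the marker contents as a string (null when the file is absent).

```typescript
type GpuType = "nvidia" | "amd" | "none";

function detectGpuType(
  dockerRuntimes: string[],
  markerContents: string | null
): GpuType {
  // Primary probe: NVIDIA registers a Docker runtime visible from any container.
  if (dockerRuntimes.includes("nvidia")) return "nvidia";
  // Secondary probe: marker written by install_nomad.sh on the host.
  const marker = (markerContents ?? "").trim();
  if (marker === "nvidia" || marker === "amd") return marker;
  return "none";
}
```

Trimming the contents matters because the manual migration path writes the marker with `echo`, which appends a newline.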
Verified on NOMAD6 (UM890 Pro, Ubuntu 24.04, 780M iGPU): with the marker file
in place and admin restarted, the patch's AMD branch fires correctly on Force
Reinstall AI Assistant. Resulting nomad_ollama runs ollama/ollama:rocm with
/dev/kfd + /dev/dri passthrough and HSA_OVERRIDE_GFX_VERSION=11.0.0; Ollama
logs show 'library=ROCm compute=gfx1100 ... type=iGPU'. NOMAD's in-product
benchmark on the same hardware climbed from 33.8 tok/s (CPU) to 57.3 tok/s
(GPU) — a 1.69x speedup, with TTFT dropping from 148ms to 66ms.
Migration for existing AMD installs
-----------------------------------
Users on an existing NOMAD install with an AMD GPU have no marker file (the
install script only writes it on a fresh install). Two paths get them on the GPU:
1. Re-run install_nomad.sh — writes the marker, no other side effects
2. Manually: echo amd | sudo tee /opt/project-nomad/storage/.nomad-gpu-type
Either then triggers AMD detection on the next AI Assistant install/reinstall.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(AI): pull ollama/ollama:rocm separately when AMD branch overrides image
The pull-if-missing logic in _createContainer ran against service.container_image
(the DB-pinned tag, e.g. ollama/ollama:0.18.2). The AMD branch then overrode
finalImage to ollama/ollama:rocm — but if that image wasn't already local, the
container creation step failed with "no such image: ollama/ollama:rocm".
Caught while validating on NOMAD2 (Ryzen AI 9 HX 370 + Radeon 890M / RDNA 3.5):
the prior end-to-end test on NOMAD6 had silently passed because the rocm image
was already pulled there from an earlier sidecar test, masking the bug.
Fix: inside the AMD branch, after setting finalImage to ollama/ollama:rocm,
run a parallel _checkImageExists + docker.pull dance for the new tag.
Also confirmed via this validation: the same HSA_OVERRIDE_GFX_VERSION=11.0.0
override works on the 890M (gfx1150 / RDNA 3.5) — Ollama logs report
'library=ROCm compute=gfx1100 description="AMD Radeon 890M Graphics"' and
inference runs at 51.68 tok/s (matching the existing X1 Pro published tile
of 51.7 tok/s on the same hardware class). RDNA 3 (780M, gfx1103) and RDNA
3.5 (890M, gfx1150) both use the same override successfully.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* build(Dockerfile): include pciutils for lspci gpu detection fallback
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Jake Turner <jturner@cosmistack.com>
Adds a check to RAG health that verifies nomad_qdrant is online. If it is
not, the user is blocked from clicking any buttons in the KB modal until
they click the start qdrant button and let the container start.
A new file, qdrant_restart_policy_provider.ts, tries to ensure that the
restart policy always exists for the nomad_qdrant container, even though
the policy should already have been set when the container was created.
The VineJS validators in createMarker and updateMarker silently
dropped fields not in their schema. The MapMarker model and DB
include notes and marker_type, and GET responses return them, but
POST and PATCH would not persist them.
updateMarker additionally did not accept latitude/longitude, so
markers could not be repositioned via the API after creation.
- Add notes and marker_type to both validators and model assignments.
- Add latitude/longitude to the update validator.
- Add coordinate range validation on both endpoints.
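The range check can be sketched in plain TypeScript. This is not the VineJS schema itself; the validateMarkerUpdate helper and the MarkerUpdate shape are hypothetical, and the bounds are assumed to be the standard geographic ones (latitude -90 to 90, longitude -180 to 180), which the commit does not spell out:

```typescript
// Hypothetical payload shape for illustration; field names follow the commit text.
interface MarkerUpdate {
  latitude?: number
  longitude?: number
  notes?: string
  marker_type?: string
}

function inRange(value: number, min: number, max: number): boolean {
  return Number.isFinite(value) && value >= min && value <= max
}

// Returns a list of validation errors; an empty list means the payload is acceptable.
function validateMarkerUpdate(input: MarkerUpdate): string[] {
  const errors: string[] = []
  if (input.latitude !== undefined && !inRange(input.latitude, -90, 90)) {
    errors.push('latitude must be between -90 and 90')
  }
  if (input.longitude !== undefined && !inRange(input.longitude, -180, 180)) {
    errors.push('longitude must be between -180 and 180')
  }
  return errors
}
```

Because every field is optional on update, a PATCH that only changes notes passes without touching the coordinates.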
Closes #768
Some Ollama installs ship nomic-embed-text:v1.5 with the embedding
model's default num_ctx=2048, which the RAG chunker (sized for ~1500
tokens of estimated content with ratio=2 chars/token) can exceed on
dense PDFs. The result is `400 the input length exceeds the context
length` from /api/embed, which then hits the OpenAI-compatible
fallback (which also errors), and surfaces as a BadRequestError.
Pass options.num_ctx=8192 (nomic-embed-text v1.5's RoPE-extrapolated
max) and truncate=true (silent truncation safety net) on every
embed call so we don't depend on the local modelfile defaults.
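As an illustration, the request body for Ollama's /api/embed endpoint might be built like this. The buildEmbedRequest helper is a hypothetical name; the model, input, truncate, and options fields are part of Ollama's documented embed request:

```typescript
interface EmbedRequest {
  model: string
  input: string | string[]
  truncate: boolean
  options: { num_ctx: number }
}

// Value taken from the commit text: the context length passed for nomic-embed-text v1.5.
const EMBED_NUM_CTX = 8192

function buildEmbedRequest(model: string, input: string | string[]): EmbedRequest {
  return {
    model,
    input,
    // Safety net: truncate over-long input instead of failing with a 400.
    truncate: true,
    // Set explicitly so the call does not depend on the local modelfile default.
    options: { num_ctx: EMBED_NUM_CTX },
  }
}
```

The resulting object would be sent as the JSON body of a POST to /api/embed on the configured Ollama host.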
Reported on #756 by @NC4WD; same root cause as #369 and #670 which
were closed without an actual fix.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(stream): skip compression for Server-Sent Events
The global compression middleware (added in v1.31.0-rc.2) buffers
response writes to determine encoding, which collapses per-token
streaming into a single block delivered after generation completes.
This broke the AI chat streaming UX from v1.31.0-rc.2 onward — text
no longer appears progressively as the model generates it, only at
the end.
Adds a filter to compression() that returns false when the response
Content-Type is text/event-stream. Other responses still go through
the default compression filter (compressible types are still
compressed; e.g. text/html via Brotli).
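A rough sketch of such a filter, written against a minimal header-reading interface rather than the real request/response types. The sseAwareFilter name and the HeaderReader interface are hypothetical; in the actual middleware the fallback would be the compression library's exported default, compression.filter:

```typescript
// Minimal stand-in for the parts of the response object the filter needs.
interface HeaderReader {
  getHeader(name: string): number | string | string[] | undefined
}

type CompressionFilter = (req: unknown, res: HeaderReader) => boolean

function isEventStream(res: HeaderReader): boolean {
  const contentType = res.getHeader('Content-Type')
  // Match on the media type only; a charset parameter may follow it.
  return typeof contentType === 'string' && contentType.toLowerCase().startsWith('text/event-stream')
}

function sseAwareFilter(req: unknown, res: HeaderReader, defaultFilter: CompressionFilter): boolean {
  // Never compress SSE: buffering would collapse per-token streaming into one block.
  if (isEventStream(res)) return false
  // Everything else keeps the default compressibility decision.
  return defaultFilter(req, res)
}
```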
Reproduced on NOMAD3 v1.31.1: before fix, all SSE chunks for a 1B
model arrive within 10ms of each other after the model finishes.
After fix, tokens arrive at ~150ms intervals as they're generated
on a 12B model, with no Content-Encoding header on the SSE response.
Verified on the same host that /home still returns
Content-Encoding: br for HTML responses.
Closes #781. Reported and bisected by @toasterking
(works in v1.31.0-rc.1, broken from v1.31.0-rc.2 onward).
* fix(stream): use any for filter params to match existing as-any pattern
The compression library types its filter as (req: Request, res: Response)
expecting Express types, but AdonisJS passes raw IncomingMessage/ServerResponse
which is why the surrounding middleware uses `as any` casts at the call site.
The IncomingMessage/ServerResponse types I added are runtime-correct but
fail tsc against the library's declared types.
Drop the typed import in favor of `any` parameters, which matches how the
existing `compress(request.request as any, response.response as any, ...)`
call resolves the same mismatch.
Closes #685
Content Manager now surfaces the on-disk size of each ZIM file alongside
title/summary, and lets users sort the list by Size or Title. Defaults to
Size descending so the largest files are visible first.
- ZimService.list() now stats each file and returns size_bytes
- Content Manager table adds a formatted Size column (via formatBytes)
- Sortable headers for Title and Size with asc/desc toggle
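The sort behaviour could look roughly like the following. The sortZims helper and the ZimFile shape are hypothetical simplifications; only size_bytes and the Size-descending default come from the description above:

```typescript
interface ZimFile {
  title: string
  size_bytes: number
}

type SortKey = 'size' | 'title'
type SortDir = 'asc' | 'desc'

// Defaults mirror the described behaviour: Size descending, largest files first.
function sortZims(files: ZimFile[], key: SortKey = 'size', dir: SortDir = 'desc'): ZimFile[] {
  const sign = dir === 'asc' ? 1 : -1
  // Copy before sorting so the caller's array is left untouched.
  return [...files].sort((a, b) => {
    const cmp = key === 'size' ? a.size_bytes - b.size_bytes : a.title.localeCompare(b.title)
    return sign * cmp
  })
}
```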
Three bugs in the RAG embedding pipeline, diagnosed and patched by @sbruschke
against v1.31.0 with working before/after chunk counts. All three are
root-cause contributors to #388.
1. scanAndSyncStorage queued every file under /storage/zim/ for embedding,
including Kiwix's generated kiwix-library.xml. EmbedFileJob rejected it
with "Unsupported file type" and the default 30-attempt retry policy
kept it looping on every sync, flooding nomad_admin logs. Now gated on
determineFileType(filePath) !== 'unknown'.
2. hasMoreBatches compared zimChunks.length (section-level chunk count
under the 'structured' strategy) against ZIM_BATCH_SIZE (an article
limit). Because articles emit multiple sections, the two are never
equal for real archives and processing silently stopped after the
first 50 articles. Now gated on articlesInBatch >= ZIM_BATCH_SIZE.
3. extractStructuredContent walked only direct children of <body>, so any
ZIM that wraps content in a container div (Devdocs, Wikipedia,
FreeCodeCamp, React docs, etc.) produced zero sections and silently
embedded zero chunks while reporting success. Now walks the full DOM
via $('body').find('h2, h3, h4, p, ul, ol, dl, table'), with a
whole-body text fallback when the selector walk yields nothing.
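Bug 2 is easiest to see side by side. In this sketch the batch size of 50 is inferred from "the first 50 articles", and the equality test in the old version is an assumption about how the original comparison was written:

```typescript
// Article limit per batch; 50 is inferred from the commit text.
const ZIM_BATCH_SIZE = 50

// Old behaviour (assumed form): compares a section-level chunk count with an article limit.
function hasMoreBatchesOld(zimChunksLength: number): boolean {
  return zimChunksLength === ZIM_BATCH_SIZE
}

// Fixed behaviour: compares like with like, articles against the article limit.
function hasMoreBatches(articlesInBatch: number): boolean {
  return articlesInBatch >= ZIM_BATCH_SIZE
}
```

A full batch of 50 articles that yields, say, 137 section chunks returns false under the old check and true under the new one, so processing continues past the first batch.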
Before/after chunk counts confirmed by @sbruschke on v1.31.0:
devdocs_en_git 0 -> 916
devdocs_en_react 0 -> 481
devdocs_en_node 0 -> 423
libretexts_en_eng 1 -> 35 (climbing)
Wikipedia resumed progressing normally through its 6M articles.
Closes #718
Closes #719
Closes #720
Closes #388
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When many ZIMs are already installed locally, a single Kiwix catalog page
(12 items) could return 12 already-installed items, which zim_service
would fully filter out client-side. The endpoint returned items: [] with
has_more: true, and the frontend's infinite-scroll guard
(flatData.length > 0) blocked fetchNextPage — leaving the user with
"No records found" despite plenty of uninstalled ZIMs available.
Backend now accumulates across up to 5 Kiwix fetches (60 items each)
until it has enough post-filter results to return, dedupes by entry id,
advances currentStart by actual entries returned (not requested), and
returns a next_start cursor. The frontend consumes that cursor instead
of computing Kiwix offsets locally, and the flatData.length > 0 guard is
removed so the existing on-mount effect drives bounded auto-fetch when
a short page lands.
The pre-existing has_more off-by-one (compared totalResults against the
input start rather than the post-fetch position) is fixed implicitly.
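The accumulation loop might be sketched as follows. The fetchPage callback stands in for the Kiwix catalog request and is synchronous here only to keep the example short; the type names and the listUninstalled helper are hypothetical, while the limits (5 fetches of 60 items) come from the description above:

```typescript
interface CatalogEntry { id: string }
interface CatalogPage { entries: CatalogEntry[]; totalResults: number }
interface ListResult { items: CatalogEntry[]; has_more: boolean; next_start: number }

const MAX_FETCHES = 5
const FETCH_SIZE = 60

function listUninstalled(
  start: number,
  wanted: number,
  installed: Set<string>,
  fetchPage: (start: number, count: number) => CatalogPage
): ListResult {
  const items: CatalogEntry[] = []
  const seen = new Set<string>()
  let currentStart = start
  let totalResults = Number.POSITIVE_INFINITY
  let exhausted = false

  for (let i = 0; i < MAX_FETCHES && items.length < wanted && !exhausted; i++) {
    const page = fetchPage(currentStart, FETCH_SIZE)
    totalResults = page.totalResults
    // Advance by what was actually returned, not by what was requested.
    currentStart += page.entries.length
    for (const entry of page.entries) {
      if (installed.has(entry.id) || seen.has(entry.id)) continue
      seen.add(entry.id) // dedupe by entry id
      items.push(entry)
    }
    exhausted = page.entries.length === 0 || currentStart >= totalResults
  }

  // has_more is derived from the post-fetch position, not from the input start.
  return { items, has_more: !exhausted, next_start: currentStart }
}
```

This sketch returns every post-filter item it fetched, which can exceed the requested count; that keeps next_start consistent with the items already handed to the frontend.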
Diagnosis credit: @johno10661.
Closes #731
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When users set a remote Ollama URL via AI Settings, the local nomad_ollama
container continued running and competed with the remote host for port 11434
and GPU access. Now configureRemote stops the local container on set and
restores it on clear (if still present). Container and its models volume are
preserved so the local install can be re-enabled later.
Closes #662
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Qdrant's upstream default enables anonymous telemetry to telemetry.qdrant.io,
which doesn't match NOMAD's offline-first "zero telemetry" posture. Adding
QDRANT__TELEMETRY_DISABLED=true to the container environment turns it off for
fresh installs and reinstalls.
Existing installs keep their current telemetry-enabled container until the
Qdrant service is force-reinstalled via the Knowledge Base panel or the next
container recreation — Docker bakes Env into containers at create time, so
env changes require a new container.
Closes #742
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Introduces a dedicated page listing third-party ZIM content packs built
by the community. Launches with the two current add-ons (jrsphoto field
manuals, kennethbrewer W3Schools) and explains how to install a ZIM pack
and where to submit a new one for inclusion.
- New doc at admin/docs/community-add-ons.md
- Wired into DocsService DOC_ORDER (slot 4) and TITLE_OVERRIDES so the
hyphen in "Add-Ons" is preserved in the sidebar
- README gets a link under Community & Resources
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a cancel button to in-progress Ollama model downloads and unifies
the Active Model Downloads card layout with the Active Downloads card
used for ZIMs, maps, and pmtiles (byte counts, progress bar, live speed,
status indicator).
Closes #676.
Downloads are now written to `filepath + '.tmp'` and atomically renamed
to the final path only on successful completion. Kiwix globs for `*.zim`
and ZimService filters `.endsWith('.zim')`, so `.tmp` files are invisible
to both during download. The same staging applies to `.pmtiles` map files.
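A minimal sketch of the staging pattern using Node's synchronous fs calls. The real download path presumably streams to disk; writeStaged and visibleZims are hypothetical helper names, and the temp directory is only for demonstration:

```typescript
import { mkdtempSync, readdirSync, renameSync, writeFileSync } from 'node:fs'
import { tmpdir } from 'node:os'
import { join } from 'node:path'

// Write to a sibling '.tmp' path, then rename into place only once the write succeeded.
// The rename stays on one filesystem, so the final file appears in a single step.
function writeStaged(finalPath: string, data: string | Uint8Array): void {
  const tmpPath = finalPath + '.tmp'
  writeFileSync(tmpPath, data)
  renameSync(tmpPath, finalPath)
}

// Mirrors the '.endsWith(".zim")' filter described above.
function visibleZims(dir: string): string[] {
  return readdirSync(dir).filter((name) => name.endsWith('.zim'))
}

// Demonstration in a throwaway directory.
const demoDir = mkdtempSync(join(tmpdir(), 'zim-staging-'))
writeFileSync(join(demoDir, 'partial.zim.tmp'), 'half a download')
const duringDownload = visibleZims(demoDir) // the in-progress file is filtered out
writeStaged(join(demoDir, 'done.zim'), 'complete file')
const afterDownload = visibleZims(demoDir) // only the completed file is listed
```

If the write fails partway, the final path is never created, so consumers never see a truncated .zim or .pmtiles file.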
Ref #372
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>