project-nomad/admin
Chris Sherwood d2f2172b3c fix(System): correct AMD VRAM in Graphics card + harden log probe
Two related fixes to make the System Information page reliably show real
GPU info instead of misleading lspci BAR0 readings or N/A.

1. Generalize bogus-VRAM detection to AMD.

Same root cause as #835 (NVIDIA showing 32 MB), this time for AMD: lspci
parses the first PCI memory Region (BAR0, typically 1-16 MiB on Navi
cards) as `vram`. On NOMAD8 (Threadripper 3960X + Radeon RX 6800), the
System Information page showed "1 MB" instead of "16 GB". PR #850 fixed
this for NVIDIA by clearing the bogus value and re-running the Ollama
log probe; the check was vendor-gated to NVIDIA only.

`isBogusNvidiaVram` becomes `isBogusDgpuVram` with an `isDiscreteGpuVendor`
helper matching /nvidia|advanced micro devices|amd|ati/i. Same 256-MiB
threshold — no real discrete GPU has less than that, while Intel iGPUs
(which legitimately report small shared-memory VRAM via lspci) are left
untouched. The probe gate condition is similarly renamed.
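
A minimal sketch of the check described above, for illustration only. The
`GpuController` shape, the constant names, and the function signatures are
assumptions; only the two identifiers, the regex, and the 256-MiB threshold
come from this commit message.

```typescript
// Hypothetical controller shape; the real type in the codebase may differ.
interface GpuController {
  vendor: string
  model: string
  vram: number | null // MiB, as derived from lspci output
}

// Regex as quoted in the commit message.
// Caveat: the unanchored `ati` alternative also matches inside longer words
// (for example "Corporation"), so real vendor strings are worth checking.
const DISCRETE_VENDOR_RE = /nvidia|advanced micro devices|amd|ati/i

// Assumed constant name; the value is the threshold stated above.
const MIN_PLAUSIBLE_DGPU_VRAM_MIB = 256

function isDiscreteGpuVendor(vendor: string): boolean {
  return DISCRETE_VENDOR_RE.test(vendor)
}

// True when a discrete-GPU vendor reports a VRAM value too small to be real,
// which indicates the BAR0 size was picked up instead of the actual VRAM.
function isBogusDgpuVram(controller: GpuController): boolean {
  return (
    controller.vram !== null &&
    isDiscreteGpuVendor(controller.vendor) &&
    controller.vram < MIN_PLAUSIBLE_DGPU_VRAM_MIB
  )
}
```

With this shape, a caller would set `vram` to null when the check returns
true and then fall through to the Ollama log probe.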

2. Read Ollama logs from the startup window, not tail:N.

`getOllamaInferenceComputeFromLogs()` was reading the last 500 log lines
and grepping for the "inference compute" line. That line is written once
during Ollama's GPU discovery phase within seconds of startup. Under
active embedding workloads we measured >1000 log lines/min, which pushes
the line past any reasonable tail within minutes — at which point the
probe returns null and the UI flips to "GPU Not Accessible" even though
Ollama is happily using the GPU (size_vram > 0 in /api/ps).

Switch from `tail: 500` to `since: containerStartedAt, until:
containerStartedAt + 300s`. The 5-minute window is bounded regardless of
container uptime and always captures Ollama's GPU discovery output. The
inference-compute line is emitted in the first few seconds of startup, so
5 min is generous headroom.
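
A rough sketch of the window computation, assuming the container start time
arrives as an ISO-8601 string and the log API takes UNIX-second bounds. The
helper names `startupLogWindow` and `findInferenceComputeLine` are
hypothetical, and the log fetch itself is omitted.

```typescript
// Window length from the commit message: 300 s after container start.
const STARTUP_WINDOW_SECONDS = 300

interface LogWindow {
  since: number // UNIX seconds, inclusive lower bound
  until: number // UNIX seconds, upper bound
}

// Derive a fixed window anchored at container start, so the amount of log
// output read does not grow with container uptime or log volume.
function startupLogWindow(
  containerStartedAtIso: string,
  windowSeconds: number = STARTUP_WINDOW_SECONDS
): LogWindow | null {
  const startedMs = Date.parse(containerStartedAtIso)
  if (Number.isNaN(startedMs)) return null // unparseable start time
  const since = Math.floor(startedMs / 1000)
  return { since, until: since + windowSeconds }
}

// Return the first line mentioning "inference compute", or null if absent.
function findInferenceComputeLine(logText: string): string | null {
  for (const line of logText.split('\n')) {
    if (line.includes('inference compute')) return line
  }
  return null
}
```

The resulting `since` and `until` values would replace the `tail: 500`
option in the log request. Returning null on an unparseable start time lets
the caller decide how to fall back.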

Validated on NOMAD8 (RX 6800, container uptime ~10 min with sustained
ingestion that generated 6,345 log lines):

Before:
  controllers[0]: { model: "Navi 21 ...", vram: 1 }

After (bogus AMD VRAM cleared, log probe stale due to tail:500 churn):
  controllers[0]: { model: "Navi 21 ...", vram: null }
  gpuHealth: { status: "passthrough_failed" }
  -> UI shows "N/A" and the banner from PR #208

After (bogus cleared + log probe reads startup window):
  controllers[0]: { model: "AMD Radeon RX 6800", vram: 16384 }
  gpuHealth: { status: "ok", hasRocmRuntime: true, ollamaGpuAccessible: true }
  -> UI shows "16 GB", no banner

Both branches of the fix behave correctly: the NVIDIA path is unchanged
(same code, just renamed identifiers), and the AMD path now triggers the
probe, which reliably finds the GPU info regardless of container age.
2026-05-20 10:16:00 -07:00
app fix(System): correct AMD VRAM in Graphics card + harden log probe 2026-05-20 10:16:00 -07:00
bin feat: curated content system overhaul 2026-02-11 15:44:46 -08:00
commands feat(Maps): regional map downloads via go-pmtiles extract (#780) 2026-05-20 10:16:00 -07:00
config fix: cache docker list requests, aiAssistantName fetching, and ensure inertia used properly 2026-04-03 14:26:50 -07:00
constants feat(Maps): regional map downloads via go-pmtiles extract (#780) 2026-05-20 10:16:00 -07:00
database feat(Content): custom ZIM library sources with pre-seeded mirrors (#593) 2026-05-20 10:16:00 -07:00
docs docs: update release notes 2026-05-20 10:16:00 -07:00
inertia fix(Maps): render notes in marker popup when populated 2026-05-20 10:16:00 -07:00
providers feat(GPU): auto-remediate nomad_ollama passthrough loss on admin boot (#755) 2026-05-20 10:16:00 -07:00
public feat: switch all PNG images to WEBP (#575) 2026-04-03 14:26:50 -07:00
resources feat(Maps): regional map downloads via go-pmtiles extract (#780) 2026-05-20 10:16:00 -07:00
start feat(Content): custom ZIM library sources with pre-seeded mirrors (#593) 2026-05-20 10:16:00 -07:00
tests fix(UI): improve global map banner display logic (#702) 2026-05-20 10:16:00 -07:00
types feat(GPU): auto-remediate nomad_ollama passthrough loss on admin boot (#755) 2026-05-20 10:16:00 -07:00
util feat: display model download progress 2026-02-06 16:22:23 -08:00
views feat: initial commit 2025-06-29 15:51:08 -07:00
.editorconfig feat: initial commit 2025-06-29 15:51:08 -07:00
.env.example feat: Add Windows Docker Desktop support for local development 2026-01-19 10:29:24 -08:00
ace.js feat: initial commit 2025-06-29 15:51:08 -07:00
adonisrc.ts feat(GPU): auto-remediate nomad_ollama passthrough loss on admin boot (#755) 2026-05-20 10:16:00 -07:00
eslint.config.js feat: openwebui+ollama and zim management 2025-07-09 09:08:21 -07:00
package-lock.json build(deps): bump picomatch in /admin 2026-05-20 10:16:00 -07:00
package.json chore(deps): pin all deps to exact versions 2026-05-20 10:16:00 -07:00
tailwind.config.ts feat: initial commit 2025-06-29 15:51:08 -07:00
tsconfig.json feat: initial commit 2025-06-29 15:51:08 -07:00
vite.config.ts fix(Maps): ensure proper parsing of hostnames (#640) 2026-04-03 14:26:50 -07:00