fix(System): correct AMD VRAM in Graphics card + harden log probe

Two related fixes to make the System Information page reliably show real
GPU info instead of misleading lspci BAR0 readings or N/A.

1. Generalize bogus-VRAM detection to AMD.

Same root cause as #835 (NVIDIA showing 32 MB), this time for AMD:
systeminformation parses the first PCI memory Region in the lspci output
(BAR0, typically 1-16 MiB on Navi cards) as `vram`. On NOMAD8
(Threadripper 3960X + Radeon RX 6800), the System Information page showed
"1 MB" instead of "16 GB". PR #850 fixed this for NVIDIA by clearing the
bogus value and re-running the Ollama log probe, but the check was
vendor-gated to NVIDIA only.

`isBogusNvidiaVram` becomes `isBogusDgpuVram`, with an `isDiscreteGpuVendor`
helper matching /nvidia|advanced micro devices|amd|ati/i. The 256-MiB
threshold is unchanged: no real discrete GPU has less than that, while
Intel iGPUs (which legitimately report small shared-memory VRAM via lspci)
are left untouched. The probe gate condition is renamed to match.
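
A minimal, standalone sketch of the vendor-gated threshold check described
above, for readers who want to try it outside the service. The `Controller`
interface, the `clearBogusVram` helper and the sample vendor strings are
hypothetical and are not part of this commit. One deliberate deviation is
labeled in the code: the sketch puts word boundaries around the short
`amd` and `ati` tokens, because an unanchored `ati` also matches inside a
longer word such as "Corporation".

```typescript
// Hypothetical shape standing in for a systeminformation graphics controller.
interface Controller {
  vendor?: string
  model?: string
  vram?: number | null // MiB, as derived from lspci output
}

const DGPU_BOGUS_VRAM_THRESHOLD_MIB = 256

// Assumption (differs from the commit's regex): word boundaries around the
// short tokens, so "ati" does not match inside a word such as "Corporation".
const isDiscreteGpuVendor = (vendor: string): boolean =>
  /nvidia|advanced micro devices|\bamd\b|\bati\b/i.test(vendor)

const isBogusDgpuVram = (c: Controller): boolean =>
  isDiscreteGpuVendor(c.vendor || '') &&
  typeof c.vram === 'number' &&
  c.vram < DGPU_BOGUS_VRAM_THRESHOLD_MIB

// Returns copies of the controllers with implausible dGPU readings set to null.
function clearBogusVram(controllers: Controller[]): Controller[] {
  return controllers.map((c) => (isBogusDgpuVram(c) ? { ...c, vram: null } : c))
}

// Illustrative inputs only; the vendor strings are examples, not captured data.
const sample: Controller[] = [
  { vendor: 'Advanced Micro Devices, Inc. [AMD/ATI]', model: 'Navi 21', vram: 1 },
  { vendor: 'NVIDIA Corporation', model: 'example NVIDIA card', vram: 32 },
  { vendor: 'Intel Corporation', model: 'example iGPU', vram: 64 },
]
const cleaned = clearBogusVram(sample)
```

In this sketch the AMD and NVIDIA entries end up with `vram: null`, while
the Intel entry keeps its small value.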

2. Read Ollama logs from the startup window, not tail:N.

`getOllamaInferenceComputeFromLogs()` was reading the last 500 log lines
and grepping for the "inference compute" line. That line is written once
during Ollama's GPU discovery phase within seconds of startup. Under
active embedding workloads we measured >1000 log lines/min, which pushes
the line past any reasonable tail within minutes — at which point the
probe returns null and the UI flips to "GPU Not Accessible" even though
Ollama is happily using the GPU (size_vram > 0 in /api/ps).
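
To make the ageing-out effect concrete, here is a small simulation. It is
only a sketch: the log line text, `buildLog`, `findDiscovery` and
`tailCoverageSec` are hypothetical, and real Ollama output will differ in
detail. The line count of 6,345 is taken from the validation run below.

```typescript
// Hypothetical discovery line; only the "inference compute" marker matters here.
const DISCOVERY_LINE = 'msg="inference compute" library=rocm'

// Build a synthetic log: the discovery line near the top, then workload noise.
function buildLog(noiseLines: number): string[] {
  const lines = ['starting', DISCOVERY_LINE]
  for (let i = 0; i < noiseLines; i++) lines.push(`embedding request ${i}`)
  return lines
}

// Return the first line containing the marker, or null if none is present.
const findDiscovery = (lines: string[]): string | null =>
  lines.find((l) => l.includes('inference compute')) ?? null

// Last n lines, mimicking a tail:N read.
const tail = (lines: string[], n: number): string[] => lines.slice(-n)

const log = buildLog(6345)
const fromTail = findDiscovery(tail(log, 500)) // discovery line has aged out
const fromStart = findDiscovery(log.slice(0, 500)) // stand-in for a startup-window read

// Seconds of history that a tail of `tailLines` covers at a given log rate.
const tailCoverageSec = (tailLines: number, linesPerMin: number): number =>
  (tailLines / linesPerMin) * 60
```

At 1000 lines/min, `tailCoverageSec(500, 1000)` works out to 30 seconds of
history, so a `tail: 500` read loses the discovery line almost immediately
once the workload starts.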

Switch from `tail: 500` to `since: containerStartedAt, until:
containerStartedAt + 300s`. The 5-minute window is bounded regardless of
container uptime and always captures Ollama's GPU discovery output. The
inference-compute line is emitted in the first few seconds of startup, so
5 min is generous headroom.
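
The window arithmetic can be sketched as a small pure helper. This is an
illustration under assumptions, not the code in this commit: the
`startupLogWindow` name, the `LogWindow` type and the null guard for an
unusable timestamp are all hypothetical additions.

```typescript
const STARTUP_WINDOW_SEC = 300 // 5 minutes, matching the commit

interface LogWindow {
  since: number // Unix seconds: container start
  until: number // Unix seconds: end of the startup window
}

// Hypothetical helper: converts a container's StartedAt timestamp string into
// a since/until pair in Unix seconds. Returns null when the timestamp cannot
// be parsed or is not after the Unix epoch (for example a zero-value time).
function startupLogWindow(
  startedAt: string,
  windowSec: number = STARTUP_WINDOW_SEC
): LogWindow | null {
  const startedAtMs = new Date(startedAt).getTime()
  if (!(startedAtMs > 0)) return null // also rejects NaN
  const since = Math.floor(startedAtMs / 1000)
  return { since, until: since + windowSec }
}

// Example timestamp chosen for illustration only.
const win = startupLogWindow('2026-05-13T19:00:00.000Z')
```

A caller would pass `win.since` and `win.until` to the log read in place of
`tail`, and could skip the probe when the helper returns null.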

Validated on NOMAD8 (RX 6800, container uptime ~10 min with sustained
ingestion that generated 6,345 log lines):

Before:
  controllers[0]: { model: "Navi 21 ...", vram: 1 }

After (bogus AMD VRAM cleared, log probe stale due to tail:500 churn):
  controllers[0]: { model: "Navi 21 ...", vram: null }
  gpuHealth: { status: "passthrough_failed" }
  -> UI shows "N/A" and the banner from PR #208

After (bogus cleared + log probe reads startup window):
  controllers[0]: { model: "AMD Radeon RX 6800", vram: 16384 }
  gpuHealth: { status: "ok", hasRocmRuntime: true, ollamaGpuAccessible: true }
  -> UI shows "16 GB", no banner

Both paths behave correctly: the NVIDIA path is unchanged (same code, just
renamed identifiers), and the AMD path now triggers the probe, which
reliably finds the GPU info regardless of container age.
Chris Sherwood 2026-05-13 12:23:10 -07:00 committed by Jake Turner
parent 501860a23b
commit d2f2172b3c

@@ -95,10 +95,23 @@ export class SystemService {
   if (!ollamaContainer) return null
   const container = this.dockerService.docker.getContainer(ollamaContainer.Id)
+  // Read logs only from the first 5 minutes after container start. The
+  // "inference compute" line is written once during Ollama's GPU discovery
+  // phase, within seconds of startup. Using tail:N here is fragile: under
+  // active embedding workloads we've seen >1000 lines/min, which pushes the
+  // line past any reasonable tail in minutes. Pinning to the startup window
+  // is bounded (~5 min of logs regardless of container uptime) and never
+  // ages out.
+  const inspect = await container.inspect()
+  const startedAtMs = new Date(inspect.State.StartedAt).getTime()
+  const startedAtSec = Math.floor(startedAtMs / 1000)
+  const startupWindowSec = startedAtSec + 300 // 5-minute window
   const buf = (await container.logs({
     stdout: true,
     stderr: true,
-    tail: 500,
+    since: startedAtSec,
+    until: startupWindowSec,
     follow: false,
   })) as unknown as Buffer
   const logs = buf.toString('utf8')
@@ -400,36 +413,40 @@ export class SystemService {
   }
   // si.graphics() in the admin container uses lspci (pciutils ships in
-  // the image for AMD detection). lspci has no real VRAM info for NVIDIA
-  // cards, so systeminformation parses the first PCI memory Region (BAR0,
-  // 16-32 MiB on most NVIDIA cards) as `vram`. nvidia-smi enrichment also
-  // can't run since the binary isn't in the admin image. No real dGPU
-  // has under 256 MiB, so any NVIDIA controller below that needs the
-  // probes below to give us real data.
-  const NVIDIA_BOGUS_VRAM_THRESHOLD_MIB = 256
-  const isBogusNvidiaVram = (c: { vendor?: string; vram?: number | null }) =>
-    /nvidia/i.test(c.vendor || '') &&
+  // the image for AMD detection). lspci has no real VRAM info for
+  // discrete GPUs, so systeminformation parses the first PCI memory
+  // Region (BAR0, typically 1-32 MiB) as `vram`. nvidia-smi / ROCm
+  // tooling enrichment also can't run since neither is in the admin
+  // image. No real dGPU has under 256 MiB, so any discrete-GPU controller
+  // below that threshold needs the probes below to give us real data.
+  // Applies to both NVIDIA and AMD; Intel iGPUs are exempt because their
+  // shared-system-memory VRAM reading via lspci can legitimately be small.
+  const DGPU_BOGUS_VRAM_THRESHOLD_MIB = 256
+  const isDiscreteGpuVendor = (vendor: string) =>
+    /nvidia|advanced micro devices|amd|ati/i.test(vendor)
+  const isBogusDgpuVram = (c: { vendor?: string; vram?: number | null }) =>
+    isDiscreteGpuVendor(c.vendor || '') &&
     typeof c.vram === 'number' &&
-    c.vram < NVIDIA_BOGUS_VRAM_THRESHOLD_MIB
+    c.vram < DGPU_BOGUS_VRAM_THRESHOLD_MIB
   // Clear the bogus value up front. If a probe replaces the entry below
   // we get the real VRAM; if no probe succeeds (Ollama not installed,
   // passthrough_failed) the UI falls back to "N/A" instead of showing
-  // "32 MB". The lspci model/vendor strings stay since they're still
-  // useful for identifying the card.
-  const hasLspciBogusNvidiaVram = (graphics.controllers || []).some(isBogusNvidiaVram)
-  if (hasLspciBogusNvidiaVram) {
+  // "1 MB" / "32 MB". The lspci model/vendor strings stay since they're
+  // still useful for identifying the card.
+  const hasLspciBogusDgpuVram = (graphics.controllers || []).some(isBogusDgpuVram)
+  if (hasLspciBogusDgpuVram) {
     for (const c of graphics.controllers) {
-      if (isBogusNvidiaVram(c)) c.vram = null
+      if (isBogusDgpuVram(c)) c.vram = null
     }
   }
   // Run the probes when controllers are empty (common inside Docker) or
-  // when lspci gave us bogus NVIDIA BAR0 values that need replacing.
+  // when lspci gave us bogus discrete-GPU BAR0 values that need replacing.
   if (
     !graphics.controllers ||
     graphics.controllers.length === 0 ||
-    hasLspciBogusNvidiaVram
+    hasLspciBogusDgpuVram
   ) {
     const runtimes = dockerInfo.Runtimes || {}
     gpuHealth.hasNvidiaRuntime = 'nvidia' in runtimes