mirror of https://github.com/Crosstalk-Solutions/project-nomad.git
synced 2026-06-03 18:16:49 +02:00
fix(System): correct AMD VRAM in Graphics card + harden log probe
Two related fixes to make the System Information page reliably show real GPU info instead of misleading lspci BAR0 readings or N/A.

1. Generalize bogus-VRAM detection to AMD.

Same root cause as #835 (NVIDIA showing 32 MB), this time for AMD: systeminformation takes the first PCI memory region reported by lspci (BAR0, typically 1-16 MiB on Navi cards) as `vram`. On NOMAD8 (Threadripper 3960X + Radeon RX 6800), the System Information page showed "1 MB" instead of "16 GB".

PR #850 fixed this for NVIDIA by clearing the bogus value and re-running the Ollama log probe, but that check was gated to the NVIDIA vendor only. `isBogusNvidiaVram` becomes `isBogusDgpuVram`, backed by an `isDiscreteGpuVendor` helper matching /nvidia|advanced micro devices|amd|ati/i. The threshold stays at 256 MiB: no real discrete GPU has less than that, while Intel iGPUs (which legitimately report small shared-memory VRAM via lspci) are left untouched. The probe gate condition is renamed to match (`hasLspciBogusDgpuVram`).

2. Read Ollama logs from the startup window, not tail:N.

`getOllamaInferenceComputeFromLogs()` was reading the last 500 log lines and grepping for the "inference compute" line. That line is written once, during Ollama's GPU discovery phase, within seconds of startup. Under active embedding workloads we measured >1000 log lines/min, which pushes the line past any reasonable tail within minutes. At that point the probe returns null and the UI flips to "GPU Not Accessible" even though Ollama is happily using the GPU (size_vram > 0 in /api/ps).

Switch from `tail: 500` to `since: containerStartedAt, until: containerStartedAt + 300s`. The 5-minute window is bounded regardless of container uptime and always captures Ollama's GPU discovery output. Because the inference-compute line is emitted in the first few seconds of startup, 5 min leaves generous headroom.
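To put the tail:500 failure in numbers, assume a steady rate r of 1000 lines/min (the lower bound of the rate measured above). A tail of N = 500 lines then covers at most

$$
t_{\text{tail}} = \frac{N_{\text{tail}}}{r} = \frac{500\ \text{lines}}{1000\ \text{lines/min}} = 0.5\ \text{min} = 30\ \text{s}
$$

of log history, so under that load the startup line can drop out of the tail within about 30 seconds of sustained ingestion. The since/until window does not depend on r at all: it is anchored to the container start time.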
Validated on NOMAD8 (RX 6800, container uptime ~10 min with sustained ingestion that generated 6,345 log lines):

Before:
  controllers[0]: { model: "Navi 21 ...", vram: 1 }

After (bogus AMD VRAM cleared, log probe stale due to tail:500 churn):
  controllers[0]: { model: "Navi 21 ...", vram: null }
  gpuHealth: { status: "passthrough_failed" }
  -> UI shows "N/A" and the banner from PR #208

After (bogus cleared + log probe reads startup window):
  controllers[0]: { model: "AMD Radeon RX 6800", vram: 16384 }
  gpuHealth: { status: "ok", hasRocmRuntime: true, ollamaGpuAccessible: true }
  -> UI shows "16 GB", no banner

Both branches of the fix behave correctly: the NVIDIA path is unchanged (same code, just renamed identifiers), and the AMD path now triggers the probe, which reliably finds the GPU info regardless of container age.
parent 501860a23b
commit d2f2172b3c
@@ -95,10 +95,23 @@ export class SystemService {
 if (!ollamaContainer) return null

 const container = this.dockerService.docker.getContainer(ollamaContainer.Id)
+
+// Read logs only from the first 5 minutes after container start. The
+// "inference compute" line is written once during Ollama's GPU discovery
+// phase, within seconds of startup. Using tail:N here is fragile: under
+// active embedding workloads we've seen >1000 lines/min, which pushes the
+// line past any reasonable tail in minutes. Pinning to the startup window
+// is bounded (~5 min of logs regardless of container uptime) and never
+// ages out.
+const inspect = await container.inspect()
+const startedAtMs = new Date(inspect.State.StartedAt).getTime()
+const startedAtSec = Math.floor(startedAtMs / 1000)
+const startupWindowSec = startedAtSec + 300 // 5-minute window
 const buf = (await container.logs({
   stdout: true,
   stderr: true,
-  tail: 500,
+  since: startedAtSec,
+  until: startupWindowSec,
   follow: false,
 })) as unknown as Buffer
 const logs = buf.toString('utf8')
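The window arithmetic and the line search in this hunk can be isolated as pure functions, which makes them easy to test without a Docker daemon. A minimal sketch, assuming `State.StartedAt` is an ISO-8601 timestamp (as `docker inspect` reports it) and that a plain substring match on "inference compute" is enough to locate the line; `startupLogWindow`, `findInferenceComputeLine`, and `LogWindow` are hypothetical names, not functions from this codebase.

```typescript
// Hypothetical helpers mirroring the hunk above: compute the Docker log
// window for the first N seconds after container start, then locate the
// Ollama "inference compute" line in the returned text.
interface LogWindow {
  since: number // Unix seconds: container start, rounded down
  until: number // Unix seconds: end of the startup window
}

// startedAt is the ISO-8601 State.StartedAt string from container.inspect().
function startupLogWindow(startedAt: string, windowSec = 300): LogWindow | null {
  const startedAtMs = new Date(startedAt).getTime()
  if (Number.isNaN(startedAtMs)) return null // unparseable timestamp
  const since = Math.floor(startedAtMs / 1000)
  return { since, until: since + windowSec }
}

// Returns the first log line containing "inference compute", or null.
// A plain substring match is an assumption about Ollama's log format.
function findInferenceComputeLine(logs: string): string | null {
  for (const line of logs.split(/\r?\n/)) {
    if (line.includes('inference compute')) return line
  }
  return null
}
```

The returned `since`/`until` pair maps directly onto the options passed to `container.logs()` in the hunk; the Docker Engine logs endpoint takes both as Unix timestamps, so the window stays the same size no matter how long the container has been running.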
@@ -400,36 +413,40 @@ export class SystemService {
 }

 // si.graphics() in the admin container uses lspci (pciutils ships in
-// the image for AMD detection). lspci has no real VRAM info for NVIDIA
-// cards, so systeminformation parses the first PCI memory Region (BAR0,
-// 16-32 MiB on most NVIDIA cards) as `vram`. nvidia-smi enrichment also
-// can't run since the binary isn't in the admin image. No real dGPU
-// has under 256 MiB, so any NVIDIA controller below that needs the
-// probes below to give us real data.
-const NVIDIA_BOGUS_VRAM_THRESHOLD_MIB = 256
-const isBogusNvidiaVram = (c: { vendor?: string; vram?: number | null }) =>
-  /nvidia/i.test(c.vendor || '') &&
+// the image for AMD detection). lspci has no real VRAM info for
+// discrete GPUs, so systeminformation parses the first PCI memory
+// Region (BAR0, typically 1-32 MiB) as `vram`. nvidia-smi / ROCm
+// tooling enrichment also can't run since neither is in the admin
+// image. No real dGPU has under 256 MiB, so any discrete-GPU controller
+// below that threshold needs the probes below to give us real data.
+// Applies to both NVIDIA and AMD; Intel iGPUs are exempt because their
+// shared-system-memory VRAM reading via lspci can legitimately be small.
+const DGPU_BOGUS_VRAM_THRESHOLD_MIB = 256
+const isDiscreteGpuVendor = (vendor: string) =>
+  /nvidia|advanced micro devices|amd|ati/i.test(vendor)
+const isBogusDgpuVram = (c: { vendor?: string; vram?: number | null }) =>
+  isDiscreteGpuVendor(c.vendor || '') &&
   typeof c.vram === 'number' &&
-  c.vram < NVIDIA_BOGUS_VRAM_THRESHOLD_MIB
+  c.vram < DGPU_BOGUS_VRAM_THRESHOLD_MIB

 // Clear the bogus value up front. If a probe replaces the entry below
 // we get the real VRAM; if no probe succeeds (Ollama not installed,
 // passthrough_failed) the UI falls back to "N/A" instead of showing
-// "32 MB". The lspci model/vendor strings stay since they're still
-// useful for identifying the card.
-const hasLspciBogusNvidiaVram = (graphics.controllers || []).some(isBogusNvidiaVram)
-if (hasLspciBogusNvidiaVram) {
+// "1 MB" / "32 MB". The lspci model/vendor strings stay since they're
+// still useful for identifying the card.
+const hasLspciBogusDgpuVram = (graphics.controllers || []).some(isBogusDgpuVram)
+if (hasLspciBogusDgpuVram) {
   for (const c of graphics.controllers) {
-    if (isBogusNvidiaVram(c)) c.vram = null
+    if (isBogusDgpuVram(c)) c.vram = null
   }
 }

 // Run the probes when controllers are empty (common inside Docker) or
-// when lspci gave us bogus NVIDIA BAR0 values that need replacing.
+// when lspci gave us bogus discrete-GPU BAR0 values that need replacing.
 if (
   !graphics.controllers ||
   graphics.controllers.length === 0 ||
-  hasLspciBogusNvidiaVram
+  hasLspciBogusDgpuVram
 ) {
   const runtimes = dockerInfo.Runtimes || {}
   gpuHealth.hasNvidiaRuntime = 'nvidia' in runtimes
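The vendor gate and the clearing loop can be exercised in isolation. The sketch below is a hypothetical standalone restatement of the helpers in this hunk; the `GraphicsController` type and the `clearBogusDgpuVram` function are illustrative, not part of the commit. It differs in one deliberate way: the alternation is wrapped in word boundaries. With the regex as committed, the bare `ati` alternative also matches inside "Corporation", so a vendor string of "Intel Corporation" (as lspci commonly reports for Intel iGPUs) passes the vendor check, and an iGPU reporting under 256 MiB would have its VRAM cleared.

```typescript
// Hypothetical standalone version of the discrete-GPU VRAM check.
// Word boundaries (\b) keep "ati" from matching inside words such as
// "Corporation", so "Intel Corporation" is not treated as a dGPU vendor.
const DGPU_BOGUS_VRAM_THRESHOLD_MIB = 256

interface GraphicsController {
  vendor?: string
  model?: string
  vram?: number | null // MiB, as reported by systeminformation
}

const isDiscreteGpuVendor = (vendor: string): boolean =>
  /\b(?:nvidia|advanced micro devices|amd|ati)\b/i.test(vendor)

// True when a discrete-GPU controller reports a BAR0-sized "VRAM" value.
const isBogusDgpuVram = (c: GraphicsController): boolean =>
  isDiscreteGpuVendor(c.vendor ?? '') &&
  typeof c.vram === 'number' &&
  c.vram < DGPU_BOGUS_VRAM_THRESHOLD_MIB

// Nulls out bogus VRAM values in place and reports whether any were found,
// so the caller knows the Ollama probes are needed to fill in real data.
function clearBogusDgpuVram(controllers: GraphicsController[]): boolean {
  let found = false
  for (const c of controllers) {
    if (isBogusDgpuVram(c)) {
      c.vram = null
      found = true
    }
  }
  return found
}
```

Folding the `some()` check and the clearing loop into one pass is only a convenience here; the committed code keeps them separate so the boolean can also gate the probe condition.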