mirror of
https://github.com/Crosstalk-Solutions/project-nomad.git
synced 2026-05-24 05:15:05 +02:00
Two related fixes to make the System Information page reliably show real GPU info instead of misleading lspci BAR0 readings or N/A.

1. Generalize bogus-VRAM detection to AMD.

Same root cause as #835 (NVIDIA showing 32 MB), this time for AMD: lspci parses the first PCI memory Region (BAR0, typically 1-16 MiB on Navi cards) as `vram`. On NOMAD8 (Threadripper 3960X + Radeon RX 6800), the System Information page showed "1 MB" instead of "16 GB". PR #850 fixed this for NVIDIA by clearing the bogus value and re-running the Ollama log probe, but the check was vendor-gated to NVIDIA only.

`isBogusNvidiaVram` becomes `isBogusDgpuVram`, with an `isDiscreteGpuVendor` helper matching /nvidia|advanced micro devices|amd|ati/i. The 256-MiB threshold is unchanged: no real discrete GPU has less than that, while Intel iGPUs (which legitimately report small shared-memory VRAM via lspci) are left untouched. The probe gate condition is renamed to match.

2. Read Ollama logs from the startup window, not tail:N.

`getOllamaInferenceComputeFromLogs()` was reading the last 500 log lines and grepping for the "inference compute" line. That line is written once, during Ollama's GPU discovery phase, within seconds of startup. Under active embedding workloads we measured >1000 log lines/min, which pushes the line past any reasonable tail within minutes, at which point the probe returns null and the UI flips to "GPU Not Accessible" even though Ollama is happily using the GPU (size_vram > 0 in /api/ps).

Switch from `tail: 500` to `since: containerStartedAt, until: containerStartedAt + 300s`. The 5-minute window is bounded regardless of container uptime and always captures Ollama's GPU discovery output. Because the inference-compute line is emitted in the first few seconds of startup, 5 min is generous headroom.
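The startup-window change in fix 2 can be sketched roughly as follows. This is an illustration only, not the code from the commit: `startupLogWindow`, `findInferenceComputeLine`, and `STARTUP_WINDOW_SECONDS` are hypothetical names, and the sketch assumes the container start time is available as an ISO 8601 string and that the log API accepts `since`/`until` as Unix timestamps in seconds (as the Docker Engine logs endpoint does).

```typescript
// Hypothetical constant: the 300 s window described in the commit message.
const STARTUP_WINDOW_SECONDS = 300

interface LogWindow {
  since: number // Unix seconds, inclusive start of the window
  until: number // Unix seconds, end of the window
}

// Derive a fixed window anchored at container start, so the result does not
// depend on how long the container has been running or how chatty it is.
function startupLogWindow(containerStartedAt: string): LogWindow {
  const startedMs = Date.parse(containerStartedAt)
  if (Number.isNaN(startedMs)) {
    throw new Error(`Unparseable container start time: ${containerStartedAt}`)
  }
  const since = Math.floor(startedMs / 1000)
  return { since, until: since + STARTUP_WINDOW_SECONDS }
}

// Scan the fetched log text for the first line mentioning "inference compute".
// Returns null when the window did not contain such a line.
function findInferenceComputeLine(logText: string): string | null {
  for (const line of logText.split('\n')) {
    if (line.includes('inference compute')) {
      return line
    }
  }
  return null
}
```

With a window like this, the amount of log text scanned is bounded by what Ollama wrote in its first five minutes, whereas a `tail`-based read depends on how much has been logged since.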
Validated on NOMAD8 (RX 6800, container uptime ~10 min with sustained ingestion that generated 6,345 log lines):

Before:
    controllers[0]: { model: "Navi 21 ...", vram: 1 }

After (bogus AMD VRAM cleared, log probe stale due to tail:500 churn):
    controllers[0]: { model: "Navi 21 ...", vram: null }
    gpuHealth: { status: "passthrough_failed" }
    -> UI shows "N/A" and the banner from PR #208

After (bogus cleared + log probe reads startup window):
    controllers[0]: { model: "AMD Radeon RX 6800", vram: 16384 }
    gpuHealth: { status: "ok", hasRocmRuntime: true, ollamaGpuAccessible: true }
    -> UI shows "16 GB", no banner

Both branches of the fix exercise correctly: the NVIDIA path is unchanged (same code, just renamed identifiers), and the AMD path now triggers the probe, which reliably finds the GPU info regardless of container age.
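The vendor-generalized VRAM check from fix 1, which turns `vram: 1` into `vram: null` in the validation output, might look roughly like this. It is a minimal sketch under assumptions: the `GpuController` shape, the `clearBogusVram` helper, and the constant name are hypothetical, and `vram` is assumed to be in MiB. Only the two function names, the regex, and the 256-MiB threshold come from the commit message.

```typescript
// Hypothetical controller shape; the real type in the repo may differ.
interface GpuController {
  vendor: string
  model: string
  vram: number | null // assumed to be MiB
}

// Threshold from the commit message: no real discrete GPU has less than this.
const MIN_PLAUSIBLE_DGPU_VRAM_MIB = 256

// Regex as quoted in the commit message. It is an unanchored substring match,
// so "ati" also matches inside words such as "Corporation"; whether that
// matters depends on which field the real code tests.
function isDiscreteGpuVendor(vendor: string): boolean {
  return /nvidia|advanced micro devices|amd|ati/i.test(vendor)
}

// A reading is treated as bogus only for discrete-GPU vendors reporting an
// implausibly small value (for example a BAR0 size parsed as VRAM).
function isBogusDgpuVram(controller: GpuController): boolean {
  return (
    controller.vram !== null &&
    isDiscreteGpuVendor(controller.vendor) &&
    controller.vram < MIN_PLAUSIBLE_DGPU_VRAM_MIB
  )
}

// Hypothetical helper: clear the bogus value so a later probe can fill it in.
function clearBogusVram(controller: GpuController): GpuController {
  return isBogusDgpuVram(controller) ? { ...controller, vram: null } : controller
}
```

Clearing to `null` instead of guessing a value keeps the reading honest until the log probe supplies the real figure.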
||
|---|---|---|
| .. | ||
| app | ||
| bin | ||
| commands | ||
| config | ||
| constants | ||
| database | ||
| docs | ||
| inertia | ||
| providers | ||
| public | ||
| resources | ||
| start | ||
| tests | ||
| types | ||
| util | ||
| views | ||
| .editorconfig | ||
| .env.example | ||
| ace.js | ||
| adonisrc.ts | ||
| eslint.config.js | ||
| package-lock.json | ||
| package.json | ||
| tailwind.config.ts | ||
| tsconfig.json | ||
| vite.config.ts | ||