project-nomad/admin
Chris Sherwood d28eb9be59 fix(RAG): report ZIM ingestion progress in overall-file frame
Before this change, the Active Downloads / Processing Queue UI showed the
ingestion progress gauge jumping wildly during multi-batch ZIM ingestion
(e.g. 5% → 88% → 27% → 5% → 56% → 36% over ~60 seconds for cooking SE).

Each continuation batch is a separate BullMQ job, and `EmbedFileJob.handle()`
reported `job.progress` in two different reference frames depending on
where it was in the batch lifecycle:

  - During-batch (via the onProgress callback): 5% → 95% scaled across
    "% through this batch's chunks"
  - End-of-batch (just before dispatching the next): overwritten to
    `(nextOffset / totalArticles) * 100` — % through the whole file
  - Next continuation batch starts with progress = 5% explicitly, then
    climbs through the per-batch range again

`listActiveJobs()` returns the latest active BullMQ job's progress. With
GPU-accelerated ingestion completing a batch every ~4 seconds, the UI
saw the jobId rotate constantly and the gauge whipsaw between the two
reference frames.
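
The whipsaw can be reproduced with a minimal sketch of the old reporting. It is an illustration only, not the project's code: `BATCH_SIZE`, `TOTAL`, and every function name below are hypothetical, and it assumes the end-of-batch overwrite had a usable total to divide by.

```typescript
// Illustrative only: BATCH_SIZE, TOTAL, and all helper names are hypothetical.
const BATCH_SIZE = 50;
const TOTAL = 1000;

// During a batch: per-batch percent (0-100) mapped into the 5-95 range.
function duringBatchFrame(percent: number): number {
  return 5 + (percent / 100) * 90;
}

// End of batch: percent through the whole file.
function endOfBatchFrame(nextOffset: number, total: number): number {
  return (nextOffset / total) * 100;
}

// Collect the values a poller could observe over the first `batches` jobs.
function oldGaugeReadings(batches: number): number[] {
  const readings: number[] = [];
  for (let b = 0; b < batches; b++) {
    readings.push(5); // explicit reset at batch start
    readings.push(duringBatchFrame(50)); // halfway through this batch
    readings.push(duringBatchFrame(100)); // this batch's chunks are done
    readings.push(endOfBatchFrame((b + 1) * BATCH_SIZE, TOTAL)); // overall frame
  }
  return readings;
}

// True when no reading is lower than the one before it.
function isMonotonic(values: number[]): boolean {
  return values.every((v, i) => i === 0 || v >= values[i - 1]);
}
```

Under these assumptions each batch climbs to 95 in the per-batch frame and is then overwritten by a much smaller overall-file value, so `isMonotonic(oldGaugeReadings(2))` is false.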

`totalArticles` was already wired through the EmbedFileJob params shape
and used end-of-batch — but RagService never actually populated it,
so any frame-scaling that depended on it silently fell back to the
per-batch range. Three fixes together:

1. `ZIMExtractionService.extractZIMContent()` now returns
   `{ chunks: ZIMContentChunk[]; totalArticles: number }` instead of a
   raw chunks array, surfacing `archive.articleCount` to the caller.
   Single caller (rag_service) updated to destructure.

2. `RagService.processZimFile()` includes `totalArticles` in its result
   so `EmbedFileJob.dispatch()` can propagate it to the continuation
   batch (which the existing code already does via
   `totalArticles: totalArticles || result.totalArticles`).

3. `EmbedFileJob`'s onProgress callback scales the service-reported
   per-batch percent into the overall-file frame when `totalArticles`
   is known: `((batchOffset + (percent/100) * ZIM_BATCH_SIZE) /
   totalArticles) * 100`. Capped at 99% to leave room for the explicit
   100% set at file completion. Falls back to the original 5-95% range
   for single-batch files (uploaded PDFs/txts) where totalArticles is
   undefined — the gauge then represents % through the only batch,
   which is what the UI expects for one-shot files.
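
The scaling in item 3 can be sketched as a pure function. This is a hedged illustration, not the actual `EmbedFileJob` code: the function name is hypothetical, the value of `ZIM_BATCH_SIZE` is a placeholder (the commit message does not state it), and the linear 5-95 mapping in the fallback branch is an assumption about how the original range was computed.

```typescript
// Placeholder value: the real ZIM_BATCH_SIZE is not given in the commit message.
const ZIM_BATCH_SIZE = 50;

/**
 * Map a per-batch percent (0-100) into the overall-file frame.
 * `toOverallFilePercent` is a hypothetical name used for illustration.
 */
function toOverallFilePercent(
  percent: number,
  batchOffset: number,
  totalArticles?: number
): number {
  if (!totalArticles || totalArticles <= 0) {
    // Single-batch files (no totalArticles): keep the per-batch 5-95 range.
    // The linear mapping here is an assumption.
    return 5 + (percent / 100) * 90;
  }
  // Articles finished so far = earlier batches + the share of this batch.
  const articlesDone = batchOffset + (percent / 100) * ZIM_BATCH_SIZE;
  const overall = (articlesDone / totalArticles) * 100;
  // Cap at 99 so the explicit 100% at file completion stays distinct.
  return Math.min(99, overall);
}
```

With these placeholder numbers, halfway through the batch at offset 500 of a 1,500-article file comes out at about 35%, and the value at the end of one batch equals the value at the start of the next, so the gauge no longer restarts per batch.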

Validated on NOMAD8 (RX 6800, ROCm-accelerated nomic):

  - devdocs python (small, ~1500 articles): progress advances
    monotonically across continuation jobIds:
    1501@30% → 1510@33% → 1514@43% → 1518@52%.
  - ifixit (huge, ~100k articles): stays near 3% for the many early
    batches at offsets 0..3000, which is correct because the file is
    enormous.
  - wikipedia_en_medicine (large, ~70k articles): stays near 0-1% for
    the first batches — also correct.
  - Brief 0-5% blip on continuation handoff (the explicit
    `safeUpdateProgress(job, 5)` at batch start, before the first
    onProgress callback fires) — visible but quickly resolves to the
    overall-frame value. No more 5% ↔ 88% chaos.
2026-05-20 10:16:00 -07:00
app fix(RAG): report ZIM ingestion progress in overall-file frame 2026-05-20 10:16:00 -07:00
bin feat: curated content system overhaul 2026-02-11 15:44:46 -08:00
commands feat(Maps): regional map downloads via go-pmtiles extract (#780) 2026-05-20 10:16:00 -07:00
config fix: cache docker list requests, aiAssistantName fetching, and ensure inertia used properly 2026-04-03 14:26:50 -07:00
constants feat(Maps): regional map downloads via go-pmtiles extract (#780) 2026-05-20 10:16:00 -07:00
database feat(Content): custom ZIM library sources with pre-seeded mirrors (#593) 2026-05-20 10:16:00 -07:00
docs docs: update release notes 2026-05-20 10:16:00 -07:00
inertia fix(Maps): render notes in marker popup when populated 2026-05-20 10:16:00 -07:00
providers feat(GPU): auto-remediate nomad_ollama passthrough loss on admin boot (#755) 2026-05-20 10:16:00 -07:00
public feat: switch all PNG images to WEBP (#575) 2026-04-03 14:26:50 -07:00
resources feat(Maps): regional map downloads via go-pmtiles extract (#780) 2026-05-20 10:16:00 -07:00
start feat(Content): custom ZIM library sources with pre-seeded mirrors (#593) 2026-05-20 10:16:00 -07:00
tests fix(UI): improve global map banner display logic (#702) 2026-05-20 10:16:00 -07:00
types feat(GPU): auto-remediate nomad_ollama passthrough loss on admin boot (#755) 2026-05-20 10:16:00 -07:00
util feat: display model download progress 2026-02-06 16:22:23 -08:00
views feat: initial commit 2025-06-29 15:51:08 -07:00
.editorconfig feat: initial commit 2025-06-29 15:51:08 -07:00
.env.example feat: Add Windows Docker Desktop support for local development 2026-01-19 10:29:24 -08:00
ace.js feat: initial commit 2025-06-29 15:51:08 -07:00
adonisrc.ts feat(GPU): auto-remediate nomad_ollama passthrough loss on admin boot (#755) 2026-05-20 10:16:00 -07:00
eslint.config.js feat: openwebui+ollama and zim management 2025-07-09 09:08:21 -07:00
package-lock.json build(deps): bump picomatch in /admin 2026-05-20 10:16:00 -07:00
package.json chore(deps): pin all deps to exact versions 2026-05-20 10:16:00 -07:00
tailwind.config.ts feat: initial commit 2025-06-29 15:51:08 -07:00
tsconfig.json feat: initial commit 2025-06-29 15:51:08 -07:00
vite.config.ts fix(Maps): ensure proper parsing of hostnames (#640) 2026-04-03 14:26:50 -07:00