project-nomad/admin/app
Chris Sherwood 7acc53444d fix(RAG): unbreak multi-batch ZIM ingestion (jobId dedupe)
EmbedFileJob.dispatch() uses a deterministic per-file jobId
(sha256(filePath).slice(0,16)) for every batch. The parent batch's
handle() calls EmbedFileJob.dispatch({ batchOffset }) before returning,
so the parent is still in `active` state and locked when the
continuation tries to enqueue. BullMQ silently returns the locked
parent instead of creating a new job — and in newer BullMQ versions
it does so without throwing, so the existing
`catch (error.message.includes('job already exists'))` branch never
fires. After the parent completes, its entry stays in the `completed`
ZSET (held by `removeOnComplete: { count: 50 }`), continuing to trip
jobId dedupe for any subsequent re-dispatch attempts.

Result: every NOMAD install since 2026-02-08 (feat: zim content
embedding) with a multi-batch ZIM (wikipedia, cooking SE, ifixit,
lrnselfreliance, etc.) has only the first 50 articles indexed in
qdrant. The RAG feature has been silently degraded for ~3 months —
the user sees the file appear in their KB, qdrant accumulates ~50
articles' worth of vectors, and pagination quietly halts. No error
surfaces anywhere.

Fix: dispatch() skips the deterministic jobId for continuation batches
(batchOffset > 0), letting BullMQ auto-generate a unique one so each
batch stacks as an independent queue entry. Initial dispatches keep
the deterministic jobId so re-triggering an install (UI re-click,
sync rescan) remains idempotent. The existing 'job already exists'
branch is now gated on !isContinuation, since by construction
continuation batches will never hit dedupe.

Validated on NOMAD8 (RX 6800 / Threadripper 3960X, rc.3 + this patch):
devdocs_en_python (~1,500 chunks across multiple batches) correctly
paginates end-to-end. admin.log shows the expected sequence of
"Dispatched embedding job for file: X (continuation @ offset N)"
followed by "Starting embedding process for: X (batch offset: N)"
for each batch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 20:38:23 -07:00
..
controllers fix(AI): rewrite RAG query on first follow-up (off-by-one in skip-rewrite threshold) 2026-05-12 20:34:30 -07:00
exceptions fix(Docs): documentation renderer fixes 2025-12-23 16:00:33 -08:00
jobs fix(RAG): unbreak multi-batch ZIM ingestion (jobId dedupe) 2026-05-12 20:38:23 -07:00
middleware fix(API): skip compression for Server-Sent Events (#798) 2026-04-27 19:00:31 -07:00
models feat(Content): custom ZIM library sources with pre-seeded mirrors (#593) 2026-05-04 11:30:59 -07:00
services fix(AI): preserve semver tag in DB on AMD Ollama updates 2026-05-11 21:08:08 -07:00
utils fix(Downloads): treat missing Content-Type as octet-stream (#848) 2026-05-11 21:09:40 -07:00
validators feat(Content): custom ZIM library sources with pre-seeded mirrors (#593) 2026-05-04 11:30:59 -07:00