mirror of
https://github.com/Crosstalk-Solutions/project-nomad.git
synced 2026-05-30 16:16:50 +02:00
EmbedFileJob.dispatch() uses a deterministic per-file jobId
(sha256(filePath).slice(0,16)) for every batch. The parent batch's
handle() calls EmbedFileJob.dispatch({ batchOffset }) before returning,
so the parent is still in `active` state and locked when the
continuation tries to enqueue. BullMQ silently returns the locked
parent instead of creating a new job — and in newer BullMQ versions
it does so without throwing, so the existing
`catch (error.message.includes('job already exists'))` branch never
fires. After the parent completes, its entry stays in the `completed`
ZSET (held by `removeOnComplete: { count: 50 }`), continuing to trip
jobId dedupe for any subsequent re-dispatch attempts.
Result: every NOMAD install since 2026-02-08 (feat: zim content
embedding) with a multi-batch ZIM (wikipedia, cooking SE, ifixit,
lrnselfreliance, etc.) has only the first 50 articles indexed in
qdrant. The RAG feature has been silently degraded for ~3 months —
the user sees the file appear in their KB, qdrant accumulates ~50
articles' worth of vectors, and pagination quietly halts. No error
surfaces anywhere.
Fix: dispatch() skips the deterministic jobId for continuation batches
(batchOffset > 0), letting BullMQ auto-generate a unique one so each
batch stacks as an independent queue entry. Initial dispatches keep
the deterministic jobId so re-triggering an install (UI re-click,
sync rescan) remains idempotent. The existing 'job already exists'
branch is now gated on !isContinuation, since by construction
continuation batches will never hit dedupe.
Validated on NOMAD8 (RX 6800 / Threadripper 3960X, rc.3 + this patch):
devdocs_en_python (~1,500 chunks across multiple batches) correctly
paginates end-to-end. admin.log shows the expected sequence of
"Dispatched embedding job for file: X (continuation @ offset N)"
followed by "Starting embedding process for: X (batch offset: N)"
for each batch.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|---|---|---|
| .. | ||
| controllers | ||
| exceptions | ||
| jobs | ||
| middleware | ||
| models | ||
| services | ||
| utils | ||
| validators | ||