fix(KB): union Stored Files list with state-machine file paths (#898)

Closes the 'zero_chunks warning has no row to attach to' gap surfaced by
the 2026-05-14 integration UAT. Before this fix RagService.getStoredFiles
returned only file paths that appeared in Qdrant's payload.source — so
files with 0 embedded chunks (video-only ZIMs, browse_only opt-outs,
ingestions that failed before producing any chunks) silently disappeared
from the KB panel's Stored Files table.

The fix unions the Qdrant scroll result with the disk-backed file paths
recorded in kb_ingest_state. Effect:
- lrnselfreliance_en_all_2025-12.zim (3.97 GB video-only ZIM, 0 chunks)
  now appears in the table and picks up its zero_chunks warning chip
- Files in pending_decision under Manual policy show up so the user can
  see what's waiting for opt-in
- Files in browse_only / failed states have a row for future per-card
  Retry / Re-index actions (forthcoming, blocked on #886)

The state-machine query is wrapped in its own try/catch so a transient
DB error degrades to the Qdrant-only list rather than 500-ing the whole
panel — same defensive posture as the outer try/catch.

Stacks on feat/kb-ingest-state-machine (#888) because the union depends
on the kb_ingest_state table that PR introduces. Will rebase to rc once
#888 merges. Completes the second half of #895's warning surface; the
first half (partial_stall) already worked because those files have at
least some chunks in Qdrant.
chriscrosstalk 2026-05-16 22:44:30 -07:00 committed by Jake Turner
parent 4e8caddcc2
commit 8ed0bdfd8f

@@ -1082,6 +1082,28 @@ export class RagService {
        offset = scrollResult.next_page_offset || null
      } while (offset !== null)
      // Union the Qdrant-derived list with the disk-backed file paths the
      // state machine has tracked. Without this, files known to the scanner
      // but with zero embedded chunks (video-only ZIMs, failed-before-first-
      // chunk ingestions, browse_only opt-outs) never get a row in Stored
      // Files — which means warnings keyed off those files (#895 zero_chunks
      // in particular) have no row to attach to. The state machine is the
      // authoritative "what's on disk?" view; Qdrant is "what made it into
      // the vector store?". Both are needed to render the KB UI honestly.
      try {
        const stateRows = await KbIngestState.query().select('file_path')
        for (const row of stateRows) {
          sources.add(row.file_path)
        }
      } catch (error) {
        // Non-fatal: if the state machine query fails for any reason we'd
        // rather return the Qdrant-derived list than 500 the whole panel.
        logger.warn(
          { err: error },
          '[RagService.getStoredFiles] state-machine union skipped; returning Qdrant-only list'
        )
      }
      return Array.from(sources)
    } catch (error) {
      logger.error('Error retrieving stored files:', error)
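
For reviewers who want to exercise the degrade-to-Qdrant-only behaviour in
isolation, here is a minimal standalone sketch of the same union pattern.
It is not the RagService code: unionStoredFiles, PathSource, UnionResult
and the warn callback are hypothetical names introduced for illustration,
and the two data sources are stubbed as plain async functions rather than
real Qdrant or kb_ingest_state calls.

```typescript
// Hypothetical stand-in for a data source that yields file paths.
// In the real service these would be the Qdrant scroll and the
// kb_ingest_state query; here they are plain async functions.
type PathSource = () => Promise<string[]>

interface UnionResult {
  files: string[]
  // True when the state-machine lookup failed and was skipped.
  degraded: boolean
}

async function unionStoredFiles(
  vectorStorePaths: PathSource,
  stateMachinePaths: PathSource,
  warn: (message: string, err: unknown) => void = () => {}
): Promise<UnionResult> {
  // A failure in the primary source is NOT caught here; it propagates
  // to the caller, which plays the role of the outer try/catch.
  const sources = new Set<string>(await vectorStorePaths())

  let degraded = false
  try {
    // Set semantics de-duplicate paths that both sources know about.
    for (const path of await stateMachinePaths()) {
      sources.add(path)
    }
  } catch (err) {
    // Non-fatal: keep whatever the primary source returned.
    degraded = true
    warn('state-machine union skipped; returning vector-store-only list', err)
  }

  return { files: Array.from(sources), degraded }
}
```

The sketch returns a degraded flag purely so the fallback path is easy to
assert on in a test; the commit itself only logs a warning and returns the
array.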