project-nomad/admin/app/models
chriscrosstalk 743549ca74 feat(KB): per-file ingest state machine (Phase 1 of RFC #883) (#888)
Adds a persistent state machine for AI knowledge-base ingestion so the
scanner can distinguish "fully indexed", "user opted out", "failed", and
"stalled" from each other — none of which were derivable from the prior
binary "any chunks in Qdrant ⇒ embedded" check.

## What lands

- New table `kb_ingest_state` keyed by `file_path` with enum state column
  (`pending_decision | indexed | browse_only | failed | stalled`).
  Independent of `installed_resources` so it covers both curated downloads
  and manually-uploaded KB files.
- New KV key `rag.defaultIngestPolicy` (string: `Always | Manual`).
  Registered now but not consumed yet — JIT prompt + wizard step land in
  Phase 3 of the RFC.
- `EmbedFileJob.handle` writes state on terminal outcomes:
  - Success (final batch) → `indexed` + chunk count
  - `UnrecoverableError` → `failed` + error message
  - Retryable errors are left to BullMQ's existing retry path
- `scanAndSyncStorage` swaps the binary Qdrant check for a state-aware
  decision tree (see `decideScanAction`). Existing installs auto-backfill
  on first scan: files with chunks in Qdrant but no state row become
  `indexed`; new files start as `pending_decision`.
- `deleteFileBySource` drops the state row last, so removed files
  disappear entirely instead of leaving an orphan that the next scan
  would re-dispatch into nothing.
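
For reviewers who want the shape of the decision tree without opening the
diff, here is a minimal sketch of what a `decideScanAction`-style function
could look like. Only the five state values, the function name, and the two
backfill rules (chunks + no row ⇒ `indexed`, new file ⇒ `pending_decision`
and dispatched) come from this message. The `ScanInput` / `ScanAction` types,
the field names, and the handling of `indexed`-without-chunks, `failed`, and
`stalled` are assumptions made for illustration and may differ from the real
implementation.

```typescript
// Hypothetical sketch, not the code in this commit.
type KbIngestState =
  | 'pending_decision'
  | 'indexed'
  | 'browse_only'
  | 'failed'
  | 'stalled'

// Assumed input shape: the state row (if any) plus the old binary signal.
interface ScanInput {
  state: KbIngestState | null // null = no row in kb_ingest_state yet
  hasChunks: boolean // at least one chunk for this file exists in Qdrant
}

// Assumed output shape: what the scanner should do with the file.
type ScanAction =
  | { kind: 'skip' }
  | { kind: 'dispatch' }
  | { kind: 'backfill'; state: KbIngestState; dispatch: boolean }

function decideScanAction({ state, hasChunks }: ScanInput): ScanAction {
  if (state === null) {
    // Backfill rules described above: existing chunks mean the file was
    // embedded before the state table existed; otherwise it is a new file.
    return hasChunks
      ? { kind: 'backfill', state: 'indexed', dispatch: false }
      : { kind: 'backfill', state: 'pending_decision', dispatch: true }
  }

  switch (state) {
    case 'pending_decision':
      // Matches the smoke test: pending files are (re-)dispatched.
      return { kind: 'dispatch' }
    case 'indexed':
      // Assumption: an indexed file whose chunks vanished is re-embedded.
      return hasChunks ? { kind: 'skip' } : { kind: 'dispatch' }
    case 'browse_only':
      // User opted out, so the scanner leaves the file alone.
      return { kind: 'skip' }
    case 'failed':
    case 'stalled':
      // Assumption: no automatic retry; recovery is left to later phases.
      return { kind: 'skip' }
  }
}
```

Keeping the function pure (state and chunk presence in, action out) is what
makes it straightforward to unit-test each state and combination in isolation.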

## What does NOT land here

- Ratio registry (separate PR) — needed for partial-stall detection and
  cost estimates, but a separable concern.
- #880 follow-up initial-progress anchor (separate tiny PR).
- Phase 2 UI (status pill, per-card actions, conditional warnings).
- Phase 3 policy surfaces (wizard step, JIT prompt, guardrail modal).
- PR #886's bulk-action hookup — `_deletePointsBySource` / Re-embed All
  / Reset & Rebuild would also want to set state, but #886 isn't merged
  yet; that wiring goes in a follow-up once #886 lands.

## Target

This is forward work for v1.40.0 (RFC #883). It branches off `rc` because
that is currently the latest base; after GA, Jake will sync rc→dev. If a
retarget is requested at PR-open time, it will be a fast-forward.

## Tests

- 9 new unit tests for `decideScanAction` covering all five states plus
  the no-row / chunks-present / chunks-missing combinations
- Type-check clean
- Smoke-tested end-to-end on NOMAD3 via hot-patch:
  - Backfill: 5 ZIMs + 2 KB uploads with existing chunks in Qdrant all
    came back `indexed` on first scan
  - Pending dispatch: a video-only ZIM with no chunks (`lrnselfreliance`)
    came back `pending_decision` and was correctly re-dispatched (BullMQ
    deduped to its historical `:completed` jobId — bgauger's #886 fix
    drains that)
  - Delete hook: deleting a KB upload via `DELETE /api/rag/files`
    removed both the disk file and the state row

Co-authored-by: Jake Turner <52841588+jakeaturner@users.noreply.github.com>
2026-05-20 10:16:00 -07:00
benchmark_result.ts feat(benchmark): Require full benchmark with AI for community sharing (#99) 2026-01-25 00:24:31 -08:00
benchmark_setting.ts feat: Add system benchmark feature with NOMAD Score 2026-01-22 21:48:12 -08:00
chat_message.ts feat: [wip] native AI chat interface 2026-01-31 20:39:49 -08:00
chat_session.ts feat: [wip] native AI chat interface 2026-01-31 20:39:49 -08:00
collection_manifest.ts feat: curated content system overhaul 2026-02-11 15:44:46 -08:00
custom_library_source.ts feat(Content): custom ZIM library sources with pre-seeded mirrors (#593) 2026-05-20 10:16:00 -07:00
installed_resource.ts feat: curated content system overhaul 2026-02-11 15:44:46 -08:00
kb_ingest_state.ts feat(KB): per-file ingest state machine (Phase 1 of RFC #883) (#888) 2026-05-20 10:16:00 -07:00
kv_store.ts feat(AI Assistant): custom name option for AI Assistant 2026-03-04 20:05:14 -08:00
map_marker.ts feat(maps): add scale bar and location markers (#636) 2026-04-03 14:26:50 -07:00
service.ts feat: support for updating services 2026-03-11 14:08:09 -07:00
wikipedia_selection.ts feat: Add dedicated Wikipedia Selector with smart package management 2026-01-31 21:00:51 -08:00