Closes the Manual-mode UX dead-end: after toggling 'Auto-index new content
for AI?' to Manual, a freshly-downloaded ZIM (or any pending_decision file)
had no UI path to opt in for embedding short of the global Sync Storage /
Re-embed All bulk actions. Per RFC #883 §5, each Stored Files row now
carries a state pill and an adaptive single-button action.
State pill (left of any existing warning chips):
- 'Indexed' — green; row had chunks in Qdrant or state row is 'indexed'
- 'Not Indexed' — neutral; state is pending_decision or browse_only
- 'Failed' — red
- 'Stalled' — amber
- admin_docs collapsed row has no pill ('Managed by NOMAD' carries it)
Adaptive action button (paired with the existing Delete button per row):
- pending_decision → 'Index' (force=false)
- browse_only → 'Index' (force=true)
- failed / stalled → 'Retry' (force=true)
- indexed + warning chip → 'Re-embed' (force=true; confirm modal first)
- indexed healthy / null → no action button (bulk Re-embed All covers it)
Backend: GET /api/rag/files now returns
{ files: Array<{ source, state, chunksEmbedded }> }
instead of a flat string[]. State + chunk-count come from a single
KbIngestState query unioned into the existing Qdrant-derived source list
(no new round trips). New POST /api/rag/files/embed validates the source is
known, refuses if any inflight job already targets the same filePath
(prevents double-click duplicate-chunk hazard), pre-deletes Qdrant points
when force=true, then dispatches via the existing _dispatchEmbedJobsFor
helper used by reembedAll.
Per-file Re-embed (force=true on an already-indexed file) routes through a
StyledModal confirmation since it deletes existing vectors before queueing
a fresh job — same destructive-action weight as Delete's inline confirm but
heavier since it affects search until the rebuild finishes.
Folds in PR #907's blank-screen fix because my new render needs the same
generic restored: `<StyledTable<KbFileGroup>>` and `record.displayName`
(instead of the unresolved `sourceToDisplayName(record.source)` that ships
in rc.5 and ReferenceErrors on modal open). PR #907 also adds title
tooltips on the three bulk-action buttons; those tooltips are NOT included
here — let PR #907 land first or independently for that part.
Multi-select bulk-opt-in deferred per discussion: most Manual-mode users
ingest 1-2 files at a time, the existing global toggle covers the bulk
case, and checkboxes would expand scope past what rc.6 should hold. Will
file a follow-up issue for an 'Index N pending files' single-click button
once this lands.
Tests-in-PR scope was limited to keeping `kb_file_grouping.spec.ts` green
after the StoredFileInfo[] signature change (added asInfos() wrapper).
Dedicated unit tests for embedSingleFile (unknown source / inflight refused
/ force=true delete-then-dispatch) and the new state-pill rendering will
land in a follow-up PR alongside Playwright coverage of the row actions.
Verification path: NOMAD3 currently runs project-nomad-admin:integration-
rc6-preview (PRs #907 + #908 atop rc.5). After this branch is built into a
new integration tag, I'll re-run targeted Playwright UAT on the KB modal
covering: state pill rendering per state, Index click on pending_decision
opts in cleanly, Retry on failed re-dispatches successfully, Re-embed
confirmation modal copy + delete-then-dispatch on the military-medicine
partial-stall row, and Delete flow untouched.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When a user picks a tier in TierSelectionModal, show how much additional
disk space the AI Assistant will need if the new ZIMs are indexed, plus
a policy-aware footer explaining whether they'll auto-index (Always) or
wait for opt-in (Manual). Estimates consume #891's KbRatioRegistry via a
new POST /api/rag/estimate-batch endpoint.
Backend
- New POST /api/rag/estimate-batch route + RagController.estimateBatch
- VineJS schema accepting array of {filename, sizeBytes}, capped at 500
- KbRatioRegistry.estimateBatch aggregates via the existing prefix-match
lookup, returns {totalChunks, totalBytes, hasUnknown}
- New BYTES_PER_CHUNK_ON_DISK constant (~8 KB: 3 KB vector + ~3 KB chunk
text + ~2 KB payload/index overhead). Tunable; will be replaced by
Phase 4 self-calibration once we have real measurements.
- Controller normalizes incoming filenames via path.basename so callers
that send full paths or URLs still match registry prefixes correctly.
Frontend
- api.estimateEmbeddingBatch() client method
- TierSelectionModal: when localSelectedSlug is set, resolve the tier's
resources (incl. inherited tiers), POST to /estimate-batch, and render
a new info block with the +~X GB figure + ingest-policy copy. Also
fetches rag.defaultIngestPolicy so the same block surfaces whether
indexing will fire automatically or wait for the user.
- resourceFilename() helper extracts the basename from the resource URL
so the registry lookup hits the right prefix regardless of mirror.
Tests
- 4 new cases in tests/unit/kb_ratio_lookup.spec.ts covering the
estimateBatch aggregator: standard sum, unknown-flagging, video-only
ZIM (0 chunks but known, hasUnknown stays false), empty input.
Stacks on feat/kb-ratio-registry (#891) — consumes the registry table
seeded by that PR. Once #891 merges to rc, this PR auto-rebases.
Out of scope for this PR (deferred to follow-ups):
- Per-batch opt-in checkbox (RFC §1's '☑ Also index these for AI') needs
a per-batch policy override path and is a separate PR
- Guardrail modal at 50 GB / 10% free / 6 hr thresholds (RFC §7) is also
separate; this PR is informational, not gating
- Time-to-embed estimate awaits a chunks-per-second metric per host