Commit Graph

13 Commits

Author SHA1 Message Date
jakeaturner
736c9bd672 fix(security): canonicalize hostnames to block IPv4-mapped IPv6 IMDS bypass
Replace literal string matching with ipaddr.js parsing
so equivalent encodings of 169.254.169.254
(::ffff:169.254.169.254, ::ffff:a9fe:a9fe,fully-expanded forms)
and fd00:ec2::254 are all rejected.
2026-05-20 10:16:00 -07:00
Chris Sherwood
d850cb9588 feat(KB): per-file ingest action + state indicator on Stored Files (RFC #883 §5)
Closes the Manual-mode UX dead-end: after toggling 'Auto-index new content
for AI?' to Manual, a freshly-downloaded ZIM (or any pending_decision file)
had no UI path to opt in for embedding short of the global Sync Storage /
Re-embed All bulk actions. Per RFC #883 §5, each Stored Files row now
carries a state pill and an adaptive single-button action.

State pill (left of any existing warning chips):
  - 'Indexed'    — green; row had chunks in Qdrant or state row is 'indexed'
  - 'Not Indexed' — neutral; state is pending_decision or browse_only
  - 'Failed'     — red
  - 'Stalled'    — amber
  - admin_docs collapsed row has no pill ('Managed by NOMAD' carries it)

Adaptive action button (paired with the existing Delete button per row):
  - pending_decision         → 'Index' (force=false)
  - browse_only              → 'Index' (force=true)
  - failed / stalled         → 'Retry' (force=true)
  - indexed + warning chip   → 'Re-embed' (force=true; confirm modal first)
  - indexed healthy / null   → no action button (bulk Re-embed All covers it)

Backend: GET /api/rag/files now returns
  { files: Array<{ source, state, chunksEmbedded }> }
instead of a flat string[]. State + chunk-count come from a single
KbIngestState query unioned into the existing Qdrant-derived source list
(no new round trips). New POST /api/rag/files/embed validates the source is
known, refuses if any inflight job already targets the same filePath
(prevents double-click duplicate-chunk hazard), pre-deletes Qdrant points
when force=true, then dispatches via the existing _dispatchEmbedJobsFor
helper used by reembedAll.

Per-file Re-embed (force=true on an already-indexed file) routes through a
StyledModal confirmation since it deletes existing vectors before queueing
a fresh job — same destructive-action weight as Delete's inline confirm but
heavier since it affects search until the rebuild finishes.

Folds in PR #907's blank-screen fix because my new render needs the same
generic restored: `<StyledTable<KbFileGroup>>` and `record.displayName`
(instead of the unresolved `sourceToDisplayName(record.source)` that ships
in rc.5 and ReferenceErrors on modal open). PR #907 also adds title
tooltips on the three bulk-action buttons; those tooltips are NOT included
here — let PR #907 land first or independently for that part.

Multi-select bulk-opt-in deferred per discussion: most Manual-mode users
ingest 1-2 files at a time, the existing global toggle covers the bulk
case, and checkboxes would expand scope past what rc.6 should hold. Will
file a follow-up issue for an 'Index N pending files' single-click button
once this lands.

Tests-in-PR scope was limited to keeping `kb_file_grouping.spec.ts` green
after the StoredFileInfo[] signature change (added asInfos() wrapper).
Dedicated unit tests for embedSingleFile (unknown source / inflight refused
/ force=true delete-then-dispatch) and the new state-pill rendering will
land in a follow-up PR alongside Playwright coverage of the row actions.

Verification path: NOMAD3 currently runs project-nomad-admin:integration-
rc6-preview (PRs #907 + #908 atop rc.5). After this branch is built into a
new integration tag, I'll re-run targeted Playwright UAT on the KB modal
covering: state pill rendering per state, Index click on pending_decision
opts in cleanly, Retry on failed re-dispatches successfully, Re-embed
confirmation modal copy + delete-then-dispatch on the military-medicine
partial-stall row, and Delete flow untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 10:16:00 -07:00
Chris Sherwood
cf3a924b9f feat(KB): guardrail modal at 50GB / 10%-free thresholds (RFC #883 §7)
One-time confirmation step gating bulk indexing actions that would
consume a substantial amount of disk for embedding storage. Fires only
when the user has policy=Always (i.e., the system would auto-index)
AND the estimate trips either:

  - GUARDRAIL_ABSOLUTE_BYTES = 50 GB embedding cost, OR
  - GUARDRAIL_FREE_DISK_RATIO = 10% of current free disk space

Under policy=Manual the guardrail is silent because the user has
already opted out of automatic ingestion — the files would just queue
as pending_decision either way.

Pieces
- inertia/lib/kb_guardrail.ts: pure decision helper with two constants
  and an evaluateGuardrail() that returns a verdict + reasons. No I/O
  on the helper itself so the logic is trivially testable
- inertia/components/KbGuardrailModal.tsx: confirmation dialog. Headless
  UI Transition + Dialog, amber 'large operation' header, plain-English
  estimate summary, [Cancel] / [Proceed anyway] footer. z-[60] so it
  layers above the tier modal underneath instead of replacing it
- inertia/components/TierSelectionModal.tsx integration: handleSubmit
  now evaluates the guardrail when policy=Always and embedEstimate is
  available; if it trips, we stash the verdict in state and render the
  guardrail modal as an overlay. Confirm runs finalizeSubmit (which is
  the pre-existing onSelectTier + onClose path); Cancel just closes the
  guardrail and leaves the tier modal as-is so the user can change
  their tier choice or flip the policy

The disk-free signal comes from the existing useSystemInfo hook +
getPrimaryDiskInfo helper. Passing freeBytes=0 (unknown) skips the
relative-disk check, so the modal still works on hosts whose disk
introspection failed — just relies on the absolute 50 GB threshold

Tests
- 9 cases in tests/unit/kb_guardrail.spec.ts: standard small batch (no
  trip), exact absolute threshold trips, over-absolute trips, over 10%
  free trips, both-at-once trips with two reasons, freeBytes=0 skip,
  freeBytes=0 + over-absolute trip, exact-10% boundary trips, just-
  under-both safe. All green.

Stacks on feat/kb-tier-estimate-on-disk (#897) — consumes that PR's
estimate endpoint to compute the verdict input. Auto-rebases to rc
when #897 merges.

Pairs with #894 (policy toggle) and #899 (JIT prompt): together the
three PRs cover the 'how do I avoid surprising the user with auto-
indexing they didn't ask for?' arc.

Out of scope (deferred)
- 6 hr time threshold (RFC §7): needs a per-host chunks-per-second
  metric we don't capture yet; would be a follow-up after Phase 4
  self-calibration (RFC §15) lands
- Wider integration (KbPolicyPromptBanner 'Index now' button, manual
  KB-modal sync): TierSelectionModal is the dominant bulk-decision
  surface and the right place to land this first
2026-05-20 10:16:00 -07:00
Chris Sherwood
563f86a22b feat(KB): conditional warnings A + B on Stored Files (RFC #883 §6)
Surfaces two silent failure modes that the prior binary
"any-chunks-in-Qdrant ⇒ embedded" check could not distinguish from
healthy ingestion:

- **Warning A — Zero-chunk file** (file_size > 100 MB, chunks = 0)
  Fires on video-only / image-only ZIMs (`lrnselfreliance_en_all`,
  TED talks, etc.) that the pipeline completes "successfully" with no
  extractable text. AI Assistant literally cannot reference these.

- **Warning B — Partial-embed stall** (chunks < 50% of expected from
  the ratio registry). Surfaces the simple_wiki "266 of 600,000 chunks"
  case observed during NOMAD1 ingestion testing — previously these
  looked identical to fully-completed embeds in the UI.

Both warnings render only when their condition is met (silent by
default; noisy only on real problems).

Base is `feat/kb-ratio-registry` (#891) because Warning B's "expected
chunks" estimate comes from `KbRatioRegistry.estimateChunks()`. GitHub
fast-forwards to `rc` once #891 merges.

- `app/utils/kb_warning_decision.ts` — pure `decideWarnings(inputs)`
  with thresholds (`100 MB`, `0.5×`) as exported constants. 10 unit
  tests cover the healthy case, both warnings, the under/at/over
  boundary, the registry-miss suppression, and the video-only registry
  case (`expectedChunks: 0` correctly skips Warning B).
- `RagService.computeFileWarnings()` — single Qdrant scroll tallies
  chunks per source, filesystem walk fills in zero-chunk files,
  ratio registry estimates the expectation, decision function emits.
- New endpoint `GET /api/rag/file-warnings` returns
  `Record<source, FileWarning[]>` (sources with no warnings are
  omitted, so the frontend can `warnings[source] ?? []` for clean
  defaults).
- KB modal: warnings render inline under the file name as amber-tinted
  pills. Polled every 30s alongside the existing health check.

- Warning C — chunks skipped due to length. PR #890 (#881 fix) prevents
  the silent drop at the embed boundary, so the underlying condition
  shouldn't fire anymore. If we still want to surface "we truncated
  N chunks to fit", that needs separate `skipped_count` tracking in
  EmbedFileJob — a Phase 2 follow-up.
- Suppressing Warning B during active mid-ingestion. The user can cross-
  reference the Processing Queue to know it's in-flight; suppressing
  warnings while a job runs would mask real stalls where the job died
  mid-batch. Will revisit when per-card status is wired through.
- Use of `kb_ingest_state.chunks_embedded` (#888) as the chunk count
  source. This PR uses Qdrant scroll directly so it can land
  independently of #888.

- 10 new unit tests on `decideWarnings`, all pass
- Type-check clean
- Hot-patch + browser smoke test deferred until #891 lands (the ratio
  registry needs to exist in the DB for `estimateChunks()` to return
  non-null estimates — without it, only Warning A fires which is still
  useful but Warning B stays dormant)
2026-05-20 10:16:00 -07:00
Chris Sherwood
e68c753e39 feat(KB): surface embedding-disk estimate in curated tier-change modal (RFC #883 §1)
When a user picks a tier in TierSelectionModal, show how much additional
disk space the AI Assistant will need if the new ZIMs are indexed, plus
a policy-aware footer explaining whether they'll auto-index (Always) or
wait for opt-in (Manual). Estimates consume #891's KbRatioRegistry via a
new POST /api/rag/estimate-batch endpoint.

Backend
- New POST /api/rag/estimate-batch route + RagController.estimateBatch
- VineJS schema accepting array of {filename, sizeBytes}, capped at 500
- KbRatioRegistry.estimateBatch aggregates via the existing prefix-match
  lookup, returns {totalChunks, totalBytes, hasUnknown}
- New BYTES_PER_CHUNK_ON_DISK constant (~8 KB: 3 KB vector + ~3 KB chunk
  text + ~2 KB payload/index overhead). Tunable; will be replaced by
  Phase 4 self-calibration once we have real measurements.
- Controller normalizes incoming filenames via path.basename so callers
  that send full paths or URLs still match registry prefixes correctly.

Frontend
- api.estimateEmbeddingBatch() client method
- TierSelectionModal: when localSelectedSlug is set, resolve the tier's
  resources (incl. inherited tiers), POST to /estimate-batch, and render
  a new info block with the +~X GB figure + ingest-policy copy. Also
  fetches rag.defaultIngestPolicy so the same block surfaces whether
  indexing will fire automatically or wait for the user.
- resourceFilename() helper extracts the basename from the resource URL
  so the registry lookup hits the right prefix regardless of mirror.

Tests
- 4 new cases in tests/unit/kb_ratio_lookup.spec.ts covering the
  estimateBatch aggregator: standard sum, unknown-flagging, video-only
  ZIM (0 chunks but known, hasUnknown stays false), empty input.

Stacks on feat/kb-ratio-registry (#891) — consumes the registry table
seeded by that PR. Once #891 merges to rc, this PR auto-rebases.

Out of scope for this PR (deferred to follow-ups):
- Per-batch opt-in checkbox (RFC §1's '☑ Also index these for AI') needs
  a per-batch policy override path and is a separate PR
- Guardrail modal at 50 GB / 10% free / 6 hr thresholds (RFC §7) is also
  separate; this PR is informational, not gating
- Time-to-embed estimate awaits a chunks-per-second metric per host
2026-05-20 10:16:00 -07:00
chriscrosstalk
8eb8809154 feat(KB): Always/Manual ingest policy toggle (RFC #883 §1/§4) (#894)
* feat(KB): per-file ingest state machine (Phase 1 of RFC #883)

Adds a persistent state machine for AI knowledge-base ingestion so the
scanner can distinguish "fully indexed", "user opted out", "failed", and
"stalled" from each other — none of which were derivable from the prior
binary "any chunks in Qdrant ⇒ embedded" check.

## What lands

- New table `kb_ingest_state` keyed by `file_path` with enum state column
  (`pending_decision | indexed | browse_only | failed | stalled`).
  Independent of `installed_resources` so it covers both curated downloads
  and manually-uploaded KB files.
- New KV key `rag.defaultIngestPolicy` (string: `Always | Manual`).
  Registered now but not consumed yet — JIT prompt + wizard step land in
  Phase 3 of the RFC.
- `EmbedFileJob.handle` writes state on terminal outcomes:
  - Success (final batch) → `indexed` + chunks count
  - `UnrecoverableError` → `failed` + error message
  - Retryable errors are left to BullMQ's existing retry path
- `scanAndSyncStorage` swaps the binary qdrant check for a state-aware
  decision tree (see `decideScanAction`). Existing installs auto-backfill
  on first scan: files with chunks in Qdrant but no state row become
  `indexed`; new files start as `pending_decision`.
- `deleteFileBySource` drops the state row last, so removed files
  disappear entirely instead of leaving an orphan that the next scan
  would re-dispatch into nothing.

## What does NOT land here

- Ratio registry (separate PR) — needed for partial-stall detection and
  cost estimates, but a separable concern.
- #880 follow-up initial-progress anchor (separate tiny PR).
- Phase 2 UI (status pill, per-card actions, conditional warnings).
- Phase 3 policy surfaces (wizard step, JIT prompt, guardrail modal).
- PR #886's bulk-action hookup — `_deletePointsBySource` / Re-embed All
  / Reset & Rebuild would also want to set state, but #886 isn't merged
  yet; that wiring goes in a follow-up once #886 lands.

## Target

This is forward work for v1.40.0 (RFC #883). Branching off `rc` because
that's the current latest base and post-GA Jake will sync rc→dev; a
retarget at PR-open time is a fast-forward if requested.

## Tests

- 9 new unit tests for `decideScanAction` covering all five states plus
  the no-row / chunks-present / chunks-missing combinations
- Type-check clean
- Smoke-tested end-to-end on NOMAD3 via hot-patch:
  - Backfill: 5 ZIMs + 2 KB uploads with existing chunks in Qdrant all
    came back `indexed` on first scan
  - Pending dispatch: a video-only ZIM with no chunks (`lrnselfreliance`)
    came back `pending_decision` and was correctly re-dispatched (Bull
    deduped to its historical `:completed` jobId — bgauger's #886 fix
    drains that)
  - Delete hook: deleting a KB upload via `DELETE /api/rag/files`
    removed both the disk file and the state row

* feat(KB): Always/Manual ingest policy toggle (RFC #883 §1/§4)

Activates the `rag.defaultIngestPolicy` KV registered in Phase 1
(#888) so users on a fresh install (or anyone who picks Manual mode)
no longer get every new ZIM auto-dispatched to the embed pipeline.

## Stacks on #888

This PR's base is `feat/kb-ingest-state-machine` (#888). The state
machine has to be in place for the decision function to be policy-aware;
GitHub will fast-forward the base to `rc` once #888 merges.

## Backend changes

- `decideScanAction` now takes a `policy: 'Always' | 'Manual'` argument
  (defaults to `Always` for backward compatibility).
- New `ScanAction` kind: `create_pending`. Manual mode records that the
  scanner has seen a new file (so the UI can surface a per-card Index
  affordance later) without dispatching an EmbedFileJob.
- `scanAndSyncStorage` reads the KV and passes it through. The scan-result
  log line now includes the active policy and a `waiting on user` count
  for Manual-mode hits.
- `rag.defaultIngestPolicy` added to `SETTINGS_KEYS` so it's reachable
  through the existing `GET/PATCH /api/system/settings` surface — no new
  endpoint.

## Frontend changes

- New section in the KB panel between "Why upload" and "Processing Queue":
  "Auto-index new content for AI? [Always | Manual]" — segmented radio
  with copy explaining the 5-10× disk multiplier. Default Always.
- `useQuery('ingestPolicy')` reads the current value; clicking the
  inactive option mutates and shows a notification confirming the new
  behavior.

## Tests

- 14 unit tests on `decideScanAction` (was 9) — split into Always-mode
  cases (preserves Phase 1's contract) and Manual-mode cases
  (`create_pending`, `pending_decision → skip`, etc.).
- Type-check clean.
- Hot-patch + browser verification deferred until #888 lands; the state
  machine smoke-tested cleanly on NOMAD3 in #888's PR, and this PR's
  decision-tree changes are exhaustively unit-tested.

## RFC open question §3 — policy-change re-trigger

Switching Manual → Always doesn't auto-dispatch existing `pending_decision`
rows immediately. The next scan re-evaluates and dispatches them under
the new policy. This matches the RFC's "treat the switch as I've-
thought-about-it" instinct for the guardrail; full guardrail
implementation lands in Phase 3 task 14.

---------

Co-authored-by: Jake Turner <52841588+jakeaturner@users.noreply.github.com>
2026-05-20 10:16:00 -07:00
Chris Sherwood
43ca584b6c feat(KB): status pill + last-activity timestamp on Processing Queue (RFC #883 §5/§10)
Each in-flight (or stuck) embedding job gets a colored health pill,
relative-activity timestamp, and chunk counter so users can tell at a
glance whether ingestion is making progress.

## Health states

- **🟢 Active** — last batch < 2 min ago
- **🟡 Slow** — last batch 2-5 min ago (CPU-paced multi-batch ingestion
  lives here naturally; not always a problem)
- **🔴 Stalled** — last batch > 5 min ago (likely real problem)
- ** Waiting** — queued, no batch started yet
- **🔴 Failed** — job recorded failed status

## What lands

- New backend util `kb_job_health.ts` with pure `computeJobHealth(input)`
  decision function. Time-based thresholds (2 min / 5 min) inlined as
  constants. 9 unit tests pin the boundaries.
- `EmbedJobWithProgress` gains `lastBatchAt`, `startedAt`, `chunks` —
  already set by `EmbedFileJob.handle` on every batch transition, just
  not previously surfaced through `listActiveJobs`.
- Frontend `kb_job_health_display.ts` maps each status to a Tailwind
  dot color, label, and aria-label so backend and UI stay in sync.
- `ActiveEmbedJobs.tsx` renders the pill, "last activity Xs ago", and
  chunk counter above each progress bar. Adds a manual Refresh button
  and "Last updated Xs ago" line — the existing 2s/30s auto-poll
  cadence in `useEmbedJobs` is left intact.
- Live tick at 5s keeps the relative timestamps current without
  re-fetching from the API.

## Not in scope

- Per-card Cancel / Retry / Un-index — separate Phase 2 PR
- Conditional warnings A/B/C — separate Phase 2 PR
- Computing throughput rate (chunks/min) — needs ratio registry consumer
  (Phase 2 follow-up); for now the pill answers the "is it stuck?"
  question directly without a rate estimate.
2026-05-20 10:16:00 -07:00
Chris Sherwood
c64ec97de4 feat(KB): group admin docs into single row in Stored Files (RFC #883 §9)
Project NOMAD's bundled docs (`/app/docs/*.md` and `README.md`) each
embed as their own KB source — currently rendering as 12+ individual
rows that swamp user-uploaded content in the Stored Files table.
Collapse them into one informational row:

> Project NOMAD documentation · 12 files · Managed by NOMAD

The admin-docs row hides the Delete button (those files would be
re-embedded on the next sync anyway, so deleting is a footgun). User
uploads and ZIMs keep their existing per-row Delete UX.

Also adds deterministic sort: ZIMs → user uploads → admin docs → other,
alphabetical within each bucket. Pure frontend change — `/api/rag/files`
response shape unchanged.

Decision logic extracted to `kb_file_grouping.ts` with 9 unit tests
covering bucket classification, sort order, count noun pluralization,
and empty-input handling.
2026-05-20 10:16:00 -07:00
Chris Sherwood
159d57b2af feat(KB): ratio registry for disk + time estimates (Phase 1B of RFC #883)
Foundation for the cost estimates and partial-stall detection that
Phase 2 will surface. No consumers yet — this PR just lays the table,
the seed rows, and the lookup helper so subsequent UI work has
estimates available without a per-ZIM benchmark.

## What lands

- New table `kb_ratio_registry` (pattern, chunks_per_mb, sample_count,
  notes). Migration creates and seeds heuristic defaults from the RFC
  appendix: devdocs (1100/MB), Wikipedia variants (270/MB), iFixit
  (50/MB), Stack Exchange Q&A (200/MB), video-only ZIMs (0), plus a
  catch-all fallback at 100/MB.
- `KbRatioRegistry` model with static `lookup()` and `estimateChunks()`.
- Pure helper `kb_ratio_lookup.ts` doing longest-prefix-match — a
  specific entry (`wikipedia_en_simple_`) overrides a broader one
  (`wikipedia_en_`). 9 unit tests covering the lookup boundary.
- `sample_count` starts at 0 (heuristic seed) and is reserved for
  Phase 4 self-calibration to increment as observed ZIMs update each row.

## Not in scope

- Self-calibration on successful ingestion (Phase 4)
- UI consumers — Warning B (partial-embed stall) and the storage budget
  meter / time estimates land in Phase 2.

## Tested

- Type-check clean
- 9 unit tests pass for `findChunksPerMb` and `estimateChunkCount`
- Migration applied on NOMAD3 via hot-patch; 9 seed rows verified in DB
2026-05-20 10:16:00 -07:00
chriscrosstalk
743549ca74 feat(KB): per-file ingest state machine (Phase 1 of RFC #883) (#888)
Adds a persistent state machine for AI knowledge-base ingestion so the
scanner can distinguish "fully indexed", "user opted out", "failed", and
"stalled" from each other — none of which were derivable from the prior
binary "any chunks in Qdrant ⇒ embedded" check.

## What lands

- New table `kb_ingest_state` keyed by `file_path` with enum state column
  (`pending_decision | indexed | browse_only | failed | stalled`).
  Independent of `installed_resources` so it covers both curated downloads
  and manually-uploaded KB files.
- New KV key `rag.defaultIngestPolicy` (string: `Always | Manual`).
  Registered now but not consumed yet — JIT prompt + wizard step land in
  Phase 3 of the RFC.
- `EmbedFileJob.handle` writes state on terminal outcomes:
  - Success (final batch) → `indexed` + chunks count
  - `UnrecoverableError` → `failed` + error message
  - Retryable errors are left to BullMQ's existing retry path
- `scanAndSyncStorage` swaps the binary qdrant check for a state-aware
  decision tree (see `decideScanAction`). Existing installs auto-backfill
  on first scan: files with chunks in Qdrant but no state row become
  `indexed`; new files start as `pending_decision`.
- `deleteFileBySource` drops the state row last, so removed files
  disappear entirely instead of leaving an orphan that the next scan
  would re-dispatch into nothing.

## What does NOT land here

- Ratio registry (separate PR) — needed for partial-stall detection and
  cost estimates, but a separable concern.
- #880 follow-up initial-progress anchor (separate tiny PR).
- Phase 2 UI (status pill, per-card actions, conditional warnings).
- Phase 3 policy surfaces (wizard step, JIT prompt, guardrail modal).
- PR #886's bulk-action hookup — `_deletePointsBySource` / Re-embed All
  / Reset & Rebuild would also want to set state, but #886 isn't merged
  yet; that wiring goes in a follow-up once #886 lands.

## Target

This is forward work for v1.40.0 (RFC #883). Branching off `rc` because
that's the current latest base and post-GA Jake will sync rc→dev; a
retarget at PR-open time is a fast-forward if requested.

## Tests

- 9 new unit tests for `decideScanAction` covering all five states plus
  the no-row / chunks-present / chunks-missing combinations
- Type-check clean
- Smoke-tested end-to-end on NOMAD3 via hot-patch:
  - Backfill: 5 ZIMs + 2 KB uploads with existing chunks in Qdrant all
    came back `indexed` on first scan
  - Pending dispatch: a video-only ZIM with no chunks (`lrnselfreliance`)
    came back `pending_decision` and was correctly re-dispatched (Bull
    deduped to its historical `:completed` jobId — bgauger's #886 fix
    drains that)
  - Delete hook: deleting a KB upload via `DELETE /api/rag/files`
    removed both the disk file and the state row

Co-authored-by: Jake Turner <52841588+jakeaturner@users.noreply.github.com>
2026-05-20 10:16:00 -07:00
Chris Sherwood
5e2c599c3e fix(ZIM): preserve co-existing Wikipedia corpora on cleanup (#884)
onWikipediaDownloadComplete was deleting every file whose name starts
with `wikipedia_en_`, treating distinct corpora (simple, medicine,
wikivoyage, climate_change, etc.) as competing versions of the same
selection slot. Whichever wiki finished second silently wiped the
other from disk.

Match by filename stem instead — strip the trailing `_YYYY-MM(-DD).zim`
date suffix and only delete files with the same stem as the new
download. Different release dates of the same variant still get cleaned
up; distinct variants are preserved.

Extracted the predicate to `app/utils/zim_filename.ts` so the boundary
is covered by unit tests (8 cases incl. the #884 repro scenario).
2026-05-20 10:16:00 -07:00
Ryanba
5517e826aa fix(UI): improve global map banner display logic (#702) 2026-05-20 10:16:00 -07:00
Jake Turner
b33a1b3e37 feat: initial commit 2025-06-29 15:51:08 -07:00