Commit Graph

171 Commits

Author SHA1 Message Date
chriscrosstalk
216509ae0d fix(rag): repair ZIM embedding pipeline (sync filter, batch gate, DOM walk) (#745)
Three bugs in the RAG embedding pipeline, diagnosed and patched by @sbruschke
against v1.31.0 with working before/after chunk counts. All three are
root-cause contributors to #388.

1. scanAndSyncStorage queued every file under /storage/zim/ for embedding,
   including Kiwix's generated kiwix-library.xml. EmbedFileJob rejected it
   with "Unsupported file type" and the default 30-attempt retry policy
   kept it looping on every sync, flooding nomad_admin logs. Now gated on
   determineFileType(filePath) !== 'unknown'.

2. hasMoreBatches compared zimChunks.length (section-level chunk count
   under the 'structured' strategy) against ZIM_BATCH_SIZE (an article
   limit). Because articles emit multiple sections, the two are never
   equal for real archives and processing silently stopped after the
   first 50 articles. Now gated on articlesInBatch >= ZIM_BATCH_SIZE.

3. extractStructuredContent walked only direct children of <body>, so any
   ZIM that wraps content in a container div (Devdocs, Wikipedia,
   FreeCodeCamp, React docs, etc.) produced zero sections and silently
   embedded zero chunks while reporting success. Now walks the full DOM
   via $('body').find('h2, h3, h4, p, ul, ol, dl, table'), with a
   whole-body text fallback when the selector walk yields nothing.

Before/after chunk counts confirmed by @sbruschke on v1.31.0:
  devdocs_en_git   0 -> 916
  devdocs_en_react 0 -> 481
  devdocs_en_node  0 -> 423
  libretexts_en_eng 1 -> 35 (climbing)
Wikipedia resumed progressing normally through its 6M articles.

Closes #718
Closes #719
Closes #720
Closes #388

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 14:26:28 -07:00
chriscrosstalk
810a70acb7 fix(ZIM): accumulate across Kiwix pages to prevent empty Content Explorer (#746)
When many ZIMs are already installed locally, a single Kiwix catalog page
(12 items) could return 12 already-installed items, which zim_service
would fully filter out client-side. The endpoint returned items: [] with
has_more: true, and the frontend's infinite-scroll guard
(flatData.length > 0) blocked fetchNextPage — leaving the user with
"No records found" despite plenty of uninstalled ZIMs available.

Backend now accumulates across up to 5 Kiwix fetches (60 items each)
until it has enough post-filter results to return, dedupes by entry id,
advances currentStart by actual entries returned (not requested), and
returns a next_start cursor. The frontend consumes that cursor instead
of computing Kiwix offsets locally, and the flatData.length > 0 guard is
removed so the existing on-mount effect drives bounded auto-fetch when
a short page lands.

The pre-existing has_more off-by-one (compared totalResults against the
input start rather than the post-fetch position) is fixed implicitly.

Diagnosis credit: @johno10661.

Closes #731

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 14:26:28 -07:00
chriscrosstalk
6646b3480b fix(AI): stop local nomad_ollama container when remote Ollama is configured (#744)
When users set a remote Ollama URL via AI Settings, the local nomad_ollama
container continued running and competed with the remote host for port 11434
and GPU access. Now configureRemote stops the local container on set and
restores it on clear (if still present). Container and its models volume are
preserved so the local install can be re-enabled later.

Closes #662

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 14:26:28 -07:00
chriscrosstalk
056556497c docs: add Community Add-Ons page with field manuals + W3Schools packs (#753)
Introduces a dedicated page listing third-party ZIM content packs built
by the community. Launches with the two current add-ons (jrsphoto field
manuals, kennethbrewer W3Schools) and explains how to install a ZIM pack
and where to submit a new one for inclusion.

- New doc at admin/docs/community-add-ons.md
- Wired into DocsService DOC_ORDER (slot 4) and TITLE_OVERRIDES so the
  hyphen in "Add-Ons" is preserved in the sidebar
- README gets a link under Community & Resources

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 14:26:28 -07:00
chriscrosstalk
6c33a96972 fix(AI): allow cancelling in-progress model downloads and ensure consistent progress UI (#701)
Adds a cancel button to in-progress Ollama model downloads and unifies
the Active Model Downloads card layout with the Active Downloads card
used for ZIMs, maps, and pmtiles (byte counts, progress bar, live speed,
status indicator).

Closes #676.
2026-04-21 14:26:28 -07:00
Luís Miguel
806b2c1714 fix(security): SSRF validation for map downloads and error sanitization (CWE-918, CWE-209) (#552)
* fix(security): add SSRF validation to map download URLs from manifest
* fix(security): sanitize verbose error in rag controller scan endpoint
* fix(security): sanitize verbose errors in benchmark controller
* fix(security): sanitize verbose error in system controller version fetch
* fix(security): sanitize verbose errors in chats controller (6 instances)
* fix(security): sanitize verbose errors in docker service (6 instances)
* fix(security): sanitize verbose error in system update service
* fix(security): sanitize verbose errors in collection update service
---------
Co-authored-by: Jake Turner <52841588+jakeaturner@users.noreply.github.com>
2026-04-21 14:26:28 -07:00
Jake Turner
2b8c847295 fix(Downloads): remove duplicate err listnr and improv Range req stability 2026-04-21 14:26:28 -07:00
Aaron Bird
8d026da06e fix(downloads): stage downloads to .tmp to prevent Kiwix loading partial files
Downloads are now written to `filepath + '.tmp'` and atomically renamed
to the final path only on successful completion. Kiwix globs for `*.zim`
and ZimService filters `.endsWith('.zim')`, so `.tmp` files are invisible
to both during download. The same staging applies to `.pmtiles` map files.

Ref #372

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 14:26:28 -07:00
Jake Turner
c8cb79a3a5 fix: prevent ZIM corrupt file crash and deduplicate Ollama download logs (#741)
Corrupted ZIM files cause a native C++ abort (ZimFileFormatError) that
bypasses JS try/catch and kills the process. Add magic number validation
before passing files to @openzim/libzim so invalid files are skipped
gracefully. Also deduplicate Ollama download progress broadcasts — both
within a single stream (skip unchanged percentages) and across concurrent
callers (share one download promise per model).

Co-authored-by: aegisman <aegis@manicode.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-21 14:26:28 -07:00
Henry Estela
6510f42184 fix(AI): qwen2.5 loading on every chat message (#649)
Use the currently loaded model for chat title generation and query rewrite.
2026-04-21 14:26:28 -07:00
chriscrosstalk
0183b42d71 feat(maps): add scale bar and location markers (#636)
Add distance scale bar and user-placed location pins to the offline maps viewer.

- Scale bar (bottom-left) shows distance reference that updates with zoom level
- Click anywhere on map to place a named pin with color selection (6 colors)
- Collapsible "Saved Locations" panel lists all pins with fly-to navigation
- Full dark mode support for popups and panel via CSS overrides
- New `map_markers` table with future-proofed columns for routing (marker_type,
  route_id, route_order, notes) to avoid a migration when routes are added later
- CRUD endpoints: GET/POST /api/maps/markers, PATCH/DELETE /api/maps/markers/:id
- VineJS validation on create/update
- MapMarker Lucid model

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 14:26:50 -07:00
Jake Turner
6287755946 fix(Maps): ensure proper parsing of hostnames (#640) 2026-04-03 14:26:50 -07:00
0xGlitch
d7e3d9246b fix(downloads): improved handling for large file downloads and user-initiated cancellation (#632)
* fix(downloads): increase retry attempts and backoff for large file downloads
* fix download retry config and abort handling
* use abort reason to detect user-initiated cancels
2026-04-03 14:26:50 -07:00
Jake Turner
cb4fa003a4 fix: cache docker list requests, aiAssistantName fetching, and ensure inertia used properly 2026-04-03 14:26:50 -07:00
Jake Turner
877fb1276a feat: gzip compression by default for all registered routes 2026-04-03 14:26:50 -07:00
Jake Turner
9e3828bcba feat(Kiwix): migrate to Kiwix library mode for improved stability (#622) 2026-04-03 14:26:50 -07:00
Henry Estela
0edfdead90 feat(AI): enable flash_attn by default and disable ollama cloud (#616)
New defaults:
OLLAMA_NO_CLOUD=1 - "Ollama can run in local only mode by disabling
Ollama’s cloud features. By turning off Ollama’s cloud features, you
will lose the ability to use Ollama’s cloud models and web search."
https://ollama.com/blog/web-search
https://docs.ollama.com/faq#how-do-i-disable-ollama%E2%80%99s-cloud-features
example output:
```
ollama run minimax-m2.7:cloud
Error: ollama cloud is disabled: remote model details are unavailable
```
This setting can be safely disabled as you have to click on a link to
login to ollama cloud and theres no real way to do that in nomad outside
of looking at the nomad_ollama logs.

This one can be disabled in settings in case theres a model out there
that doesn't play nice. but that doesnt seem necessary so far.
OLLAMA_FLASH_ATTENTION=1 - "Flash Attention is a feature of most modern
models that can significantly reduce memory usage as the context size
grows. "

Tested with llama3.2:
```
docker logs nomad_ollama --tail 1000 2>&1 |grep --color -i flash_attn
llama_context: flash_attn    = enabled
```

And with second_constantine/deepseek-coder-v2 with is based on
https://huggingface.co/lmstudio-community/DeepSeek-Coder-V2-Lite-Instruct-GGUF
which is a model that specifically calls out that you should disable
flash attention, but during testing it seems ollama can do this for you
automatically:
```
docker logs nomad_ollama --tail 1000 2>&1 |grep --color -i flash_attn
llama_context: flash_attn    = disabled
```
2026-04-03 14:26:50 -07:00
Jake Turner
2e3253b1ac fix(Jobs): improved error handling and robustness 2026-04-03 14:26:50 -07:00
Jake Turner
f4beb9a18a fix(Maps): remove unused import 2026-04-03 14:26:50 -07:00
chriscrosstalk
bac53e28dc feat(downloads): rich progress, friendly names, cancel, and live status (#554)
* feat(downloads): rich progress, friendly names, cancel, and live status

Redesign the Active Downloads UI with four improvements:

- Rich progress: BullMQ jobs now report downloadedBytes/totalBytes instead
  of just a percentage, showing "2.3 GB / 5.1 GB" instead of "78% / 100%"
- Friendly names: dispatch title metadata from curated categories, Content
  Explorer library, Wikipedia selector, and map collections
- Cancel button: Redis-based cross-process abort signal lets users cancel
  active downloads with file cleanup. Confirmation step prevents accidents.
- Live status indicator: green pulsing dot with transfer speed for active
  downloads, orange stall warning after 60s of no data, gray dot for queued

Backward compatible with in-flight jobs that have integer-only progress.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(downloads): fix cancel, dismiss, speed, and retry bugs

- Speed indicator: only set prevBytesRef on first observation to prevent
  intermediate re-renders from inflating the calculated speed
- Cancel: throw UnrecoverableError on abort to prevent BullMQ retries
- Dismiss: remove stale BullMQ lock before job.remove() so cancelled
  jobs can actually be dismissed
- Retry: add getActiveByUrl() helper that checks job state before
  blocking re-download, auto-cleans terminal jobs
- Wikipedia: reset selection status to failed on cancel so the
  "downloading" state doesn't persist

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(downloads): improve cancellation logic and surface true BullMQ job states

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Jake Turner <jturner@cosmistack.com>
2026-04-03 14:26:50 -07:00
David Gross
b65b6d6b35 fix(Maps): add x-forwarded-proto support to handle https termination (#600) 2026-04-03 14:26:50 -07:00
Sebastion
e9af7a555b fix: block IPv4-mapped IPv6 and IPv6 all-zeros in SSRF check (#520)
The assertNotPrivateUrl() function blocked standard loopback and link-local
addresses but could be bypassed using IPv4-mapped IPv6 representations:

  - http://[::ffff:127.0.0.1]:8080/ → loopback bypass
  - http://[::ffff:169.254.169.254]:8080/ → metadata endpoint bypass
  - http://[::]:8080/ → all-interfaces bypass

Node.js normalises these to [::ffff:7f00:1], [::ffff:a9fe:a9fe], and [::]
respectively, none of which matched the existing regex patterns.

Add two patterns to close the gap:
  - /^\[::ffff:/i catches all IPv4-mapped IPv6 addresses
  - /^\[::\]$/ catches the IPv6 all-zeros address

Legitimate RFC1918 LAN URLs (192.168.x, 10.x, 172.16-31.x) remain allowed.
2026-04-03 14:26:50 -07:00
Luís Miguel
b183bc6745 fix(security): validate key parameter on settings read endpoint#517
Co-authored-by: Jake Turner <52841588+jakeaturner@users.noreply.github.com>
2026-04-03 14:26:50 -07:00
Jake Turner
fc6152c908 feat: support adding labels on dynamic container creation (#620)
Co-authored-by: Benjamin Sanders <ben@benjaminsanders.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 14:26:50 -07:00
0xGlitch
789fdfe95d feat(maps): add global map download from Protomaps (#525)
* feat(maps): add global map download from Protomaps
* fix: add path traversal check to global map download
2026-04-03 14:26:50 -07:00
arn6694
ed8918f2e9 feat(rag): add EPUB file support for Knowledge Base uploads (#257) 2026-04-03 14:26:50 -07:00
Henry Estela
69c15b8b1e feat(AI): enable remote AI chat host 2026-04-03 14:26:50 -07:00
Jake Turner
d25292a713
Revert "feat: support adding labels on dynamic container creation (#610)" (#619)
This reverts commit f32ba3bb51.
2026-04-01 11:04:11 -07:00
Benjamin Sanders
f32ba3bb51
feat: support adding labels on dynamic container creation (#610)
Co-authored-by: Jake Turner <jturner@cosmistack.com>
2026-04-01 11:03:44 -07:00
Tom Boucher
6b558531be fix: surface actual error message when service installation fails
Backend returned { error: message } on 400 but frontend expected { message }.
catchInternal swallowed Axios errors and returned undefined, causing a
generic 'An internal error occurred' message instead of the real reason
(already installed, already in progress, not found).

- Fix 400 response shape to { success: false, message } in controller
- Replace catchInternal with direct error handling in installService,
  affectService, and forceReinstallService API methods
- Extract error.response.data.message from Axios errors so callers
  see the actual server message
2026-03-25 16:30:35 -07:00
Bortlesboat
4642dee6ce fix: benchmark scores clamped to 0% for below-average hardware
The log2 normalization formula `50 * (1 + log2(ratio))` produces negative
values (clamped to 0) whenever the measured value is less than half the
reference. For example, a CPU scoring 1993 events/sec against a 5000
reference gives ratio=0.4, log2(0.4)=-1.32, score=-16 -> 0%.

Fix by dividing log2 by 3 to widen the usable range. This preserves the
50% score at the reference value while allowing below-average hardware
to receive proportional non-zero scores (e.g., 28% for the CPU above).

Also adds debug logging for CPU sysbench output parsing to aid future
diagnosis of parsing issues.

Fixes #415
2026-03-25 16:30:35 -07:00
Chris Sherwood
78c0b1d24d fix(ai): surface model download errors and prevent silent retry loops
Model downloads that fail (e.g., when Ollama is too old for a model)
were silently retrying 40 times with no UI feedback. Now errors are
broadcast via SSE and shown in the Active Model Downloads section.
Version mismatch errors use UnrecoverableError to fail immediately
instead of retrying. Stale failed jobs are cleared on retry so users
aren't permanently blocked.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 16:30:35 -07:00
builder555
4443799cc9 fix(Collections): update ZIM files to latest versions (#332)
* fix: update data sources to newer versions
* fix: bump spec version for wikipedia
2026-03-25 16:30:35 -07:00
Jake Turner
b8cf1b6127 fix(disk): correct storage display by fixing device matching and dedup mount entries 2026-03-20 11:46:10 -07:00
Chris Sherwood
571f6bb5a2 fix(GPU): persist GPU type to KV store for reliable passthrough
GPU detection results were only applied at container creation time and
never persisted. If live detection failed transiently (Docker daemon
hiccup, runtime temporarily unavailable), Ollama would silently fall
back to CPU-only mode with no way to recover short of force-reinstall.

Now _detectGPUType() persists successful detections to the KV store
(gpu.type = 'nvidia' | 'amd') and uses the saved value as a fallback
when live detection returns nothing. This ensures GPU config survives
across container recreations regardless of transient detection failures.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 11:46:10 -07:00
Chris Sherwood
023e3f30af fix(downloads): allow users to dismiss failed downloads
Failed download jobs persist in BullMQ forever with no way to clear
them, leaving stale error notifications in Content Explorer and Easy
Setup. Adds a dismiss button (X) on failed download cards that removes
the job from the queue via a new DELETE endpoint.

- Backend: DELETE /api/downloads/jobs/:jobId endpoint
- Frontend: X button on failed download cards with immediate refresh

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 11:46:10 -07:00
Jake Turner
5dc48477f6 fix(Docker): ensure fresh GPU detection when Ollama ctr updated 2026-03-20 11:46:10 -07:00
Chris Sherwood
b0b8f07661 fix: improve download reliability with stall detection, failure visibility, and Wikipedia status tracking
Three bugs caused downloads to hang, disappear, or leave stuck spinners:
1. Wikipedia downloads that failed never updated the DB status from 'downloading',
   leaving the spinner stuck forever. Now the worker's failed handler marks them as failed.
2. No stall detection on streaming downloads - if data stopped flowing mid-download,
   the job hung indefinitely. Added a 5-minute stall timer that triggers retry.
3. Failed jobs were invisible to users since only waiting/active/delayed states were
   queried. Now failed jobs appear with error indicators in the download list.

Closes #364, closes #216

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 11:46:10 -07:00
Chris Sherwood
34076b107b fix: prevent embedding retry storm when Ollama is not installed
When Ollama isn't installed, every ZIM download dispatches embedding jobs
that fail and retry 30x with 60s backoff. With many ZIM files downloading
in parallel, this exhausts Redis connections with EPIPE/ECONNRESET errors.

Two changes:
1. Don't dispatch embedding jobs when Ollama isn't installed (belt)
2. Use BullMQ UnrecoverableError for "not installed" so jobs fail
   immediately without retrying (suspenders)

Closes #351

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-20 11:46:10 -07:00
Chris Sherwood
e4fde22dd9 feat(UI): add Debug Info modal for bug reporting
Add a "Debug Info" link to the footer and settings sidebar that opens a
modal with non-sensitive system information (version, OS, hardware, GPU,
installed services, internet status, update availability). Users can copy
the formatted text and paste it into GitHub issues.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-20 11:46:10 -07:00
Jake Turner
9220b4b83d fix(maps): respect request protocol for reverse proxy HTTPS support 2026-03-20 11:46:10 -07:00
Chris Sherwood
6a737ed83f feat(UI): add Support the Project settings page
Adds a new settings page with Ko-fi donation link, Rogue Support
banner, and community contribution options (GitHub, Discord).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-20 11:46:10 -07:00
Chris Sherwood
baf16ae824 fix(security): rotate benchmark HMAC signing secret
Rotate the HMAC secret used for signing benchmark submissions to the
community leaderboard. The previous secret was compromised (hardcoded
in open-source code and used to submit a fake leaderboard entry).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 13:46:17 -07:00
Jake Turner
96e5027055 feat(AI Assistant): performance improvements and smarter RAG context usage 2026-03-11 14:08:09 -07:00
Jake Turner
460756f581 feat(AI Assistant): improved state management and performance 2026-03-11 14:08:09 -07:00
Jake Turner
d30c1a1407 fix(System): ensure nomad container image tag resolves correctly 2026-03-11 14:08:09 -07:00
Chris Sherwood
5d3c659d05 fix(security): narrow SSRF scope to allow RFC1918 LAN addresses
NOMAD is a LAN appliance — blocking RFC1918 private ranges (10.x,
172.16-31.x, 192.168.x) would prevent users from downloading content
from local network mirrors. Narrowed to only block loopback (localhost,
127.x, 0.0.0.0, ::1) and link-local (169.254.x, fe80::) addresses.
Restored require_tld: false for LAN hostnames without TLDs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 14:08:09 -07:00
Chris Sherwood
75106a8f61 fix(security): path traversal and SSRF protections from pre-launch audit
Fixes 4 high-severity findings from a comprehensive security audit:

1. Path traversal on ZIM file delete — resolve()+startsWith() containment
2. Path traversal on Map file delete — same pattern
3. Path traversal on docs read — same pattern (already used in rag_service)
4. SSRF on download endpoints — block private/internal IPs, require TLD

Also adds assertNotPrivateUrl() to content update endpoints.

Full audit report attached as admin/docs/security-audit-v1.md.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 14:08:09 -07:00
Jake Turner
58b106f388 feat: support for updating services 2026-03-11 14:08:09 -07:00
Chris Sherwood
650ae407f3 feat(GPU): warn when GPU passthrough not working and offer one-click fix
Ollama can silently run on CPU even when the host has an NVIDIA GPU,
resulting in ~3 tok/s instead of ~167 tok/s. This happens when Ollama
was installed before the GPU toolkit, or when the container was
recreated without proper DeviceRequests. Users had zero indication.

Adds a GPU health check to the system info API response that detects
when the host has an NVIDIA runtime but nvidia-smi fails inside the
Ollama container. Shows a warning banner on the System Information
and AI Settings pages with a one-click "Reinstall AI Assistant"
button that force-reinstalls Ollama with GPU passthrough.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 14:08:09 -07:00