Commit Graph

15 Commits

Author SHA1 Message Date
Henry Estela
8b54310746
Improve context window size estimation
Fixes an issue seen with some models in LM Studio resulting in:
"The number of tokens to keep from the initial prompt is greater than the context length (n_keep: 4705>= n_ctx: 4096)"

Fixed the char/token estimate; the old value was too optimistic,
causing the cap to allow more text than the budget allowed in actual tokens.
After RAG injection, the system prompt token count is estimated.
If it exceeds ~3000 tokens, the next standard context size (8192, 16384, 32768, or 65536) is requested,
large enough to fit the prompt plus a 2048-token buffer for the conversation and response.

For Ollama, num_ctx is honoured per-request and will load the model with that context
window. For LM Studio, the parameter is silently ignored — but the tighter char
estimate will also reduce how much RAG text gets stuffed in, so it's less likely to
overflow.
2026-03-25 17:18:06 -07:00
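The sizing logic described in the commit above can be sketched roughly as follows. The chars-per-token ratio, constants, and function names here are illustrative assumptions, not the project's actual values:

```typescript
// Illustrative sketch of the context-window sizing described in the commit.
const CHARS_PER_TOKEN = 3.5;   // assumed conservative ratio; too-optimistic values overflow
const RESPONSE_BUFFER = 2048;  // tokens reserved for the conversation and response
const STANDARD_SIZES = [8192, 16384, 32768, 65536];
const DEFAULT_CTX = 4096;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// After RAG injection, pick the smallest standard context window large
// enough for the system prompt plus the response buffer; small prompts
// keep the default window.
function chooseContextSize(systemPrompt: string): number {
  const promptTokens = estimateTokens(systemPrompt);
  if (promptTokens <= 3000) return DEFAULT_CTX;
  const needed = promptTokens + RESPONSE_BUFFER;
  for (const size of STANDARD_SIZES) {
    if (size >= needed) return size;
  }
  return STANDARD_SIZES[STANDARD_SIZES.length - 1];
}
```

The chosen size would then be sent as `num_ctx` per-request, which Ollama honours but LM Studio ignores, as the commit notes.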
Henry Estela
c8ce28a84f
fix(ai-chat): ingestion of documents with openai and add cleanup button
Added a "Cleanup Failed" button to the Processing Queue in the Knowledge
Base, since documents that fail to process tend to get stuck and then
can't be cleared.

Fixed the ingestion of documents for OpenAI servers.

Updated some text in the chat and chat settings, since the user will need
to manually download models when using a non-Ollama remote GPU server.
2026-03-25 17:18:05 -07:00
Henry Estela
f98664921a
feat(ai-chat): Add support for OpenAI API
Existing Ollama API support still functions as before. The OpenAI and
Ollama APIs mostly have the same features; however, model file size is not
supported by OpenAI's API, so when a user chooses an OpenAI server the
models will just show up as the model name without the size.

`npm install openai` triggered some updates in admin/package-lock.json
such as adding many instances of "dev: true".

This further enhances the user's ability to run the LLM on a different
host.
2026-03-25 17:18:05 -07:00
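The name-without-size fallback mentioned above could look something like this hypothetical helper. Ollama's model listing reports a per-model file size while OpenAI's does not; the function name and label format are assumptions for illustration:

```typescript
// Hypothetical display helper: show "name (size)" when the backend reports
// a file size (Ollama), and degrade to the bare model name when it does not
// (OpenAI-compatible servers).
function formatModelLabel(name: string, sizeBytes?: number): string {
  if (sizeBytes === undefined) return name; // OpenAI: no size available
  const gb = sizeBytes / 1024 ** 3;
  return `${name} (${gb.toFixed(1)} GB)`;
}
```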
Chris Sherwood
78c0b1d24d fix(ai): surface model download errors and prevent silent retry loops
Model downloads that fail (e.g., when Ollama is too old for a model)
were silently retrying 40 times with no UI feedback. Now errors are
broadcast via SSE and shown in the Active Model Downloads section.
Version mismatch errors use UnrecoverableError to fail immediately
instead of retrying. Stale failed jobs are cleared on retry so users
aren't permanently blocked.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 16:30:35 -07:00
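The retry-vs-fail decision described above can be sketched as an error classifier. In BullMQ, throwing `UnrecoverableError` from a worker fails the job immediately with no further attempts; the local class below is a self-contained stand-in for that import, and the error-message pattern is an assumption, not the project's actual check:

```typescript
// Stand-in for bullmq's UnrecoverableError export, so this sketch runs alone.
class UnrecoverableError extends Error {}

// Version-mismatch errors (e.g. the model needs a newer Ollama) can never
// succeed on retry, so wrap them so the queue fails the job immediately.
// Everything else is returned unchanged and keeps normal retry behaviour.
function classifyDownloadError(err: Error): Error {
  if (/requires a newer version of ollama/i.test(err.message)) {
    return new UnrecoverableError(err.message);
  }
  return err;
}
```

The worker would throw the classified error, and the same failure would be broadcast over SSE so the Active Model Downloads UI can surface it instead of looping silently.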
Jake Turner
db69428193 fix(AI): allow force refresh of models list 2026-03-11 14:08:09 -07:00
Jake Turner
6874a2824f feat(Models): paginate available models endpoint 2026-03-03 20:51:38 -08:00
Jake Turner
98b65c421c feat(AI): thinking and response streaming 2026-02-18 21:22:53 -08:00
Jake Turner
d55ff7b466
feat: curated content update checking 2026-02-11 21:49:46 -08:00
Jake Turner
12286b9d34 feat: display model download progress 2026-02-06 16:22:23 -08:00
Jake Turner
a91c13867d fix: filter cloud models from API response 2026-02-04 17:05:20 -08:00
Jake Turner
d4cbc0c2d5 feat(AI): add fuzzy search to models list 2026-02-04 16:45:12 -08:00
Jake Turner
ab07551719 feat: auto add NOMAD docs to KB on AI install 2026-02-03 23:15:54 -08:00
Jake Turner
907982062f feat(Ollama): cleanup model download logic and improve progress tracking 2026-02-03 23:15:54 -08:00
Jake Turner
31c671bdb5 fix: service name defs and ollama ui location 2026-02-01 05:46:23 +00:00
Jake Turner
243f749090 feat: [wip] native AI chat interface 2026-01-31 20:39:49 -08:00