project-nomad/admin/app
Henry Estela 8b54310746
Improve context window size estimation
Fixes an issue seen with some models in LM Studio that resulted in:
"The number of tokens to keep from the initial prompt is greater than the context length (n_keep: 4705>= n_ctx: 4096)"

Fixed the chars-per-token estimate. The old value was too optimistic,
so the cap allowed more text than the budget permitted in actual tokens.
After RAG injection, the system prompt's token count is now estimated.
If it exceeds ~3000 tokens, the next standard context size (8192, 16384, 32768, or 65536)
large enough to fit the prompt plus a 2048-token buffer for the conversation and response is requested.
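The estimate-then-pick logic described above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the helper names (estimateTokens, pickContextSize, capToTokenBudget) are hypothetical, and CHARS_PER_TOKEN = 3 is an assumed ratio because the commit message does not state the value it switched to. The threshold (3000), buffer (2048) and size ladder come from the message itself.

```typescript
// Sketch only. CHARS_PER_TOKEN is a hypothetical ratio; the commit does not state its value.
const CHARS_PER_TOKEN = 3;
const DEFAULT_CTX_THRESHOLD = 3000; // "~3000 tokens" from the commit message
const RESPONSE_BUFFER = 2048; // room for the conversation and the model's response
const STANDARD_CTX_SIZES = [8192, 16384, 32768, 65536] as const;

// Rough token estimate: character count divided by the assumed ratio, rounded up.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Hypothetical helper: returns a context size to request,
// or undefined to keep the model's default window.
function pickContextSize(systemPrompt: string): number | undefined {
  const promptTokens = estimateTokens(systemPrompt);
  if (promptTokens <= DEFAULT_CTX_THRESHOLD) return undefined;
  const needed = promptTokens + RESPONSE_BUFFER;
  // First standard size that fits prompt + buffer; falling back to the
  // largest size when nothing fits is an assumption of this sketch.
  return (
    STANDARD_CTX_SIZES.find((size) => size >= needed) ??
    STANDARD_CTX_SIZES[STANDARD_CTX_SIZES.length - 1]
  );
}

// Hypothetical helper: cap injected RAG text to a token budget using the same ratio.
// A smaller chars-per-token ratio yields a smaller character cap for the same budget.
function capToTokenBudget(text: string, budgetTokens: number): string {
  return text.slice(0, budgetTokens * CHARS_PER_TOKEN);
}
```

Under these assumptions, a 12,000-character system prompt estimates to 4000 tokens, needs 6048 with the buffer, and gets 8192; a 9,000-character prompt stays at exactly 3000 tokens and keeps the default window.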

For Ollama, num_ctx is honoured per request, and the model is loaded with that context
window. For LM Studio, the parameter is silently ignored, but the tighter char
estimate also reduces how much RAG text gets stuffed into the prompt, so overflow is
less likely.
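For Ollama, the per-request window is passed as `num_ctx` inside the `options` object of a chat request. The sketch below only builds the request body; the helper name buildOllamaChatBody and the trimmed-down types are hypothetical, and it assumes a non-streaming call to `/api/chat`.

```typescript
// Minimal message shape for an Ollama /api/chat request (trimmed for this sketch).
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface OllamaChatBody {
  model: string;
  messages: ChatMessage[];
  stream: boolean;
  options?: { num_ctx: number };
}

// Hypothetical helper: attach num_ctx only when a larger window was chosen,
// so the model's default context length applies otherwise.
function buildOllamaChatBody(
  model: string,
  messages: ChatMessage[],
  numCtx?: number,
): OllamaChatBody {
  const body: OllamaChatBody = { model, messages, stream: false };
  if (numCtx !== undefined) body.options = { num_ctx: numCtx };
  return body;
}
```

The serialized body would then be POSTed to a local Ollama server (by default `http://localhost:11434/api/chat`). Sending the same field to LM Studio has no effect, as noted above, which is why the tighter character cap still matters there.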
2026-03-25 17:18:06 -07:00
controllers Improve context window size estimation 2026-03-25 17:18:06 -07:00
exceptions fix(Docs): documentation renderer fixes 2025-12-23 16:00:33 -08:00
jobs fix(ai-chat): ingestion of documents with openai and add cleanup button 2026-03-25 17:18:05 -07:00
middleware feat: background job overhaul with bullmq 2025-12-06 23:59:01 -08:00
models feat: support for updating services 2026-03-11 14:08:09 -07:00
services Improve context window size estimation 2026-03-25 17:18:06 -07:00
utils fix(disk): correct storage display by fixing device matching and dedup mount entries 2026-03-20 11:46:10 -07:00
validators feat(AI Assistant): improved state management and performance 2026-03-11 14:08:09 -07:00