Mirror of https://github.com/Crosstalk-Solutions/project-nomad.git (synced 2026-03-28 11:39:26 +01:00)
Fixes an issue seen with some models in LM Studio that resulted in: "The number of tokens to keep from the initial prompt is greater than the context length (n_keep: 4705 >= n_ctx: 4096)".

Fixed the char-per-token estimate; the old value was too optimistic, so the cap allowed more text than the budget permitted in actual tokens.

After RAG injection, the system prompt's token count is estimated. If it exceeds ~3000 tokens, the next standard context size (8192, 16384, 32768, or 65536) is requested: one large enough to fit the prompt plus a 2048-token buffer for the conversation and response.

For Ollama, num_ctx is honoured per request and the model is loaded with that context window. For LM Studio, the parameter is silently ignored, but the tighter char estimate also reduces how much RAG text gets injected, making overflow less likely.
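The sizing logic above can be sketched roughly as follows. This is a hypothetical illustration, not the project's actual code: the function names, the 3-chars-per-token ratio, and the thresholds are assumptions based on the commit message.

```typescript
// Standard context sizes the commit message says can be requested.
const CONTEXT_SIZES = [8192, 16384, 32768, 65536];
const RESPONSE_BUFFER = 2048;  // tokens reserved for conversation + response
const CHARS_PER_TOKEN = 3;     // assumed conservative ratio; the old, more
                               // optimistic value let the cap overshoot

// Rough token estimate from character count (assumption: simple division).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// After RAG injection, pick the smallest standard context window that fits
// the system prompt plus the response buffer. Returns undefined when the
// default context (prompt under ~3000 tokens) is already sufficient.
// For Ollama this value would be passed as num_ctx in the request options;
// LM Studio silently ignores it.
function requestedContextSize(systemPrompt: string): number | undefined {
  const promptTokens = estimateTokens(systemPrompt);
  if (promptTokens <= 3000) return undefined;
  const needed = promptTokens + RESPONSE_BUFFER;
  return CONTEXT_SIZES.find((size) => size >= needed)
    ?? CONTEXT_SIZES[CONTEXT_SIZES.length - 1];
}
```

For example, a 12,000-character prompt estimates to 4,000 tokens; with the 2,048-token buffer that needs 6,048 tokens, so the 8192 window is requested.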
| Directory |
|---|
| controllers |
| exceptions |
| jobs |
| middleware |
| models |
| services |
| utils |
| validators |