fix(RAG): pass num_ctx and truncate to Ollama embed call (#763)

Some Ollama installs ship nomic-embed-text:v1.5 with the embedding
model's modelfile default of num_ctx=2048, which the RAG chunker
(sized for ~1500 tokens of estimated content at a ratio of 2
chars/token) can exceed on dense PDFs. The result is `400 the input
length exceeds the context length` from /api/embed; the request then
falls through to the OpenAI-compatible fallback, which errors as
well, and the failure surfaces as a BadRequestError.
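
For concreteness, a back-of-the-envelope sketch of the overflow (the
constant names and the ~1.3 chars/token figure for dense PDF text are
illustrative assumptions, not measured values):

  // Chunker budget: ~1500 estimated tokens at 2 chars/token.
  const CHUNK_TOKEN_BUDGET = 1500;
  const ESTIMATED_CHARS_PER_TOKEN = 2;
  const maxChunkChars =
    CHUNK_TOKEN_BUDGET * ESTIMATED_CHARS_PER_TOKEN; // 3000 chars

  // Dense PDF text (tables, numbers, punctuation) can tokenize closer
  // to ~1.3 chars/token, so a full 3000-char chunk may really be:
  const actualTokens = Math.ceil(maxChunkChars / 1.3); // ~2308 tokens
  // ...which blows past the modelfile default num_ctx of 2048.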

Pass options.num_ctx=8192 (nomic-embed-text v1.5's RoPE-extrapolated
max) and truncate=true (a silent-truncation safety net) on every
embed call so we don't depend on the local modelfile defaults.
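
As a self-contained sketch of the fixed request shape (the function
name, baseUrl, and model are placeholders; the real change is in
OllamaService, see the diff below):

  import axios from "axios";

  // Mirrors the fixed /api/embed call in the diff below.
  async function embed(baseUrl: string, model: string, input: string[]) {
    const response = await axios.post(
      `${baseUrl}/api/embed`,
      {
        model,
        input,
        truncate: true,              // safety net for overlong chunks
        options: { num_ctx: 8192 },  // override modelfile default (2048)
      },
      { timeout: 60000 }
    );
    // Ollama's native endpoint returns { model, embeddings: number[][] }.
    return response.data.embeddings as number[][];
  }

Because num_ctx rides along in the request options, the override is
applied per call and works even on installs whose modelfile was never
edited.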

Reported on #756 by @NC4WD; same root cause as #369 and #670, which
were closed without an actual fix.
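
A quick manual check against a local Ollama (a hypothetical one-off
script, default localhost port and model tag assumed; on affected
installs, omitting truncate and options should reproduce the 400
described above):

  import axios from "axios";

  async function main() {
    // Comfortably longer than the old 2048-token default context.
    const input = "lorem ".repeat(3000);
    const res = await axios.post("http://localhost:11434/api/embed", {
      model: "nomic-embed-text:v1.5",
      input,
      truncate: true,
      options: { num_ctx: 8192 },
    });
    // Expect one 768-dim vector for nomic-embed-text.
    console.log(res.data.embeddings[0].length);
  }

  main().catch((e) => console.error(e.response?.data ?? e));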

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
chriscrosstalk 2026-04-27 21:43:10 -07:00 committed by GitHub
parent 00b4b26224
commit b194dfa136

@@ -480,10 +480,21 @@ export class OllamaService {
     }
     try {
-      // Prefer Ollama native endpoint (supports batch input natively)
+      // Prefer Ollama native endpoint (supports batch input natively).
+      // Pass num_ctx explicitly so we don't depend on the embedding model's
+      // modelfile defaults. Some installs ship nomic-embed-text:v1.5 with
+      // num_ctx=2048, which our chunker (sized for ~1500 tokens) can exceed
+      // on dense content, causing "input length exceeds context length" errors.
+      // truncate:true is a runtime safety net for any chunk that still overshoots.
+      // 8192 matches nomic-embed-text:v1.5's RoPE-extrapolated max.
       const response = await axios.post(
         `${this.baseUrl}/api/embed`,
-        { model, input },
+        {
+          model,
+          input,
+          truncate: true,
+          options: { num_ctx: 8192 },
+        },
         { timeout: 60000 }
       )
       // Some backends (e.g. LM Studio) return HTTP 200 for unknown endpoints with an incompatible