project-nomad/admin/app/middleware/compression_middleware.ts
chriscrosstalk 4b21ea6c40 fix(API): skip compression for Server-Sent Events (#798)
* fix(stream): skip compression for Server-Sent Events

The global compression middleware (added in v1.31.0-rc.2) buffers
response writes to determine encoding, which collapses per-token
streaming into a single block delivered after generation completes.
This broke the AI chat streaming UX from v1.31.0-rc.2 onward — text
no longer appears progressively as the model generates it, only at
the end.

Adds a filter to compression() that returns false when the response
Content-Type is text/event-stream. Other responses still go through
the default compression filter (compressible types are still
compressed; e.g. text/html via Brotli).

Reproduced on NOMAD3 v1.31.1: before fix, all SSE chunks for a 1B
model arrive within 10ms of each other after the model finishes.
After fix, tokens arrive at ~150ms intervals as they're generated
on a 12B model, with no Content-Encoding header on the SSE response.

Verified on the same host that /home still returns
Content-Encoding: br for HTML responses.

Closes #781. Reported and bisected by @toasterking
(works in v1.31.0-rc.1, broken from v1.31.0-rc.2 onward).

* fix(stream): use any for filter params to match existing as-any pattern

The compression library types its filter as (req: Request, res: Response)
expecting Express types, but AdonisJS passes raw IncomingMessage/ServerResponse
which is why the surrounding middleware uses `as any` casts at the call site.
The IncomingMessage/ServerResponse types I added are runtime-correct but
fail tsc against the library's declared types.

Drop the typed import in favor of `any` parameters, which matches how the
existing `compress(request.request as any, response.response as any, ...)`
call resolves the same mismatch.
2026-05-20 10:16:00 -07:00

36 lines
1.2 KiB
TypeScript

import env from '#start/env'
import type { HttpContext } from '@adonisjs/core/http'
import type { NextFn } from '@adonisjs/core/types/http'
import compression from 'compression'
// Skip compression for Server-Sent Events. The compression library buffers
// response writes to determine encoding, which collapses per-token streaming
// into a single block delivered after generation completes (regression in
// v1.31.0-rc.2, reported in #781 by @toasterking).
const compress = env.get('DISABLE_COMPRESSION')
? null
: compression({
filter: (req: any, res: any) => {
const contentType = res.getHeader('Content-Type')
if (typeof contentType === 'string' && contentType.includes('text/event-stream')) {
return false
}
return compression.filter(req, res)
},
})
export default class CompressionMiddleware {
async handle({ request, response }: HttpContext, next: NextFn) {
if (!compress) return await next()
await new Promise<void>((resolve, reject) => {
compress(request.request as any, response.response as any, (err?: any) => {
if (err) reject(err)
else resolve()
})
})
await next()
}
}