Three bugs caused downloads to hang, disappear, or leave stuck spinners:
1. Wikipedia downloads that failed never updated the DB status from 'downloading',
leaving the spinner stuck forever. Now the worker's failed handler marks them as failed.
2. No stall detection on streaming downloads - if data stopped flowing mid-download,
the job hung indefinitely. Added a 5-minute stall timer that triggers retry.
3. Failed jobs were invisible to users since only waiting/active/delayed states were
queried. Now failed jobs appear with error indicators in the download list.
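A minimal sketch of the stall check from item 2, assuming the download loop can report each chunk's arrival time. `StallDetector`, `touch`, and `isStalled` are hypothetical names for illustration, not the identifiers used in the worker; only the 5-minute threshold comes from this commit.

```typescript
// Hypothetical sketch: class and method names are illustrative, not from the codebase.
const STALL_TIMEOUT_MS = 5 * 60 * 1000; // 5 minutes, as described above

class StallDetector {
  private lastActivityMs: number;
  private readonly timeoutMs: number;

  constructor(startMs: number, timeoutMs: number = STALL_TIMEOUT_MS) {
    this.lastActivityMs = startMs;
    this.timeoutMs = timeoutMs;
  }

  // Record that a chunk of data arrived at time nowMs.
  touch(nowMs: number): void {
    this.lastActivityMs = nowMs;
  }

  // True once no chunk has arrived for longer than the timeout.
  isStalled(nowMs: number): boolean {
    return nowMs - this.lastActivityMs > this.timeoutMs;
  }
}
```

In a streaming download, `touch(Date.now())` would be called from the data handler, and a periodic check of `isStalled(Date.now())` would abort the stream and throw so the queue's retry logic takes over. Taking the clock as a parameter keeps the check testable without real timers.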
Closes #364, closes #216
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EPUBs are ZIP archives containing structured XHTML content with semantic
chapter/section markup, making them well-suited for RAG text extraction
and chunking.
Changes:
- Add 'epub' to determineFileType() in utils/fs.ts
- Add processEPUBFile() method in rag_service.ts that:
  - Reads container.xml to locate the OPF manifest
  - Parses the OPF spine for correct reading order
  - Extracts text from each XHTML content document using cheerio
  - Falls back to all manifest items if no spine is found
- Wire epub case into processAndEmbedFile() switch
- Add jszip dependency for ZIP archive reading (cheerio already present)
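The reading-order step in processEPUBFile() can be sketched roughly as follows, assuming the OPF has already been parsed into plain objects. `ManifestItem` and `resolveReadingOrder` are hypothetical names for illustration, and the XHTML media-type filter is an assumption that is not stated in this commit.

```typescript
// Hypothetical types and helper; the real processEPUBFile() may differ.
interface ManifestItem {
  id: string;
  href: string; // path relative to the OPF file
  mediaType: string; // e.g. "application/xhtml+xml"
}

function resolveReadingOrder(
  opfPath: string,
  manifest: ManifestItem[],
  spineIdrefs: string[],
): string[] {
  // Manifest hrefs are relative to the directory that holds the OPF file.
  const slash = opfPath.lastIndexOf("/");
  const opfDir = slash === -1 ? "" : opfPath.slice(0, slash + 1);

  const byId = new Map<string, ManifestItem>();
  for (const item of manifest) byId.set(item.id, item);

  // Prefer spine order; fall back to every manifest item when the spine is empty.
  const ordered: ManifestItem[] = [];
  if (spineIdrefs.length > 0) {
    for (const idref of spineIdrefs) {
      const item = byId.get(idref);
      if (item !== undefined) ordered.push(item);
    }
  } else {
    ordered.push(...manifest);
  }

  // Assumption: only XHTML content documents are passed on for text extraction.
  return ordered
    .filter((item) => item.mediaType === "application/xhtml+xml")
    .map((item) => opfDir + item.href);
}
```

Each returned path would then be read from the ZIP archive and passed to cheerio for text extraction, in the order given.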
Related to #253 (EPUB is a common format for Project Gutenberg
content and technical reference books)