EPUBs are ZIP archives containing structured XHTML content with semantic
chapter/section markup, making them well-suited for RAG text extraction
and chunking.

Changes:
- Add 'epub' to determineFileType() in utils/fs.ts
- Add processEPUBFile() method in rag_service.ts that:
  - Reads container.xml to locate the OPF manifest
  - Parses the OPF spine for correct reading order
  - Extracts text from each XHTML content document using cheerio
  - Falls back to all manifest items if no spine is found
- Wire epub case into processAndEmbedFile() switch
- Add jszip dependency for ZIP archive reading (cheerio already present)
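
For reviewers, the spine-ordering and fallback step could be sketched roughly as below. This is an illustration only, not the code in rag_service.ts: `ManifestItem`, `resolveHref`, and `readingOrder` are hypothetical names, the manifest and spine are assumed to be already parsed out of the OPF, and the fallback is assumed to trigger whenever the spine yields no usable XHTML items. Reading the ZIP with jszip and extracting text with cheerio are left out.

```typescript
// Hypothetical shape for one OPF <item>; the real types in rag_service.ts may differ.
interface ManifestItem {
  id: string;
  href: string;
  mediaType: string;
}

// Media types treated as text-bearing content documents (an assumption of this sketch).
const CONTENT_TYPES = new Set(["application/xhtml+xml", "text/html"]);

// Manifest hrefs are relative to the OPF file, so prefix them with the OPF's directory.
// Simplification: "../" segments and percent-encoding are not normalized here.
function resolveHref(opfPath: string, href: string): string {
  const slash = opfPath.lastIndexOf("/");
  const baseDir = slash >= 0 ? opfPath.slice(0, slash + 1) : "";
  return baseDir + href;
}

// Return ZIP entry paths in reading order: spine order when the spine resolves
// to at least one content document, otherwise every content document in manifest order.
function readingOrder(
  opfPath: string,
  manifest: ManifestItem[],
  spineIdrefs: string[],
): string[] {
  const byId = new Map(manifest.map((item) => [item.id, item] as const));

  const fromSpine = spineIdrefs
    .map((idref) => byId.get(idref))
    .filter(
      (item): item is ManifestItem =>
        item !== undefined && CONTENT_TYPES.has(item.mediaType),
    );

  const items =
    fromSpine.length > 0
      ? fromSpine
      : manifest.filter((item) => CONTENT_TYPES.has(item.mediaType));

  return items.map((item) => resolveHref(opfPath, item.href));
}

// Example: the spine lists ch2 before ch1, so that order wins over manifest order.
const exampleManifest: ManifestItem[] = [
  { id: "ch1", href: "text/ch1.xhtml", mediaType: "application/xhtml+xml" },
  { id: "ch2", href: "text/ch2.xhtml", mediaType: "application/xhtml+xml" },
  { id: "css", href: "style.css", mediaType: "text/css" },
];
const spineOrder = readingOrder("OEBPS/content.opf", exampleManifest, ["ch2", "ch1"]);
const fallbackOrder = readingOrder("OEBPS/content.opf", exampleManifest, []);
```

Keeping this step as a pure function over parsed data means it can be unit-tested without building a real .epub fixture.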

Closes #253-adjacent (epub is a common format for Project Gutenberg
content and technical reference books)