Commit Graph

13 Commits

Author SHA1 Message Date
brian
dc7abfd41a feat(rag): add EPUB file support for Knowledge Base uploads
EPUBs are ZIP archives containing structured XHTML content with semantic
chapter/section markup, making them well-suited for RAG text extraction
and chunking.

Changes:
- Add 'epub' to determineFileType() in utils/fs.ts
- Add processEPUBFile() method in rag_service.ts that:
  - Reads container.xml to locate the OPF manifest
  - Parses the OPF spine for correct reading order
  - Extracts text from each XHTML content document using cheerio
  - Falls back to all manifest items if no spine is found
- Wire epub case into processAndEmbedFile() switch
- Add jszip dependency for ZIP archive reading (cheerio already present)

Closes #253-adjacent (epub is a common format for Project Gutenberg
content and technical reference books)
2026-03-13 13:25:01 -04:00
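
The reading-order steps described in dc7abfd41a (container.xml, then the OPF spine, then a manifest fallback) can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the code from rag_service.ts: the helper names parseTags, findOpfPath, and resolveReadingOrder are hypothetical, regular expressions stand in for the cheerio-based parsing the commit mentions, and both ZIP reading (jszip) and text extraction are left out. Restricting the fallback to (X)HTML manifest items is also an assumption of this sketch.

```typescript
// Hypothetical helpers for illustration only; not taken from rag_service.ts.

type Attrs = { [name: string]: string };

/** Collect the attributes of every <tag ...> element (optionally namespace-prefixed). */
function parseTags(xml: string, tag: string): Attrs[] {
  // The lookahead keeps "item" from matching "itemref", "rootfile" from "rootfiles".
  const tagRe = new RegExp("<(?:[\\w.-]+:)?" + tag + "(?=[\\s/>])([^>]*)>", "g");
  const out: Attrs[] = [];
  let m: RegExpExecArray | null;
  while ((m = tagRe.exec(xml)) !== null) {
    const attrs: Attrs = Object.create(null);
    const attrRe = /([\w:.-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')/g;
    let a: RegExpExecArray | null;
    while ((a = attrRe.exec(m[1])) !== null) {
      attrs[a[1]] = a[2] !== undefined ? a[2] : a[3];
    }
    out.push(attrs);
  }
  return out;
}

/** Locate the OPF package document from the text of META-INF/container.xml. */
function findOpfPath(containerXml: string): string | null {
  const first = parseTags(containerXml, "rootfile")[0];
  return first && first["full-path"] ? first["full-path"] : null;
}

/** Return content document paths in spine order, or fall back to the manifest. */
function resolveReadingOrder(opfXml: string, opfPath: string): string[] {
  // Manifest hrefs are relative to the directory that holds the OPF file.
  const slash = opfPath.lastIndexOf("/");
  const baseDir = slash >= 0 ? opfPath.slice(0, slash + 1) : "";

  const manifest = parseTags(opfXml, "item");
  const hrefById: Attrs = Object.create(null);
  for (const item of manifest) {
    if (item["id"] && item["href"]) hrefById[item["id"]] = item["href"];
  }

  // The spine lists manifest ids in reading order.
  const hrefs: string[] = [];
  for (const ref of parseTags(opfXml, "itemref")) {
    const href = hrefById[ref["idref"]];
    if (href) hrefs.push(href);
  }

  // Fallback (assumption: only (X)HTML items are worth extracting text from).
  if (hrefs.length === 0) {
    for (const item of manifest) {
      const mediaType = item["media-type"] || "";
      if (item["href"] && mediaType.indexOf("html") >= 0) hrefs.push(item["href"]);
    }
  }

  return hrefs.map(function (h) { return baseDir + h; });
}
```

A caller would read META-INF/container.xml from the archive, pass its text to findOpfPath, read the OPF file at the returned path, and then load each path from resolveReadingOrder for text extraction.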
Jake Turner
58b106f388 feat: support for updating services 2026-03-11 14:08:09 -07:00
Jake Turner
8726700a0a feat: zim content embedding 2026-02-08 13:20:10 -08:00
Jake Turner
1923cd4cde feat(AI): chat suggestions and assistant settings 2026-02-01 07:24:21 +00:00
Jake Turner
243f749090 feat: [wip] native AI chat interface 2026-01-31 20:39:49 -08:00
Jake Turner
50174d2edb feat(RAG): [wip] RAG capabilities 2026-01-31 20:39:49 -08:00
Jake Turner
9ec514e145 fix(Zim): storage path 2025-12-07 20:18:58 -08:00
Jake Turner
5205d5909d feat: disk info collection 2025-12-07 19:13:43 -08:00
Jake Turner
2ff7b055b5 fix(Kiwix): initial download and setup 2025-12-07 16:04:41 -08:00
Jake Turner
7569aa935d feat: background job overhaul with bullmq 2025-12-06 23:59:01 -08:00
Jake Turner
95ba0a95c9 fix: download util improvements 2025-12-05 18:16:23 -08:00
Jake Turner
dd4e7c2c4f feat: curated zim collections 2025-12-05 15:47:22 -08:00
Jake Turner
12a6f2230d feat: [wip] new maps system 2025-11-30 22:29:16 -08:00