mirror of
https://github.com/Crosstalk-Solutions/project-nomad.git
synced 2026-03-28 03:29:25 +01:00
EPUBs are ZIP archives containing structured XHTML content with semantic chapter/section markup, making them well-suited for RAG text extraction and chunking.

Changes:
- Add 'epub' to determineFileType() in utils/fs.ts
- Add processEPUBFile() method in rag_service.ts that:
  - Reads container.xml to locate the OPF manifest
  - Parses the OPF spine for correct reading order
  - Extracts text from each XHTML content document using cheerio
  - Falls back to all manifest items if no spine is found
- Wire epub case into processAndEmbedFile() switch
- Add jszip dependency for ZIP archive reading (cheerio already present)

Closes #253-adjacent (epub is a common format for Project Gutenberg content and technical reference books)

| Name | | |
|---|---|---|
| app | ||
| bin | ||
| commands | ||
| config | ||
| constants | ||
| database | ||
| docs | ||
| inertia | ||
| providers | ||
| public | ||
| resources/views | ||
| start | ||
| tests | ||
| types | ||
| util | ||
| views | ||
| .editorconfig | ||
| .env.example | ||
| ace.js | ||
| adonisrc.ts | ||
| eslint.config.js | ||
| package-lock.json | ||
| package.json | ||
| tailwind.config.ts | ||
| tsconfig.json | ||
| vite.config.ts | ||
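
The commit message outlines how processEPUBFile() finds content: container.xml points to the OPF, the OPF spine gives the reading order, and the manifest is the fallback. The sketch below illustrates only that lookup and is not the code in rag_service.ts. To stay dependency-free it assumes the archive entries have already been unzipped into a `Map` of path to text (the commit uses jszip for this) and it parses the XML with simple regular expressions rather than cheerio. The names `EpubFiles`, `attr`, `tags`, and `readingOrder` are hypothetical, and restricting the manifest to HTML media types is an assumption of this sketch.

```typescript
// Hypothetical input shape: archive path -> file text, already unzipped.
type EpubFiles = Map<string, string>;

// Read one attribute value from a single tag string.
function attr(tag: string, name: string): string | undefined {
  const m = new RegExp(`\\s${name}\\s*=\\s*["']([^"']*)["']`).exec(tag);
  return m ? m[1] : undefined;
}

// Collect opening tags with the given local name (namespace prefix optional).
function tags(xml: string, name: string): string[] {
  return xml.match(new RegExp(`<(?:[\\w-]+:)?${name}\\b[^>]*>`, 'g')) ?? [];
}

// Return content document paths in spine order, or manifest order if no spine entry resolves.
function readingOrder(files: EpubFiles): string[] {
  const container = files.get('META-INF/container.xml');
  if (!container) throw new Error('missing META-INF/container.xml');

  // container.xml names the OPF package document via <rootfile full-path="...">.
  const opfPath = attr(tags(container, 'rootfile')[0] ?? '', 'full-path');
  if (!opfPath) throw new Error('container.xml has no rootfile');
  const opf = files.get(opfPath);
  if (!opf) throw new Error(`missing OPF: ${opfPath}`);

  // Manifest hrefs are relative to the directory that holds the OPF.
  const slash = opfPath.lastIndexOf('/');
  const baseDir = slash >= 0 ? opfPath.slice(0, slash + 1) : '';

  // Map manifest ids to paths; keeping only HTML types is an assumption of this sketch.
  const manifest = new Map<string, string>();
  for (const item of tags(opf, 'item')) {
    const id = attr(item, 'id');
    const href = attr(item, 'href');
    const type = attr(item, 'media-type') ?? '';
    if (id && href && type.indexOf('html') !== -1) manifest.set(id, baseDir + href);
  }

  // The spine lists idrefs in reading order; drop any that do not resolve.
  const ordered = tags(opf, 'itemref')
    .map((ref) => manifest.get(attr(ref, 'idref') ?? ''))
    .filter((p): p is string => p !== undefined);

  return ordered.length > 0 ? ordered : Array.from(manifest.values());
}
```

Given a map holding `META-INF/container.xml` and the OPF it references, `readingOrder(files)` returns the XHTML paths to pass on to text extraction. Regex parsing is used here only to keep the sketch self-contained; a real XML or HTML parser is more robust against comments, CDATA, and unusual attribute quoting.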