LitSeg: Narrative-Aware Segmentation for Literary RAG
A new framework called LitSeg addresses an underexplored step in Retrieval-Augmented Generation (RAG) for literary works: document segmentation. Existing segmentation strategies are semantically blind and ignore narrative structure. This fragments plots and leaves references unclear, which hurts both retrieval and generation. LitSeg uses multi-stage prompting to extract events, untangle narrative threads, clarify structures, and locate turning points. A lightweight variant, LitSeg-Lite, is fine-tuned to act as a single-pass chunker, reducing computational overhead. The work is available on arXiv (2605.27156) and aims to improve RAG in long-tail domains such as literature.
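LitSeg's own pipeline relies on LLM prompting, which is not reproduced here. As a minimal sketch of why segmentation boundaries matter, assuming sentence-level units and hand-supplied boundary indices (hypothetical stand-ins for the turning points a narrative-aware analysis would locate), the code below contrasts a semantically blind fixed-size chunker with one that cuts only at narrative boundaries. The function names and the sample story are illustrative, not taken from the paper.

```python
from typing import List, Sequence


def fixed_size_chunks(sentences: Sequence[str], size: int) -> List[List[str]]:
    """Semantically blind baseline: cut every `size` sentences regardless of content."""
    return [list(sentences[i:i + size]) for i in range(0, len(sentences), size)]


def boundary_chunks(sentences: Sequence[str], boundaries: Sequence[int]) -> List[List[str]]:
    """Cut only at the given sentence indices; each index starts a new chunk.

    Hypothetical stand-in: in a narrative-aware pipeline these indices would come
    from an upstream analysis step (e.g. detected turning points or scene changes).
    Out-of-range and duplicate indices are ignored.
    """
    if not sentences:
        return []
    cuts = sorted({b for b in boundaries if 0 < b < len(sentences)})
    starts = [0] + cuts
    ends = cuts + [len(sentences)]
    return [list(sentences[s:e]) for s, e in zip(starts, ends)]


# Illustrative two-scene story; the turning point is at sentence index 3.
story = [
    "Anna arrives at the manor.",
    "She finds the letter in the study.",
    "She hides it before anyone sees.",
    "Years later, her son returns.",
    "He discovers the hidden letter.",
    "He finally learns the truth.",
]

# The fixed-size cut separates "He discovers..." from "her son", so the
# second chunk opens with a pronoun whose referent is in another chunk.
for chunk in fixed_size_chunks(story, 4):
    print(chunk)

# Cutting at the turning point keeps each scene, and its references, together.
for chunk in boundary_chunks(story, [3]):
    print(chunk)
```

The fixed-size output shows the "fragmented plots and unclear references" problem in miniature: a retriever that returns only the second chunk gets "He" with no antecedent. Producing good boundary indices is the hard part, which is the step LitSeg's prompting stages (and LitSeg-Lite's single pass) are meant to address.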
Key facts
- LitSeg is a narrative-theory-guided segmentation framework for literary RAG.
- Existing segmentation strategies are semantically blind and overlook narrative structures.
- LitSeg uses multi-stage prompting to extract events, untangle threads, and locate turning points.
- LitSeg-Lite is a lightweight variant fine-tuned as a single-pass chunker to reduce computational overhead.
- The paper is available on arXiv under ID 2605.27156.
- RAG enhances LLMs by incorporating external knowledge, which is especially valuable in long-tail domains.
- The critical step of document segmentation in RAG remains underexplored.
- Fragmented plots and unclear references hinder retrieval and generation performance.
Entities
Institutions
- arXiv