VerbatimRAG: Hallucination-Free QA for Research Papers
Researchers have developed VerbatimRAG, an extractive question answering system that eliminates hallucinations in AI-assisted research by mapping user queries directly to verbatim text spans in retrieved documents rather than generating new text. The system is applied to the ACL Anthology and uses a novel ground-truth dataset built from synthetic queries with the ScIRGen methodology and annotated by NLP researchers. A 150M-parameter ModernBERT model is trained and evaluated on this benchmark. The approach addresses the tendency of LLMs to produce factually inaccurate output and offers a reliable way to collect high-quality information from trusted sources.
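The query-to-span mapping described above can be illustrated with a minimal sketch. This is not the authors' implementation: the `Span`, `candidate_spans`, and `extract` names are hypothetical, and a simple word-overlap score stands in for the trained ModernBERT model. The one property it is meant to show is that every answer is sliced from the source document by character offsets, so the output can never contain text that is absent from the retrieved document.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    doc_id: str
    start: int  # character offset into the source document (inclusive)
    end: int    # character offset into the source document (exclusive)
    text: str   # always equal to document[start:end]


def tokenize(text):
    # Lowercased alphanumeric tokens, used only by the stand-in scorer.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def candidate_spans(doc_id, doc):
    # Split into sentence-like candidates while keeping exact offsets.
    for match in re.finditer(r"[^.!?]+[.!?]?", doc):
        raw = match.group()
        stripped = raw.strip()
        if not stripped:
            continue
        start = match.start() + (len(raw) - len(raw.lstrip()))
        end = start + len(stripped)
        # The text is sliced from the document, never rewritten.
        yield Span(doc_id, start, end, doc[start:end])


def extract(query, docs, top_k=1):
    # ASSUMPTION: word overlap replaces the trained span-relevance model.
    query_tokens = tokenize(query)
    scored = []
    for doc_id, doc in docs.items():
        for span in candidate_spans(doc_id, doc):
            overlap = len(query_tokens & tokenize(span.text))
            if overlap:
                scored.append((overlap, span))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [span for _, span in scored[:top_k]]


# Hypothetical retrieved document, for illustration only.
docs = {
    "paper-1": "We train a ModernBERT model. It has 150M parameters. "
               "Evaluation uses human annotation."
}
for span in extract("How many parameters does it have?", docs):
    # Verbatim guarantee: the answer is an exact slice of the source.
    assert docs[span.doc_id][span.start:span.end] == span.text
    print(span.doc_id, span.start, span.end, span.text)
```

When no candidate shares a word with the query, `extract` returns an empty list instead of composing an answer. That mirrors the extractive design choice: the system can decline to answer, but it cannot invent content.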
Key facts
- VerbatimRAG is an extractive QA system for research papers.
- It maps user queries to verbatim text spans in retrieved documents.
- Applied to the ACL Anthology.
- Uses a novel ground-truth dataset built from synthetic queries with the ScIRGen methodology.
- Human annotation performed by NLP researchers.
- A 150M-parameter ModernBERT model is trained and evaluated.
- Addresses the LLM hallucination problem in AI-assisted research.
- arXiv paper ID: 2605.21102.
Entities
Institutions
- ACL Anthology
- arXiv