Faithfulness-QA Dataset Trains RAG Models to Prefer Context Over Memory
Researchers have released Faithfulness-QA, a large-scale dataset of 99,094 samples designed to train Retrieval-Augmented Generation (RAG) models to prioritize retrieved context over parametric memory. The dataset addresses a core flaw in RAG systems: they often generate answers from internal knowledge rather than from the provided context. It was constructed via counterfactual entity substitution: starting from the SQuAD and TriviaQA benchmarks, answer-bearing named entities were replaced with type-consistent alternatives drawn from a curated bank of 76,953 entities, creating controlled knowledge conflicts in which the context contradicts what a model has likely memorized. Quality filtering enforces a 100% pass rate on automated checks. The dataset is described in a paper released on arXiv.
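To make the substitution recipe concrete, here is a minimal sketch of what a type-consistent counterfactual swap could look like. The names `QASample`, `ENTITY_BANK`, and `substitute_answer_entity` are hypothetical, not the authors' code, and the real pipeline presumably operates on NER-detected spans rather than naive string replacement.

```python
import random
from dataclasses import dataclass, replace as dc_replace

@dataclass
class QASample:
    question: str
    context: str
    answer: str
    answer_type: str  # e.g. "PERSON", "LOCATION"

# Hypothetical entity bank mapping a type to type-consistent alternatives;
# the released bank is described as holding 76,953 curated entities.
ENTITY_BANK = {
    "PERSON": ["Marie Curie", "Alan Turing", "Ada Lovelace"],
    "LOCATION": ["Oslo", "Nairobi", "Kyoto"],
}

def substitute_answer_entity(sample: QASample, rng: random.Random) -> QASample:
    """Swap the answer entity in both the context and the gold answer,
    so the context now contradicts the model's likely parametric memory."""
    candidates = [e for e in ENTITY_BANK[sample.answer_type] if e != sample.answer]
    new_entity = rng.choice(candidates)
    return dc_replace(
        sample,
        context=sample.context.replace(sample.answer, new_entity),
        answer=new_entity,
    )

if __name__ == "__main__":
    original = QASample(
        question="Who developed the theory of general relativity?",
        context="General relativity was developed by Albert Einstein in 1915.",
        answer="Albert Einstein",
        answer_type="PERSON",
    )
    counterfactual = substitute_answer_entity(original, random.Random(0))
    print(counterfactual.context)  # context now names the substituted entity
    print(counterfactual.answer)   # the faithful answer is the new entity, not the memorized one
```

A model trained on such pairs is rewarded only for answering with the substituted entity, which it can obtain solely from the context, never from memorization.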
Key facts
- Faithfulness-QA is a dataset of 99,094 samples for training RAG models.
- It uses counterfactual entity substitution to create knowledge conflicts.
- Derived from SQuAD and TriviaQA benchmarks.
- Entity bank contains 76,953 type-consistent alternatives.
- Quality filtering enforces a 100% pass rate on automated audits (see the sketch after this list).
- Aims to reduce unfaithful answers from parametric memory.
- Released on arXiv under identifier 2604.25313.
- Addresses a fundamental obstacle in retrieval augmentation: models overriding retrieved evidence with memorized facts.
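To give the "100% pass rate" claim a concrete shape, the sketch below shows one plausible form such automated audits could take, reusing the hypothetical `QASample` from the earlier sketch. The summary does not specify the actual checks, so `passes_automated_audit` and its three conditions are assumptions.

```python
def passes_automated_audit(sample: QASample, original_answer: str) -> bool:
    """Hypothetical audit: keep a sample only if the substitution left a
    clean, answerable knowledge conflict. A 100% pass rate means every
    retained sample satisfies checks of this kind."""
    return (
        sample.answer in sample.context            # new answer is supported by the edited context
        and original_answer not in sample.context  # old entity fully removed, so no leakage
        and sample.answer != original_answer       # the swap actually changed the answer
    )
```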