Faithfulness-QA Dataset Trains RAG Models to Prefer Context Over Memory
Researchers have released Faithfulness-QA, a large-scale dataset of 99,094 samples designed to train Retrieval-Augmented Generation (RAG) models to prioritize retrieved context over parametric memory. The dataset addresses a core flaw in RAG systems: they often generate answers from internal knowledge rather than from the provided context. It was constructed via counterfactual entity substitution: starting from the SQuAD and TriviaQA benchmarks, answer-bearing named entities were replaced with type-consistent alternatives drawn from a curated bank of 76,953 entities, creating controlled knowledge conflicts in which the context contradicts what a model has likely memorized. Quality filtering enforces a 100% pass rate on automated checks. The dataset is described in a paper released on arXiv.
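To make the substitution recipe concrete, here is a minimal sketch of what a type-consistent counterfactual swap could look like. The names `QASample`, `ENTITY_BANK`, and `substitute_answer_entity` are hypothetical, not the authors' code, and the real pipeline presumably operates on NER-detected spans rather than naive string replacement.

```python
import random
from dataclasses import dataclass, replace as dc_replace

@dataclass
class QASample:
    question: str
    context: str
    answer: str
    answer_type: str  # e.g. "PERSON", "LOCATION"

# Hypothetical entity bank mapping a type to type-consistent alternatives;
# the released bank is described as holding 76,953 curated entities.
ENTITY_BANK = {
    "PERSON": ["Marie Curie", "Alan Turing", "Ada Lovelace"],
    "LOCATION": ["Oslo", "Nairobi", "Kyoto"],
}

def substitute_answer_entity(sample: QASample, rng: random.Random) -> QASample:
    """Swap the answer entity in both the context and the gold answer,
    so the context now contradicts the model's likely parametric memory."""
    candidates = [e for e in ENTITY_BANK[sample.answer_type] if e != sample.answer]
    new_entity = rng.choice(candidates)
    return dc_replace(
        sample,
        context=sample.context.replace(sample.answer, new_entity),
        answer=new_entity,
    )

if __name__ == "__main__":
    original = QASample(
        question="Who developed the theory of general relativity?",
        context="General relativity was developed by Albert Einstein in 1915.",
        answer="Albert Einstein",
        answer_type="PERSON",
    )
    counterfactual = substitute_answer_entity(original, random.Random(0))
    print(counterfactual.context)  # context now names the substituted entity
    print(counterfactual.answer)   # the faithful answer is the new entity, not the memorized one
```

A model trained on such pairs is rewarded only for answering with the substituted entity, which it can obtain solely from the context, never from memorization.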
Key facts
- Faithfulness-QA is a dataset of 99,094 samples for training RAG models.
- It uses counterfactual entity substitution to create knowledge conflicts.
- Derived from SQuAD and TriviaQA benchmarks.
- Entity bank contains 76,953 type-consistent alternatives.
- Quality filtering enforces a 100% pass rate on automated audits (see the sketch after this list).
- Aims to reduce unfaithful answers from parametric memory.
- Released on arXiv under identifier 2604.25313.
- Addresses a fundamental obstacle in retrieval augmentation: models overriding retrieved evidence with memorized facts.
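To give the "100% pass rate" claim a concrete shape, the sketch below shows one plausible form such automated audits could take, reusing the hypothetical `QASample` from the earlier sketch. The summary does not specify the actual checks, so `passes_automated_audit` and its three conditions are assumptions.

```python
def passes_automated_audit(sample: QASample, original_answer: str) -> bool:
    """Hypothetical audit: keep a sample only if the substitution left a
    clean, answerable knowledge conflict. A 100% pass rate means every
    retained sample satisfies checks of this kind."""
    return (
        sample.answer in sample.context            # new answer is supported by the edited context
        and original_answer not in sample.context  # old entity fully removed, so no leakage
        and sample.answer != original_answer       # the swap actually changed the answer
    )
```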