ARTFEED — Contemporary Art Intelligence

FakeWiki Benchmark for Language Model Provenance

other · 2026-05-09

A new arXiv paper introduces DataDignity, a framework for training data attribution in large language models. The authors propose pinpoint provenance, a task that ranks candidate documents to identify which source supports a model's response. They created FakeWiki, a benchmark of 3,537 fabricated Wikipedia-style articles with ground-truth provenance. The benchmark includes QA probes, paraphrases, retro-generated variants, and hard anti-documents. Five query conditions are tested: clean prompting and four jailbreak-inspired transformations. The study evaluates seven retrieval baselines, a training-free activation-steering retrieval-fusion method called SteerFuse, and a supervised contrastive ranker, ScoringModel.

Key facts

  • DataDignity addresses training data attribution for LLMs.
  • Pinpoint provenance ranks documents supporting a model response.
  • FakeWiki contains 3,537 fabricated Wikipedia-style articles.
  • FakeWiki includes QA probes, paraphrases, retro-generated variants, and hard anti-documents.
  • Five query conditions: clean prompting and four jailbreak-inspired transformations.
  • Seven retrieval baselines evaluated.
  • SteerFuse is a training-free activation-steering retrieval-fusion method.
  • ScoringModel is a supervised contrastive provenance ranker.
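To make the pinpoint-provenance task concrete, here is a minimal sketch of ranking candidate documents against a model response. It uses plain bag-of-words cosine similarity as a stand-in for the paper's retrieval baselines; the function names, the `fakewiki_*` document IDs, and the example texts are all hypothetical, not taken from the benchmark.

```python
# Illustrative sketch: pinpoint provenance framed as ranking candidate
# documents by lexical similarity to a model's response. This is NOT the
# paper's method, just a toy retrieval baseline for intuition.
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercase bag-of-words counts, punctuation stripped."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_provenance(response: str, documents: dict[str, str]) -> list[tuple[str, float]]:
    """Rank candidate source documents by overlap with the response,
    highest score first."""
    resp = tokenize(response)
    scores = {doc_id: cosine(resp, tokenize(text)) for doc_id, text in documents.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical fabricated-article corpus in the style of FakeWiki.
docs = {
    "fakewiki_0001": "The fictional painter Lira Vossen exhibited in Rotterdam in 1931.",
    "fakewiki_0002": "Quantum tunneling rates depend on barrier width and height.",
}
ranking = rank_provenance("Lira Vossen exhibited her paintings in Rotterdam.", docs)
print(ranking[0][0])  # id of the top-ranked candidate source document
```

A supervised ranker like the paper's ScoringModel would replace the lexical score with a learned contrastive similarity, but the ranking interface stays the same.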

Entities

Institutions

  • arXiv
