ARTFEED — Contemporary Art Intelligence

QuestBench: Teaching AI Literacy Through Benchmark Construction

ai-technology · 2026-05-22

A new educational practice teaches students to construct benchmarks for testing AI systems, using deep research systems as a case study. The approach, introduced in a paper on arXiv (2605.21413), shifts AI education from productivity training to critical evaluation. Students write expert-level questions in the humanities and social sciences, peer-review them for ambiguity and shortcuts, and assess the AI responses. The resulting benchmark, QuestBench, includes 256 questions across 14 domains. The method aims to help students understand their role in judging machine-produced knowledge.

Key facts

  • Students construct benchmarks to test AI systems
  • Deep research systems serve as the concrete example
  • Students create expert-level questions from disciplinary knowledge
  • Peer review focuses on ambiguity and shortcuts
  • Resulting benchmark QuestBench has 256 questions
  • Covers 14 humanities and social-science domains
  • Aims to teach critical evaluation of AI outputs
  • Published on arXiv with ID 2605.21413

Entities

Institutions

  • arXiv
