QuestBench: Teaching AI Literacy Through Benchmark Construction
A new educational practice teaches students to construct benchmarks for testing AI systems, using deep research tools as a case study. The approach, introduced in a paper on arXiv (ID 2605.21413), shifts AI education from productivity training to critical evaluation. Students write expert-level questions drawn from their disciplinary knowledge in the humanities and social sciences, peer-review them for ambiguity and shortcuts, and assess the AI systems' responses. The resulting benchmark, QuestBench, includes 256 questions across 14 humanities and social-science domains. The method aims to help students understand their role in judging machine-produced knowledge.
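The workflow of writing questions, peer-reviewing them, and keeping only the items that survive review can be pictured as a small data model. The following is a minimal sketch, not the paper's implementation: the `Question` class, its field names, the flag strings, and the example domains are all hypothetical placeholders chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Question:
    """One student-written benchmark item (all field names are hypothetical)."""
    domain: str
    prompt: str
    reference_answer: str
    # Issues raised by peer reviewers, e.g. "ambiguous" or "shortcut".
    review_flags: list[str] = field(default_factory=list)


def accepted(questions: list[Question]) -> list[Question]:
    """Keep only questions that peer review left unflagged."""
    return [q for q in questions if not q.review_flags]


def domain_counts(questions: list[Question]) -> Counter:
    """Count questions per domain."""
    return Counter(q.domain for q in questions)


# Placeholder items; the domains and texts are illustrative, not from QuestBench.
draft = [
    Question("history", "placeholder prompt A", "placeholder answer A"),
    Question("history", "placeholder prompt B", "placeholder answer B", ["ambiguous"]),
    Question("sociology", "placeholder prompt C", "placeholder answer C"),
]
benchmark = accepted(draft)
```

Under these assumptions, a flagged question would be revised and re-reviewed rather than discarded outright; the filter only shows which items are currently ready to be posed to an AI system.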
Key facts
- Practice involves students constructing benchmarks to test AI systems
- Uses deep research systems as a concrete example
- Students create expert-level questions from disciplinary knowledge
- Peer review focuses on ambiguity and shortcuts
- Resulting benchmark QuestBench has 256 questions
- Covers 14 humanities and social-science domains
- Aims to teach critical evaluation of AI outputs
- Published on arXiv with ID 2605.21413
Entities
Institutions
- arXiv