SCICONVBENCH: Benchmarking LLMs on Multi-Turn Clarification for Scientific Task Formulation
A new benchmark, SCICONVBENCH, evaluates large language models on their ability to clarify ill-posed scientific requests through multi-turn dialogue. Unlike existing benchmarks that assume well-defined problems, SCICONVBENCH tests disambiguation and error detection across fluid mechanics, solid mechanics, materials science, and partial differential equations. The benchmark targets two capabilities: eliciting missing information and correcting internally contradictory user requests.
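To make the two capabilities concrete, here is a minimal sketch of the two kinds of ill-posed request, using a fluid-mechanics example. It is not taken from the benchmark: the `PipeFlowRequest` class, its fields, the helper functions, and the 5% tolerance are all hypothetical, chosen only to illustrate a request that is missing information and one that contradicts itself.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PipeFlowRequest:
    """Hypothetical user request for a pipe-flow simulation (illustrative only)."""
    velocity: Optional[float] = None             # mean velocity, m/s
    diameter: Optional[float] = None             # pipe diameter, m
    kinematic_viscosity: Optional[float] = None  # m^2/s
    reynolds_number: Optional[float] = None      # value stated by the user, if any


def missing_fields(req: PipeFlowRequest) -> List[str]:
    """Return the quantities an assistant would still need to ask about."""
    required = ("velocity", "diameter", "kinematic_viscosity")
    return [name for name in required if getattr(req, name) is None]


def contradictions(req: PipeFlowRequest, rel_tol: float = 0.05) -> List[str]:
    """Flag a stated Reynolds number that disagrees with Re = U * D / nu."""
    if missing_fields(req) or req.reynolds_number is None:
        return []  # nothing to cross-check yet
    implied = req.velocity * req.diameter / req.kinematic_viscosity
    if abs(implied - req.reynolds_number) > rel_tol * abs(implied):
        return [
            f"stated Re={req.reynolds_number:g}, but "
            f"velocity*diameter/viscosity gives Re={implied:g}"
        ]
    return []


# Underspecified request: the viscosity was never given.
underspecified = PipeFlowRequest(velocity=1.0, diameter=0.1)

# Contradictory request: the stated Re does not match the other quantities.
inconsistent = PipeFlowRequest(
    velocity=1.0, diameter=0.1, kinematic_viscosity=1e-6, reynolds_number=500.0
)

print(missing_fields(underspecified))
print(contradictions(inconsistent))
```

In this sketch, the first request calls for a clarifying question and the second calls for pointing out the inconsistency before any simulation is set up.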
Key facts
- SCICONVBENCH is a benchmark for multi-turn clarification in scientific task formulation.
- It covers four domains: fluid mechanics, solid mechanics, materials science, and PDEs.
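- The benchmark targets two capabilities: disambiguation (eliciting missing information) and error detection (correcting internally contradictory requests).
- Existing benchmarks assume well-posed problems; SCICONVBENCH instead addresses ill-posed user requests.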
- The benchmark is introduced in arXiv paper 2605.18630.