SCICONVBENCH: Benchmarking LLMs on Multi-Turn Clarification for Scientific Task Formulation

ai-technology · 2026-05-20

A new benchmark, SCICONVBENCH, evaluates large language models on their ability to clarify ill-posed scientific requests through multi-turn dialogue. Unlike existing benchmarks that assume well-defined problems, SCICONVBENCH tests disambiguation and error detection across fluid mechanics, solid mechanics, materials science, and partial differential equations. The benchmark targets two capabilities: eliciting missing information and correcting internally contradictory user requests.
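To make the two capabilities concrete, here is a minimal sketch of what "ill-posed" can mean for a single fluid-mechanics request. It is an illustration only and is not taken from the paper: the `FlowRequest` fields, the `REQUIRED` list, and the helper functions are all hypothetical names, and the one consistency rule (an incompressible-flow assumption is generally considered unsuitable above roughly Mach 0.3) is a standard rule of thumb rather than a check from the benchmark.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FlowRequest:
    """Hypothetical user request; these fields are NOT the benchmark's schema."""
    geometry: Optional[str] = None
    fluid: Optional[str] = None
    inlet_velocity_m_s: Optional[float] = None
    mach_number: Optional[float] = None
    incompressible: Optional[bool] = None


# Assumed minimum information needed before the task can be formulated.
REQUIRED = ("geometry", "fluid", "inlet_velocity_m_s")


def missing_fields(req: FlowRequest) -> List[str]:
    """Disambiguation: list required fields the user has not supplied."""
    return [name for name in REQUIRED if getattr(req, name) is None]


def contradictions(req: FlowRequest) -> List[str]:
    """Error detection: list internally inconsistent combinations of fields."""
    issues = []
    # Rule of thumb: compressibility effects matter above about Mach 0.3.
    if req.incompressible and req.mach_number is not None and req.mach_number > 0.3:
        issues.append(
            f"incompressible flow was requested at Mach {req.mach_number}"
        )
    return issues


def next_clarification(req: FlowRequest) -> Optional[str]:
    """Return the next question to ask, or None if these toy checks pass."""
    issues = contradictions(req)
    if issues:
        return "Please resolve: " + issues[0]
    missing = missing_fields(req)
    if missing:
        return "Please specify: " + ", ".join(missing)
    return None


if __name__ == "__main__":
    request = FlowRequest(geometry="pipe", incompressible=True, mach_number=2.0)
    print(next_clarification(request))
```

In this toy version, contradictions are raised before missing fields, on the assumption that there is little point in collecting more details for a request that cannot be satisfied as stated. A model under evaluation would have to reach such questions from free-form dialogue rather than from a fixed schema.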

Key facts

SCICONVBENCH is a benchmark for multi-turn clarification in scientific task formulation.
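It covers four domains: fluid mechanics, solid mechanics, materials science, and partial differential equations (PDEs).
It targets two capabilities: disambiguation (eliciting missing information) and error detection (correcting internally contradictory requests).
Existing benchmarks assume well-posed problems, whereas SCICONVBENCH addresses ill-posed user requests.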
The benchmark is introduced in arXiv paper 2605.18630.

Sources

arXiv cs.AI — 2026-05-19