ARTFEED — Contemporary Art Intelligence

SCICONVBENCH: Benchmarking LLMs on Multi-Turn Clarification for Scientific Task Formulation

ai-technology · 2026-05-20

A new benchmark, SCICONVBENCH, evaluates how well large language models clarify ill-posed scientific requests through multi-turn dialogue. Unlike existing benchmarks, which assume well-posed problems, SCICONVBENCH tests disambiguation and error detection across four domains: fluid mechanics, solid mechanics, materials science, and partial differential equations (PDEs). It targets two capabilities: eliciting missing information (disambiguation) and correcting internally contradictory user requests (error detection).

Key facts

  • SCICONVBENCH is a benchmark for multi-turn clarification in scientific task formulation.
  • It covers four domains: fluid mechanics, solid mechanics, materials science, and PDEs.
  • The benchmark targets two capabilities: disambiguation (eliciting missing information) and error detection (correcting internally contradictory requests).
  • Existing benchmarks assume well-posed problems, but SCICONVBENCH addresses ill-posed user requests.
  • The benchmark is introduced in arXiv paper 2605.18630.
