ARTFEED — Contemporary Art Intelligence

S1-Bench: Evaluating System 1 Thinking in Large Reasoning Models

other · 2026-05-04

A new benchmark, S1-Bench, has been introduced to evaluate the system 1 thinking abilities of Large Reasoning Models (LRMs). System 1 thinking is characterized by quick, intuitive responses that require few tokens, in contrast to the extended reasoning processes LRMs typically employ. The benchmark spans multiple domains and languages and consists of simple questions suited to system 1 thinking. Evaluations of 28 LRMs revealed low accuracy and poor efficiency on these queries, and existing efficient-reasoning techniques either generalize poorly to simple questions or trade performance for speed. The findings also show that LRMs become aware of problem difficulty early, accompanied by reduced confidence, and that difficulty is implicitly encoded in their hidden states. The work underscores the importance of system 1 thinking, difficulty awareness, and reasoning efficiency for practical applications.

Key facts

  • S1-Bench is a multi-domain, multilingual benchmark for system 1 thinking.
  • 28 Large Reasoning Models were tested.
  • LRMs showed lower accuracy and poorer efficiency on system 1 problems.
  • Existing efficient reasoning methods generalize poorly or sacrifice performance.
  • LRMs exhibit early difficulty awareness with lower confidence.
  • Problem difficulty is implicitly encoded in hidden states.
  • System 1 thinking is essential for real-world applications.
  • The research addresses the previously underexplored system 1 capability of LRMs.
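The claim that difficulty is "implicitly encoded in hidden states" is typically tested with a linear probe. The sketch below is a hypothetical illustration of that idea, not the paper's code: it generates synthetic "hidden states" whose mean shifts slightly for hard problems, then checks whether a linear classifier can recover the difficulty label.

```python
# Hypothetical sketch of linear probing for difficulty in hidden states.
# The data is synthetic; a real probe would use activations extracted
# from an actual LRM on easy vs. hard problems.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d = 64   # hidden-state dimensionality (assumed for illustration)
n = 400  # examples per difficulty class

# Simulate implicit encoding: hard problems get a small mean shift
# along one random direction in the hidden space.
direction = rng.normal(size=d)
direction /= np.linalg.norm(direction)
easy = rng.normal(size=(n, d))
hard = rng.normal(size=(n, d)) + 0.8 * direction

X = np.vstack([easy, hard])
y = np.array([0] * n + [1] * n)  # 0 = easy, 1 = hard

# A linear probe: accuracy above the 0.50 chance level indicates the
# difficulty signal is linearly decodable from the representations.
probe = LogisticRegression(max_iter=1000).fit(X, y)
accuracy = probe.score(X, y)
print(f"probe accuracy: {accuracy:.2f}")
```

The same recipe applies to real activations: replace the synthetic matrices with per-example hidden states from a chosen layer, and compare probe accuracy across layers or token positions to see where the difficulty signal emerges.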
