LLMs Struggle with Long-Chain Reasoning on Equivalence Class Problem
A new empirical study on arXiv evaluates Large Language Models (LLMs) on the Equivalence Class Problem (ECP), a simple task that nonetheless requires long-chain reasoning. The study tests both reasoning and non-reasoning models across varying numbers of variables, connectivity probabilities, and prompts. Non-reasoning LLMs fail ECP entirely, while reasoning models perform significantly better but still cannot fully solve it. For non-reasoning models, performance drops as the connectivity probability increases while the number of variables is held fixed. The paper highlights fundamental limitations in LLM reasoning capabilities.
Key facts
- Study evaluates LLMs on Equivalence Class Problem (ECP)
- ECP asks whether two variables are equal, given a set of equivalence relations
- Non-reasoning LLMs fail ECP
- Reasoning models are better but struggle to completely solve ECP
- Performance varies with connectivity probability and number of variables
- Non-reasoning models show a performance drop at higher connectivity probability when the number of variables is fixed
- Study appears on arXiv as 2605.06882
- Focus on simplest long-chain reasoning tasks
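
To make the task concrete, here is a minimal Python sketch of an ECP instance and a classical solver. It is an illustration only and not the paper's benchmark code: the names `make_instance`, `UnionFind`, and `are_equal` are hypothetical, and reading "connectivity probability" as the chance that any given pair of variables is stated to be equal is an assumption that may differ from the study's actual setup.

```python
import random
from itertools import combinations


def make_instance(n, p, seed=None):
    """Sample equality statements over variables 0..n-1.

    ASSUMPTION: each unordered pair (i, j) is asserted equal independently
    with probability p. The study's real generator may work differently.
    """
    rng = random.Random(seed)
    return [(i, j) for i, j in combinations(range(n), 2) if rng.random() < p]


class UnionFind:
    """Disjoint-set structure that tracks equivalence classes."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        # Walk up to the class representative, shortening the path as we go.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb


def are_equal(n, equalities, a, b):
    """Return True if a == b follows from the given equalities."""
    uf = UnionFind(n)
    for i, j in equalities:
        uf.union(i, j)
    return uf.find(a) == uf.find(b)


# A chain x0 = x1, x1 = x2, x2 = x3: answering "x0 == x3?" needs three hops.
chain = [(0, 1), (1, 2), (2, 3)]
print(are_equal(5, chain, 0, 3))  # True
print(are_equal(5, chain, 0, 4))  # False
```

A union-find solver handles such instances in near-linear time, which is what makes ECP a useful probe: the task is easy for a conventional algorithm, yet answering it from a prompt requires following a chain of equalities without losing track of any link.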
Entities
Institutions
- arXiv