LLMs Struggle with Long-Chain Reasoning on Equivalence Class Problem

ai-technology · 2026-05-11

A new empirical study on arXiv evaluates Large Language Models (LLMs) on the Equivalence Class Problem (ECP), a task that is simple to state but requires long chains of reasoning. The study tests both reasoning and non-reasoning models across different numbers of variables, connectivity probabilities, and prompts. Non-reasoning LLMs fail ECP entirely, while reasoning models perform significantly better but still cannot fully solve it. For non-reasoning models, performance drops as the connectivity probability increases while the number of variables is held fixed. The paper highlights fundamental limitations in LLM reasoning capabilities.

Key facts

Study evaluates LLMs on Equivalence Class Problem (ECP)
ECP asks whether two variables are equal given a set of equivalence relations
Non-reasoning LLMs fail ECP
Reasoning models are better but struggle to completely solve ECP
Performance varies with connectivity probability and number of variables
Non-reasoning models show a performance drop as connectivity probability rises with the number of variables fixed
Study appears on arXiv as 2605.06882
Focus on simplest long-chain reasoning tasks
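
To make the task concrete, here is a minimal Python sketch of how an ECP instance could be generated and solved exactly with a union-find structure. It is an illustration only, not the paper's code: the function names (`make_instance`, `are_equal`) are hypothetical, and the generation scheme (each pair of variables is declared equal independently with probability `p`) is an assumption about what "connectivity probability" means.

```python
import random
from itertools import combinations


def find(parent, x):
    """Return the representative of x's class, shortening the path as we go."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x


def union(parent, a, b):
    """Merge the classes containing a and b."""
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[rb] = ra


def make_instance(n, p, seed=0):
    """Hypothetical generator (assumption): each pair of the n variables
    is stated to be equal independently with probability p."""
    rng = random.Random(seed)
    return [(a, b) for a, b in combinations(range(n), 2) if rng.random() < p]


def are_equal(n, equalities, x, y):
    """Decide whether x and y fall in the same equivalence class."""
    parent = list(range(n))
    for a, b in equalities:
        union(parent, a, b)
    return find(parent, x) == find(parent, y)


if __name__ == "__main__":
    facts = make_instance(n=8, p=0.2, seed=1)
    print(facts)
    print(are_equal(8, facts, 0, 7))
```

A classical algorithm answers such queries in near-linear time, which is what makes the task a useful probe: the difficulty for a model lies in following a long chain of equalities, not in the logic of any single step.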
