ARTFEED — Contemporary Art Intelligence

LLMs Struggle with Long-Chain Reasoning on Equivalence Class Problem

ai-technology · 2026-05-11

A new empirical study posted to arXiv evaluates Large Language Models (LLMs) on the Equivalence Class Problem (ECP), a task that is simple in structure but requires long chains of reasoning. The study tests both reasoning and non-reasoning models while varying the number of variables, the connectivity probability, and the prompt. Non-reasoning LLMs fail ECP entirely, while reasoning models perform significantly better but still cannot fully solve it. For non-reasoning models, performance drops as the connectivity probability increases when the number of variables is held fixed. The paper highlights fundamental limitations in the reasoning capabilities of LLMs.
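To make the two experimental knobs concrete, here is a minimal sketch of how an ECP instance could be sampled from a number of variables and a connectivity probability. It assumes an Erdős–Rényi-style procedure in which each pair of variables is declared equal independently with probability p; the function names, the `x0, x1, ...` variable naming, and the prompt wording are all hypothetical and are not taken from the paper.

```python
import random
from itertools import combinations


def generate_ecp_instance(num_vars: int, p: float, seed: int = 0) -> list[tuple[str, str]]:
    """Hypothetical sampler: each unordered pair of variables is declared
    equal independently with probability p (an assumed procedure, not
    necessarily the one used in the study)."""
    rng = random.Random(seed)  # seeded so instances are reproducible
    names = [f"x{i}" for i in range(num_vars)]
    return [(a, b) for a, b in combinations(names, 2) if rng.random() < p]


def to_prompt(relations: list[tuple[str, str]], query: tuple[str, str]) -> str:
    """Render the equalities and the query as a plain-text prompt
    (hypothetical format)."""
    facts = "\n".join(f"{a} = {b}" for a, b in relations)
    return f"{facts}\nIs {query[0]} equal to {query[1]}? Answer yes or no."


if __name__ == "__main__":
    relations = generate_ecp_instance(num_vars=6, p=0.3, seed=1)
    print(to_prompt(relations, ("x0", "x5")))
```

Under this assumed scheme, raising p with a fixed number of variables produces more stated equalities, and therefore a longer prompt for the model to reason over.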

Key facts

  • Study evaluates LLMs on Equivalence Class Problem (ECP)
  • ECP asks whether two variables are equal, given a set of equivalence relations
  • Non-reasoning LLMs fail ECP
  • Reasoning models perform better but still cannot fully solve ECP
  • Performance varies with the connectivity probability and the number of variables
  • Non-reasoning models show performance drop with higher connectivity probability
  • Study appears on arXiv as 2605.06882
  • Study focuses on the simplest long-chain reasoning tasks
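For contrast with the models' difficulties, the question ECP poses (are two variables equal under a set of equivalence relations?) has a classic algorithmic answer: a disjoint-set (union-find) structure, which merges variables into equivalence classes and then compares class representatives. The sketch below is a standard textbook technique shown for illustration only; it is not described as the paper's method, and the names `DisjointSet` and `are_equal` are hypothetical.

```python
class DisjointSet:
    """Union-find over string variable names, with path halving and union by rank."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}
        self.rank: dict[str, int] = {}

    def find(self, x: str) -> str:
        # Lazily register unseen variables as singleton classes.
        if x not in self.parent:
            self.parent[x] = x
            self.rank[x] = 0
        # Path halving: point each visited node at its grandparent.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Attach the shallower tree under the deeper one.
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1


def are_equal(relations: list[tuple[str, str]], a: str, b: str) -> bool:
    """Return True if a and b fall in the same equivalence class."""
    ds = DisjointSet()
    for u, v in relations:
        ds.union(u, v)
    return ds.find(a) == ds.find(b)


if __name__ == "__main__":
    rels = [("a", "b"), ("b", "c"), ("d", "e")]
    print(are_equal(rels, "a", "c"))  # a = b = c, so True
    print(are_equal(rels, "a", "d"))  # different classes, so False
```

Note that `a` and `c` are never stated equal directly; the answer follows only from chaining `a = b` and `b = c`, which is the kind of multi-step inference the study probes.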

Entities

Institutions

  • arXiv

Sources