Small Language Models Use Positional Shortcut for Arithmetic, Not Logic

ai-technology · 2026-05-25

A recent study posted to arXiv (2605.22870) indicates that small language models (1-3B parameters) rely on a positional shortcut rather than logical reasoning when solving arithmetic tasks with chain-of-thought (CoT) prompting. The researchers evaluated three instruction-tuned models on the GSM8K dataset and found that, during the answer-readout phase, the models simply copy the last number that appears before the answer delimiter, regardless of the intermediate reasoning. The presence of the gold answer accounts for 54-92 percentage points of accuracy, which is 89-92% of each model's teacher-forcing ceiling. Notably, even on incorrect items, the final answer matches the last CoT number 95-96% of the time. This suggests that, in these models, CoT works less as a chain of logical steps than as a position from which the answer is copied, challenging existing beliefs about reasoning in small language models.
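The copying behavior described above can be checked with a simple match rate between the last number in the reasoning text and the final answer. The following is a minimal sketch, not the study's code: it assumes the answer delimiter is `####` (the GSM8K dataset convention; the delimiter used in the study may differ), and the function names `trailing_number`, `final_answer`, and `copy_rate` are hypothetical.

```python
import re

# Assumption: "####" is the answer delimiter, as in GSM8K reference solutions.
DELIMITER = "####"
# Matches integers, comma-grouped thousands, and decimals, e.g. 7, 1,200, 3.5.
NUMBER = re.compile(r"(?<!\d)-?\d+(?:,\d{3})*(?:\.\d+)?")


def _to_float(token):
    """Convert a matched number token such as '1,200' to a float."""
    return float(token.replace(",", ""))


def trailing_number(completion):
    """Return the last number in the text before the delimiter, or None."""
    reasoning = completion.split(DELIMITER, 1)[0]
    tokens = NUMBER.findall(reasoning)
    return _to_float(tokens[-1]) if tokens else None


def final_answer(completion):
    """Return the first number after the delimiter, or None."""
    parts = completion.split(DELIMITER, 1)
    if len(parts) < 2:
        return None
    match = NUMBER.search(parts[1])
    return _to_float(match.group()) if match else None


def copy_rate(completions):
    """Fraction of completions whose final answer equals the trailing number.

    Completions missing either number are skipped.
    """
    pairs = [(trailing_number(c), final_answer(c)) for c in completions]
    pairs = [(t, a) for t, a in pairs if t is not None and a is not None]
    if not pairs:
        return 0.0
    return sum(t == a for t, a in pairs) / len(pairs)
```

To mirror the reported statistic, `copy_rate` would be applied only to the completions whose final answer is wrong.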

Key facts

arXiv paper 2605.22870 examines chain-of-thought prompting in small language models.
Three 1-3B instruction-tuned LMs were tested on GSM8K arithmetic tasks.
Models copy the trailing number before the answer delimiter regardless of reasoning.
Gold-answer presence accounts for 54-92 percentage points of accuracy.
This represents 89-92% of each model's teacher-forcing ceiling.
Final answer matches last CoT number 95-96% of the time on incorrect items.
Replacing the trailing number with a wrong value collapses accuracy to near zero.
Removing the trailing number recovers 5-32 percentage points above the floor.
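
The two interventions in the list above, replacing and removing the trailing number, amount to small text edits on the chain of thought before it is fed back to the model. The sketch below shows only those edits and does not reproduce the study's procedure; the helper names are hypothetical, and how the edited text is scored by a model is left out.

```python
import re

# Matches integers, comma-grouped thousands, and decimals, e.g. 7, 1,200, 3.5.
NUMBER = re.compile(r"(?<!\d)-?\d+(?:,\d{3})*(?:\.\d+)?")


def _last_number_span(cot):
    """Return (start, end) of the last number in the text, or None."""
    matches = list(NUMBER.finditer(cot))
    return matches[-1].span() if matches else None


def replace_trailing_number(cot, wrong_value):
    """Swap the last number in the chain of thought for another value."""
    span = _last_number_span(cot)
    if span is None:
        return cot  # nothing to replace
    start, end = span
    return cot[:start] + str(wrong_value) + cot[end:]


def remove_trailing_number(cot):
    """Delete the last number in the chain of thought."""
    span = _last_number_span(cot)
    if span is None:
        return cot  # nothing to remove
    start, end = span
    return cot[:start] + cot[end:]


cot = "Each box holds 12 pens, so 2 boxes hold 12 * 2 = 24."
print(replace_trailing_number(cot, 99))
print(remove_trailing_number(cot))
```

If a model is copying by position, the first edit should pull its answer toward the substituted value, while the second leaves it without a number in the position it reads from.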
