Small Language Models Use Positional Shortcut for Arithmetic, Not Logic
A recent study published on arXiv (2605.22870) reports that small language models (1-3B parameters) rely on a positional shortcut rather than logical reasoning when solving arithmetic problems with chain-of-thought (CoT) prompting. The researchers evaluated three instruction-tuned models on the GSM8K dataset and found that, at the answer-readout step, each model copies the last number that appears before the answer delimiter, regardless of the intermediate reasoning. Gold-answer presence accounts for 54-92 percentage points of accuracy, which is 89-92% of each model's teacher-forcing ceiling. Even on items the models answer incorrectly, the final answer matches the last CoT number 95-96% of the time. The findings suggest that in small models CoT works less as a chain of logical steps than as a way to place a number where the readout can copy it, which challenges common assumptions about reasoning in smaller language models.
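The copy behaviour described above can be measured with a simple string-level check. The following is a minimal sketch, not the paper's code: it assumes completions use `####` as the answer delimiter (the convention in GSM8K reference solutions; the delimiter used in the study is not stated here), and the names `trailing_number` and `copy_rate` are hypothetical.

```python
import re

# Matches integers, comma-grouped thousands, and decimals (e.g. 8, 1,200, 3.5).
NUMBER = re.compile(r"-?\d+(?:,\d{3})*(?:\.\d+)?")


def to_float(token):
    """Convert a matched number token such as '1,200' to a float."""
    return float(token.replace(",", ""))


def trailing_number(cot):
    """Return the last number in the chain-of-thought text, or None."""
    matches = NUMBER.findall(cot)
    return to_float(matches[-1]) if matches else None


def copy_rate(completions, delimiter="####"):
    """Fraction of completions whose final answer equals the last CoT number.

    The delimiter is an assumption. Completions without a delimiter, or
    without a number on either side of it, are skipped.
    """
    hits = 0
    total = 0
    for text in completions:
        cot, sep, answer_text = text.partition(delimiter)
        if not sep:
            continue
        last = trailing_number(cot)
        answer = NUMBER.search(answer_text)
        if last is None or answer is None:
            continue
        total += 1
        if last == to_float(answer.group()):
            hits += 1
    return hits / total if total else None
```

Running `copy_rate` only over items the model answered incorrectly would give a figure comparable in spirit to the 95-96% match rate reported above, though the study's exact procedure may differ.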
Key facts
- arXiv paper 2605.22870 examines chain-of-thought prompting in small language models.
- Three 1-3B instruction-tuned LMs were tested on GSM8K arithmetic tasks.
- Models copy the trailing number before the answer delimiter regardless of reasoning.
- Gold-answer presence accounts for 54-92 percentage points of accuracy.
- This represents 89-92% of each model's teacher-forcing ceiling.
- The final answer matches the last CoT number 95-96% of the time on incorrect items.
- Replacing the trailing number with a wrong value collapses accuracy to near zero.
- Removing the trailing number recovers 5-32 percentage points above the floor.
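The two interventions in the last two items, replacing and removing the trailing number, can be illustrated as plain text edits on a chain of thought. This is a hedged sketch under the assumption that the interventions operate on the CoT string before the answer delimiter; the study may implement them differently, and the function names are hypothetical.

```python
import re

# Matches integers, comma-grouped thousands, and decimals (e.g. 8, 1,200, 3.5).
NUMBER = re.compile(r"-?\d+(?:,\d{3})*(?:\.\d+)?")


def last_number_match(cot):
    """Return the regex match object for the last number in the text, or None."""
    last = None
    for match in NUMBER.finditer(cot):
        last = match
    return last


def replace_trailing_number(cot, wrong_value):
    """Swap the last number in the CoT for a wrong value (hypothetical helper)."""
    match = last_number_match(cot)
    if match is None:
        return cot
    return cot[:match.start()] + str(wrong_value) + cot[match.end():]


def remove_trailing_number(cot):
    """Delete the last number in the CoT, leaving the rest of the text intact."""
    match = last_number_match(cot)
    if match is None:
        return cot
    return cot[:match.start()] + cot[match.end():]
```

In an evaluation, the edited text would be fed back to the model ahead of the answer delimiter and accuracy compared with the unedited baseline. If the model is copying by position, the replaced variant should drag the answer toward the wrong value, which is the pattern the key facts describe.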
Entities
Institutions
- arXiv