Many-Shot CoT-ICL Scaling Differs for Reasoning Tasks
A recent arXiv preprint (2605.13511) challenges prevailing assumptions about many-shot in-context learning (ICL) for reasoning tasks. The authors study many-shot chain-of-thought in-context learning (CoT-ICL) and find that scaling principles established on non-reasoning tasks do not carry over. They observe a setting-dependent scaling effect across both reasoning-oriented and non-reasoning large language models (LLMs): increasing the number of CoT demonstrations is unstable for non-reasoning LLMs and primarily benefits models designed for reasoning. Moreover, similarity-based retrieval of demonstrations, which helps on non-reasoning tasks, falls short on reasoning tasks because semantic similarity is a poor predictor of procedural (CoT) compatibility. The study underscores the need to understand ICL scaling in a task-specific manner.
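Similarity-based retrieval, the demonstration-selection strategy the paper finds insufficient for reasoning tasks, is commonly implemented by ranking a pool of candidate examples by embedding similarity to the query and prepending the top-k as in-context demonstrations. A minimal sketch of that standard approach follows; the embedding vectors and demonstration pool here are toy placeholders, not data or code from the paper:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve_demonstrations(query_emb, pool, k=2):
    """Return the k demonstrations whose embeddings are most similar to
    the query. Per the preprint, this works for non-reasoning tasks but
    underperforms on reasoning tasks, since semantic similarity does not
    track procedural (CoT) compatibility."""
    ranked = sorted(pool,
                    key=lambda d: cosine_similarity(query_emb, d["emb"]),
                    reverse=True)
    return ranked[:k]

# Toy pool with placeholder 3-d embeddings (illustrative only; a real
# system would use a sentence-embedding model).
pool = [
    {"text": "demo A", "emb": [1.0, 0.0, 0.0]},
    {"text": "demo B", "emb": [0.9, 0.1, 0.0]},
    {"text": "demo C", "emb": [0.0, 1.0, 0.0]},
]
top = retrieve_demonstrations([1.0, 0.0, 0.0], pool, k=2)
print([d["text"] for d in top])  # -> ['demo A', 'demo B']
```

The paper's finding suggests that for reasoning tasks, ranking by semantic closeness like this selects demonstrations whose surface topic matches the query but whose reasoning procedure may not.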
Key facts
- arXiv:2605.13511
- Many-shot CoT-ICL studied for reasoning tasks
- Standard many-shot rules do not transfer to reasoning
- Setting-dependent scaling effect observed
- Increasing CoT demonstrations unstable for non-reasoning LLMs
- Similarity-based retrieval fails on reasoning tasks
- Semantic similarity poorly predicts CoT compatibility
- Non-reasoning and reasoning-oriented LLMs tested