Many-Shot CoT-ICL Scaling Differs for Reasoning Tasks
A recent arXiv preprint (2605.13511) challenges prevailing assumptions about many-shot in-context learning (ICL) for reasoning tasks. The authors study many-shot chain-of-thought in-context learning (CoT-ICL) and find that scaling principles established on non-reasoning tasks do not carry over. They observe a setting-dependent scaling effect across both reasoning-oriented and non-reasoning large language models (LLMs): increasing the number of CoT demonstrations is unstable for non-reasoning LLMs and primarily benefits models designed for reasoning. Moreover, similarity-based retrieval of demonstrations, which helps on non-reasoning tasks, falls short on reasoning tasks because semantic similarity is a poor predictor of procedural (CoT) compatibility. The study underscores the need to understand ICL scaling in a task-specific manner.
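Similarity-based retrieval, the demonstration-selection strategy the paper finds insufficient for reasoning tasks, is commonly implemented by ranking a pool of candidate examples by embedding similarity to the query and prepending the top-k as in-context demonstrations. A minimal sketch of that standard approach follows; the embedding vectors and demonstration pool here are toy placeholders, not data or code from the paper:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve_demonstrations(query_emb, pool, k=2):
    """Return the k demonstrations whose embeddings are most similar to
    the query. Per the preprint, this works for non-reasoning tasks but
    underperforms on reasoning tasks, since semantic similarity does not
    track procedural (CoT) compatibility."""
    ranked = sorted(pool,
                    key=lambda d: cosine_similarity(query_emb, d["emb"]),
                    reverse=True)
    return ranked[:k]

# Toy pool with placeholder 3-d embeddings (illustrative only; a real
# system would use a sentence-embedding model).
pool = [
    {"text": "demo A", "emb": [1.0, 0.0, 0.0]},
    {"text": "demo B", "emb": [0.9, 0.1, 0.0]},
    {"text": "demo C", "emb": [0.0, 1.0, 0.0]},
]
top = retrieve_demonstrations([1.0, 0.0, 0.0], pool, k=2)
print([d["text"] for d in top])  # -> ['demo A', 'demo B']
```

The paper's finding suggests that for reasoning tasks, ranking by semantic closeness like this selects demonstrations whose surface topic matches the query but whose reasoning procedure may not.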
Key facts
- arXiv:2605.13511
- Many-shot CoT-ICL studied for reasoning tasks
- Standard many-shot rules do not transfer to reasoning
- Setting-dependent scaling effect observed
- Increasing CoT demonstrations unstable for non-reasoning LLMs
- Similarity-based retrieval fails on reasoning tasks
- Semantic similarity poorly predicts CoT compatibility
- Non-reasoning and reasoning-oriented LLMs tested