LLMs Show More Human-Like Exploration-Exploitation with Thinking Traces
A new study on arXiv compares exploration-exploitation strategies of large language models (LLMs), humans, and multi-armed bandit (MAB) algorithms using canonical experiments from cognitive science and psychiatry. The research finds that enabling thinking traces in LLMs—through prompting strategies and thinking models—shifts their decision-making behavior toward more human-like patterns. The study employs interpretable choice models to capture the E&E strategies of each agent, highlighting how LLMs can simulate human sequential decision-making under uncertainty.
Key facts
- Study compares LLMs, humans, and MAB algorithms on exploration-exploitation tradeoff.
- Uses canonical multi-armed bandit experiments from cognitive science and psychiatry.
- Enabling thinking traces in LLMs shifts their behavior toward more human-like patterns.
- Interpretable choice models capture E&E strategies of agents.
- Research appears on arXiv with ID 2505.09901.
- Thinking traces enabled via prompting strategies and thinking models.
- Focus on dynamic decision-making under uncertainty.
- LLMs increasingly used to simulate or automate human behavior.
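To make the experimental setup concrete, the multi-armed bandit tasks above can be sketched with a simple agent. The snippet below is a minimal illustration (not the paper's method): an epsilon-greedy agent on a stationary bandit, where each choice either explores a random arm or exploits the arm with the highest estimated reward. The function name, parameters, and reward distribution are hypothetical choices for this sketch.

```python
import random

def run_bandit(true_means, n_trials=200, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy agent on a stationary multi-armed bandit.

    With probability `epsilon` the agent explores (random arm);
    otherwise it exploits the arm with the highest estimated mean reward.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms          # pulls per arm
    estimates = [0.0] * n_arms     # running mean reward per arm
    choices, rewards = [], []

    for _ in range(n_trials):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                           # explore
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        reward = rng.gauss(true_means[arm], 1.0)                  # noisy payoff
        counts[arm] += 1
        # incremental running-mean update for the chosen arm
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        choices.append(arm)
        rewards.append(reward)
    return choices, rewards, estimates

choices, rewards, estimates = run_bandit([0.0, 1.0, 2.0])
```

Interpretable choice models of the kind the study fits would then describe each agent's choice sequence in terms of parameters such as an exploration rate or reward sensitivity, allowing LLM, human, and algorithmic strategies to be compared on common ground.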