LLMs Show More Human-Like Exploration-Exploitation with Thinking Traces
A new study on arXiv compares exploration-exploitation strategies of large language models (LLMs), humans, and multi-armed bandit (MAB) algorithms using canonical experiments from cognitive science and psychiatry. The research finds that enabling thinking traces in LLMs—through prompting strategies and thinking models—shifts their decision-making behavior toward more human-like patterns. The study employs interpretable choice models to capture the E&E strategies of each agent, highlighting how LLMs can simulate human sequential decision-making under uncertainty.
Key facts
- Study compares LLMs, humans, and MAB algorithms on exploration-exploitation tradeoff.
- Uses canonical multi-armed bandit experiments from cognitive science and psychiatry.
- Enabling thinking traces in LLMs shifts their behavior toward more human-like patterns.
- Interpretable choice models capture E&E strategies of agents.
- Research appears on arXiv with ID 2505.09901.
- Thinking traces enabled via prompting strategies and thinking models.
- Focus on dynamic decision-making under uncertainty.
- LLMs increasingly used to simulate or automate human behavior.
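To make the experimental setup concrete, the multi-armed bandit tasks above can be sketched with a simple agent. The snippet below is a minimal illustration (not the paper's method): an epsilon-greedy agent on a stationary bandit, where each choice either explores a random arm or exploits the arm with the highest estimated reward. The function name, parameters, and reward distribution are hypothetical choices for this sketch.

```python
import random

def run_bandit(true_means, n_trials=200, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy agent on a stationary multi-armed bandit.

    With probability `epsilon` the agent explores (random arm);
    otherwise it exploits the arm with the highest estimated mean reward.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms          # pulls per arm
    estimates = [0.0] * n_arms     # running mean reward per arm
    choices, rewards = [], []

    for _ in range(n_trials):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                           # explore
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        reward = rng.gauss(true_means[arm], 1.0)                  # noisy payoff
        counts[arm] += 1
        # incremental running-mean update for the chosen arm
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        choices.append(arm)
        rewards.append(reward)
    return choices, rewards, estimates

choices, rewards, estimates = run_bandit([0.0, 1.0, 2.0])
```

Interpretable choice models of the kind the study fits would then describe each agent's choice sequence in terms of parameters such as an exploration rate or reward sensitivity, allowing LLM, human, and algorithmic strategies to be compared on common ground.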