ARTFEED — Contemporary Art Intelligence

LLMs Show More Human-Like Exploration-Exploitation with Thinking Traces

ai-technology · 2026-05-04

A new study on arXiv compares the exploration-exploitation (E&E) strategies of large language models (LLMs), humans, and multi-armed bandit (MAB) algorithms using canonical experiments from cognitive science and psychiatry. The researchers find that enabling thinking traces in LLMs, whether via prompting strategies or dedicated thinking models, shifts their decision-making toward more human-like patterns. The study fits interpretable choice models to capture each agent's E&E strategy, highlighting how LLMs can simulate human sequential decision-making under uncertainty.
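To make the setting concrete, here is a minimal sketch of the kind of interpretable choice model used in cognitive science for bandit tasks: a delta-rule value update combined with a softmax choice rule, whose inverse temperature controls the E&E balance. This is an illustrative standard model, not necessarily the exact model fitted in the paper; all parameter names and values are assumptions.

```python
import math
import random

def softmax_agent(arm_probs, alpha=0.3, beta=5.0, trials=200, seed=0):
    """Simulate a delta-rule / softmax agent on a Bernoulli bandit.

    alpha: learning rate for the value update (illustrative value).
    beta:  inverse temperature; higher beta means more exploitation,
           lower beta means more exploration.
    """
    rng = random.Random(seed)
    q = [0.0] * len(arm_probs)  # estimated value of each arm
    choices, rewards = [], []
    for _ in range(trials):
        # Softmax choice rule over current value estimates.
        weights = [math.exp(beta * v) for v in q]
        total = sum(weights)
        r = rng.random() * total
        arm, acc = 0, 0.0
        for i, w in enumerate(weights):
            acc += w
            if r <= acc:
                arm = i
                break
        # Bernoulli reward, then delta-rule update toward the outcome.
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        q[arm] += alpha * (reward - q[arm])
        choices.append(arm)
        rewards.append(reward)
    return choices, rewards

choices, rewards = softmax_agent([0.3, 0.7])
```

Fitting alpha and beta to a sequence of choices (human or LLM) is what makes the model "interpretable": the recovered parameters summarize how exploitative or exploratory the agent was.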

Key facts

  • Study compares LLMs, humans, and MAB algorithms on exploration-exploitation tradeoff.
  • Uses canonical multi-armed bandit experiments from cognitive science and psychiatry.
  • Enabling thinking traces in LLMs shifts their behavior toward more human-like patterns.
  • Interpretable choice models capture E&E strategies of agents.
  • Research appears on arXiv with ID 2505.09901.
  • Thinking traces enabled via prompting strategies and thinking models.
  • Focus on dynamic decision-making under uncertainty.
  • LLMs increasingly used to simulate or automate human behavior.
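For contrast with human and LLM agents, the MAB-algorithm baselines the study refers to are typically algorithms like UCB1, which handles the E&E tradeoff with an explicit optimism bonus rather than a learned, noisy choice rule. A minimal sketch (illustrative only; the paper's exact baselines may differ):

```python
import math
import random

def ucb1(arm_probs, trials=200, seed=0):
    """UCB1 on a Bernoulli bandit: pick the arm maximizing
    mean reward + sqrt(2 ln t / n_i), an optimism-driven
    exploration bonus that shrinks as an arm is sampled more."""
    rng = random.Random(seed)
    n = len(arm_probs)
    counts = [0] * n      # times each arm was pulled
    values = [0.0] * n    # running mean reward per arm
    choices = []
    for t in range(1, trials + 1):
        if t <= n:
            arm = t - 1   # initialize: play each arm once
        else:
            arm = max(
                range(n),
                key=lambda i: values[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
        choices.append(arm)
    return choices, counts

choices, counts = ucb1([0.3, 0.7])
```

Unlike the softmax choice model, UCB1 has no free parameters to fit, which is one reason human and LLM behavior is instead summarized with interpretable choice models.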

Entities

Institutions

  • arXiv

Sources