POMDP Framework for LLM Agentic Search in Large Contexts
A recent arXiv preprint (2605.07042) addresses the difficulties that large language model (LLM) agents face in environments where the relevant state exceeds the context window. The authors introduce the Context Gathering Decision Process (CGDP), a specialized Partially Observable Markov Decision Process (POMDP) in which the agent's goal is to adaptively refine its belief state until it has located the information a task requires. The study models LLM behavior as approximate Thompson Sampling within the CGDP and presents a predicate-based method for decomposing an LLM's implicit search. Together these target two common failure modes of agentic search over massive codebases, enterprise databases, and long conversational histories: redundant work and premature stopping.
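The preprint's details are not reproduced here, but the general idea of Thompson Sampling for adaptive information gathering can be illustrated generically. The sketch below is an assumption-laden stand-in, not the paper's formulation: candidate "locations" (files, tables, conversation segments) each have an unknown probability of yielding relevant context, the agent keeps a Beta posterior per location, samples from each posterior, queries the argmax, and stops when even the most promising location looks barren.

```python
import random

def thompson_search(ground_truth, budget=50, stop_threshold=0.05, seed=0):
    """Generic Thompson Sampling over candidate context locations.

    ground_truth[i] is the (unknown to the agent) probability that
    querying location i yields relevant context. The agent maintains a
    Beta(a_i, b_i) posterior per location, samples a plausible relevance
    from each posterior, and greedily queries the argmax sample.
    This is an illustration only; the paper's CGDP is not reproduced.
    """
    rng = random.Random(seed)
    n = len(ground_truth)
    a = [1.0] * n  # Beta posterior success counts (uniform prior)
    b = [1.0] * n  # Beta posterior failure counts
    found = []
    for _ in range(budget):
        samples = [rng.betavariate(a[i], b[i]) for i in range(n)]
        i = max(range(n), key=samples.__getitem__)
        # Principled stopping: halt only when the best location's sampled
        # and expected relevance are both below threshold, guarding
        # against the premature-termination failure mode.
        if samples[i] < stop_threshold and a[i] / (a[i] + b[i]) < stop_threshold:
            break
        hit = rng.random() < ground_truth[i]
        if hit:
            found.append(i)
            a[i] += 1
        else:
            b[i] += 1
    posterior_means = [ai / (ai + bi) for ai, bi in zip(a, b)]
    return found, posterior_means
```

Because sampling concentrates queries on locations whose posteriors look promising, the agent naturally avoids re-querying locations it has repeatedly found empty, addressing the redundant-work failure mode as well.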
Key facts
- arXiv preprint 2605.07042 introduces Context Gathering Decision Process (CGDP)
- CGDP is a specialized Partially Observable Markov Decision Process (POMDP)
- Addresses LLM agents in environments with state exceeding context windows
- Models LLM behavior as approximate Thompson Sampling within CGDP
- Introduces predicate-based method for decomposing implicit search
- Targets redundant work and premature stopping in agentic search
- Applications include massive codebases, enterprise databases, conversational histories
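The predicate-based decomposition listed above can also be sketched generically. In this hypothetical illustration (the function, corpus, and predicates are invented, not taken from the paper), an implicit search goal such as "find the config file that mentions retries" is made explicit as a conjunction of predicates that every candidate document must satisfy:

```python
def predicate_search(corpus, predicates):
    """Filter a corpus with a conjunction of explicit predicates.

    corpus: dict mapping document name -> document text.
    predicates: callables over (name, text) returning bool.
    Returns the sorted names satisfying every predicate. Hypothetical
    stand-in for predicate-based search decomposition, not the paper's
    method.
    """
    return sorted(
        name for name, text in corpus.items()
        if all(p(name, text) for p in predicates)
    )

corpus = {
    "app.yaml": "retry_limit: 5\ntimeout: 30",
    "readme.md": "How to configure retries",
    "db.yaml": "pool_size: 10",
}
predicates = [
    lambda name, text: name.endswith(".yaml"),  # structural predicate
    lambda name, text: "retry" in text,         # content predicate
]
# predicate_search(corpus, predicates) -> ["app.yaml"]
```

Making each predicate explicit lets an external checker verify which sub-conditions of a search have been satisfied, rather than leaving the search criteria implicit in an LLM's internal reasoning.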
Entities
Institutions
- arXiv