POMDP Framework for LLM Agentic Search in Large Contexts
A recent arXiv preprint (2605.07042) addresses the difficulties that large language model (LLM) agents face in environments where the relevant state exceeds the context window. The authors introduce the Context Gathering Decision Process (CGDP), a specialized Partially Observable Markov Decision Process (POMDP) in which the agent's goal is to adaptively refine its belief state until it has located the information a task requires. The study models LLM behavior as approximate Thompson Sampling within the CGDP and presents a predicate-based method for decomposing an LLM's implicit search. Together these target two common failure modes of agentic search over massive codebases, enterprise databases, and long conversational histories: redundant work and premature stopping.
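The preprint's details are not reproduced here, but the general idea of Thompson Sampling for adaptive information gathering can be illustrated generically. The sketch below is an assumption-laden stand-in, not the paper's formulation: candidate "locations" (files, tables, conversation segments) each have an unknown probability of yielding relevant context, the agent keeps a Beta posterior per location, samples from each posterior, queries the argmax, and stops when even the most promising location looks barren.

```python
import random

def thompson_search(ground_truth, budget=50, stop_threshold=0.05, seed=0):
    """Generic Thompson Sampling over candidate context locations.

    ground_truth[i] is the (unknown to the agent) probability that
    querying location i yields relevant context. The agent maintains a
    Beta(a_i, b_i) posterior per location, samples a plausible relevance
    from each posterior, and greedily queries the argmax sample.
    This is an illustration only; the paper's CGDP is not reproduced.
    """
    rng = random.Random(seed)
    n = len(ground_truth)
    a = [1.0] * n  # Beta posterior success counts (uniform prior)
    b = [1.0] * n  # Beta posterior failure counts
    found = []
    for _ in range(budget):
        samples = [rng.betavariate(a[i], b[i]) for i in range(n)]
        i = max(range(n), key=samples.__getitem__)
        # Principled stopping: halt only when the best location's sampled
        # and expected relevance are both below threshold, guarding
        # against the premature-termination failure mode.
        if samples[i] < stop_threshold and a[i] / (a[i] + b[i]) < stop_threshold:
            break
        hit = rng.random() < ground_truth[i]
        if hit:
            found.append(i)
            a[i] += 1
        else:
            b[i] += 1
    posterior_means = [ai / (ai + bi) for ai, bi in zip(a, b)]
    return found, posterior_means
```

Because sampling concentrates queries on locations whose posteriors look promising, the agent naturally avoids re-querying locations it has repeatedly found empty, addressing the redundant-work failure mode as well.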
Key facts
- arXiv preprint 2605.07042 introduces Context Gathering Decision Process (CGDP)
- CGDP is a specialized Partially Observable Markov Decision Process (POMDP)
- Addresses LLM agents in environments with state exceeding context windows
- Models LLM behavior as approximate Thompson Sampling within CGDP
- Introduces predicate-based method for decomposing implicit search
- Targets redundant work and premature stopping in agentic search
- Applications include massive codebases, enterprise databases, conversational histories
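The predicate-based decomposition listed above can also be sketched generically. In this hypothetical illustration (the function, corpus, and predicates are invented, not taken from the paper), an implicit search goal such as "find the config file that mentions retries" is made explicit as a conjunction of predicates that every candidate document must satisfy:

```python
def predicate_search(corpus, predicates):
    """Filter a corpus with a conjunction of explicit predicates.

    corpus: dict mapping document name -> document text.
    predicates: callables over (name, text) returning bool.
    Returns the sorted names satisfying every predicate. Hypothetical
    stand-in for predicate-based search decomposition, not the paper's
    method.
    """
    return sorted(
        name for name, text in corpus.items()
        if all(p(name, text) for p in predicates)
    )

corpus = {
    "app.yaml": "retry_limit: 5\ntimeout: 30",
    "readme.md": "How to configure retries",
    "db.yaml": "pool_size: 10",
}
predicates = [
    lambda name, text: name.endswith(".yaml"),  # structural predicate
    lambda name, text: "retry" in text,         # content predicate
]
# predicate_search(corpus, predicates) -> ["app.yaml"]
```

Making each predicate explicit lets an external checker verify which sub-conditions of a search have been satisfied, rather than leaving the search criteria implicit in an LLM's internal reasoning.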
Entities
Institutions
- arXiv