ARTFEED — Contemporary Art Intelligence

OLIVIA: Online Learning for LLM Agent Decision Making

other · 2026-05-13

OLIVIA (Online Learning via Inference-time Action Adaptation) is a newly introduced framework that tackles action-selection errors in large language model (LLM) agents built on the ReAct approach. In settings where agents repeatedly perform multi-step tasks, small mistakes compound into wasted tool calls, added latency, and reduced reliability. Existing inference-time adaptation techniques rely on prompting or retrieval, which influence behavior only indirectly by altering context and offer no explicit decision-making layer for scoring candidates, representing uncertainty, or learning from action-level feedback. OLIVIA instead treats the LLM's final action choice as a trainable decision layer, enabling precise, trackable, uncertainty-aware adjustment during deployment. The framework is tailored to ReAct agents and aims to improve efficiency and reliability without retraining the underlying LLM. The paper is available on arXiv under identifier 2605.11169.
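To make the idea concrete, the decision-layer framing can be sketched as a small policy that scores candidate actions proposed by the LLM, samples one, reports a simple uncertainty signal, and updates its weights online from scalar action-level feedback. Everything below (the feature representation, the softmax policy, and the REINFORCE-style update) is an illustrative assumption, not a detail from the OLIVIA paper:

```python
import numpy as np

class DecisionLayer:
    """Hypothetical sketch of a learnable action-selection layer for a
    ReAct-style agent. The LLM proposes candidate actions; this layer
    scores them, picks one, and adapts online from feedback such as
    +1 for a useful tool call and -1 for a wasted one. All details are
    illustrative assumptions, not the OLIVIA paper's method."""

    def __init__(self, n_features, lr=0.1, seed=0):
        self.w = np.zeros(n_features)   # weights of the decision layer
        self.lr = lr                    # online learning rate
        self.rng = np.random.default_rng(seed)

    def probs(self, features):
        # features: (n_candidates, n_features), one row per candidate action
        logits = features @ self.w
        z = np.exp(logits - logits.max())   # numerically stable softmax
        return z / z.sum()

    def select(self, features):
        p = self.probs(features)
        idx = self.rng.choice(len(p), p=p)
        # Policy entropy serves as a simple uncertainty signal.
        entropy = -np.sum(p * np.log(p + 1e-12))
        return idx, entropy

    def update(self, features, idx, reward):
        # REINFORCE-style online step: w += lr * reward * grad log pi(a|s)
        p = self.probs(features)
        grad = features[idx] - p @ features
        self.w += self.lr * reward * grad
```

Because only this small layer is updated, the underlying LLM stays frozen, which matches the paper's stated goal of adaptation without retraining; the choice of a softmax policy with entropy as the uncertainty measure is one simple way to make the layer both trackable and uncertainty-aware.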

Key facts

  • OLIVIA stands for Online Learning via Inference-time Action Adaptation.
  • The framework targets ReAct-style LLM agents handling sequential decision-making tasks.
  • Small action-selection errors can accumulate into wasted tool calls, latency, and reduced reliability.
  • Existing inference-time adaptation methods rely on prompting or retrieval, not explicit decision layers.
  • OLIVIA models the LLM's final action selection as a learnable decision layer.
  • It enables online updates from action-level feedback during deployment.
  • The paper is published on arXiv with identifier 2605.11169.
  • OLIVIA does not require retraining of the underlying LLM.

Entities

Institutions

  • arXiv

Sources