AGORA: Inference-Free Prompt Compression for LLM Agents

ai-technology · 2026-05-27

AGORA (Adapter-Grounded Observation-Action Retention) is a prompt compression method for LLM agents that shortens the prompt without calling an LLM at each step. Token-level extractive compressors tend to destroy the action grammar an agent depends on, so AGORA instead keeps or drops whole steps. Across 17 environment-backbone-method cells, token-level methods reached a mean reward of 0.05 or less despite compressing prompts by 1.3-13.3x. AGORA combines a structural prompt parser, an always-keep floor, and a 125-million-parameter relevance scorer trained on counterfactual next-action-change labels, and it retains at least 75% of uncompressed performance in 8 of 9 cells. A four-way ablation found the structural floor to be the dominant quality lever, while the learned scorer enables 1.0-11.5x adaptive end-to-end compression.
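To make the step-level idea concrete, here is a minimal sketch of keeping or dropping whole observation-action steps rather than individual tokens. It is not the authors' implementation: the `Observation:`/`Action:` history format, the `Step` class, the `keep_last` and `threshold` parameters, and the keyword-based `toy_score` (a stand-in for the learned relevance scorer) are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Step:
    index: int
    observation: str
    action: str

    def render(self) -> str:
        return f"Observation: {self.observation}\nAction: {self.action}"


# Hypothetical history format: alternating "Observation:" / "Action:" lines.
STEP_RE = re.compile(r"Observation: (.*)\nAction: (.*)")


def parse_steps(history: str) -> List[Step]:
    """Structural parse: split the history into whole observation-action steps."""
    return [Step(i, m.group(1), m.group(2))
            for i, m in enumerate(STEP_RE.finditer(history))]


def compress(steps: List[Step], score: Callable[[Step], float],
             keep_last: int = 1, threshold: float = 0.5) -> List[Step]:
    """Keep the always-keep floor (last `keep_last` steps) plus high-scoring steps."""
    floor_start = max(0, len(steps) - keep_last)
    return [s for s in steps if s.index >= floor_start or score(s) >= threshold]


def toy_score(step: Step) -> float:
    # Stand-in for the learned relevance scorer; a real scorer would be a model.
    return 1.0 if "key" in step.observation else 0.0


history = "\n".join([
    "Observation: You are in a kitchen.",
    "Action: look",
    "Observation: You see a key on the table.",
    "Action: take key",
    "Observation: The fridge hums.",
    "Action: wait",
    "Observation: The door is locked.",
    "Action: unlock door",
])

kept = compress(parse_steps(history), toy_score)
compressed = "\n".join(s.render() for s in kept)
print(compressed)
```

Because each step is kept or dropped as a unit, every surviving `Action:` line stays syntactically intact, which is the property token-level deletion is reported to break.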

Key facts

Token-level extractive compressors fail for LLM agents due to action-grammar destruction
Across 17 cells, token-level methods achieve mean reward ≤ 0.05 despite 1.3-13.3x compression
AGORA is an inference-free step-level compressor
AGORA uses a structural prompt parser, always-keep floor, and 125M-parameter relevance scorer
Relevance scorer trained on counterfactual next-action-change labels
AGORA runs at ~2 ms per step with zero per-step LLM cost
AGORA retains ≥ 75% uncompressed performance in 8 of 9 cells
Structural floor is the dominant quality lever in ablation study
Learned scorer enables 1.0-11.5x adaptive end-to-end compression
Method published on arXiv under Computer Science > Artificial Intelligence
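One plausible reading of "counterfactual next-action-change labels" is sketched below: remove a step from the history, ask the agent for its next action, and label the step as relevant if that action changes. The source does not spell out the labeling procedure, so the function `counterfactual_labels`, the `Policy` type, and the rule-based `toy_policy` standing in for an LLM agent are all hypothetical. Labels of this kind would presumably be computed offline, which is consistent with the compressor making no LLM calls per step.

```python
from typing import Callable, List, Sequence

# A policy maps a history of steps to the agent's next action (hypothetical type).
Policy = Callable[[Sequence[str]], str]


def counterfactual_labels(steps: Sequence[str], policy: Policy) -> List[int]:
    """Label step i as 1 if deleting it changes the policy's next action, else 0."""
    baseline = policy(steps)
    labels = []
    for i in range(len(steps)):
        ablated = list(steps[:i]) + list(steps[i + 1:])
        labels.append(int(policy(ablated) != baseline))
    return labels


def toy_policy(steps: Sequence[str]) -> str:
    # Stand-in for an LLM agent: it unlocks the door only if it has seen the key.
    return "unlock door" if any("key" in s for s in steps) else "search room"


steps = [
    "obs: kitchen | act: look",
    "obs: key on table | act: take key",
    "obs: fridge hums | act: wait",
]
labels = counterfactual_labels(steps, toy_policy)
print(labels)
```

In this toy run only the step that mentions the key flips the next action, so it is the only one labeled relevant. A scorer trained on such labels would learn to predict them from the step text alone.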
