Agent Runtime Layer Bridges LLM Frameworks and Serving Engines
A new arXiv paper (2605.27744) proposes inserting an agent runtime layer between multi-agent LLM frameworks and serving engines to handle cross-cutting policies. The authors argue that current systems suffer from a disconnect. The agent framework knows agent identities, roles, schemas, and dispatch structure but never sees engine-level events. The serving engine sees every event but knows nothing about the agents behind them. This seam complicates policies such as prefix caching, batch shaping, speculative execution, fairness, tool-result memoization, and safety enforcement, which today are handled with one-off patches. The proposed solution adds a third tier that exposes four primitives (observe, score, predict, act). Any agent-aware policy can plug into this tier, using agent identity as the shared coordinate. The paper maps nine concrete use cases to demonstrate the approach.
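To make the four-primitive idea concrete, here is a minimal Python sketch built on stated assumptions. `EngineEvent`, `Policy`, `TokenFairness`, and `AgentRuntime` are hypothetical names and do not come from the paper's API. The fairness rule, in which agents that have consumed fewer tokens rank higher, is an illustrative choice. The sketch shows that every policy receives the same engine events and answers in terms of the same key: the agent ID.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Protocol


@dataclass(frozen=True)
class EngineEvent:
    """Hypothetical engine-level event, tagged with the agent that caused it."""
    agent_id: str
    kind: str        # e.g. "prefill", "decode", "tool_call" (illustrative labels)
    tokens: int = 0


class Policy(Protocol):
    """Hypothetical plug-in interface exposing the four primitives."""
    def observe(self, event: EngineEvent) -> None: ...
    def score(self, agent_id: str) -> float: ...
    def predict(self, agent_id: str) -> Optional[str]: ...
    def act(self, agent_id: str) -> Dict[str, object]: ...


class TokenFairness:
    """Illustrative fairness policy: agents that used fewer tokens score higher."""

    def __init__(self) -> None:
        self.tokens: Dict[str, int] = defaultdict(int)
        self.kinds: Dict[str, Counter] = defaultdict(Counter)

    def observe(self, event: EngineEvent) -> None:
        # Accumulate per-agent usage, keyed on agent identity.
        self.tokens[event.agent_id] += event.tokens
        self.kinds[event.agent_id][event.kind] += 1

    def score(self, agent_id: str) -> float:
        # Fewer tokens consumed -> higher (less negative) score.
        return -float(self.tokens[agent_id])

    def predict(self, agent_id: str) -> Optional[str]:
        # Naive guess: the agent's most frequent event kind so far.
        seen = self.kinds[agent_id]
        return seen.most_common(1)[0][0] if seen else None

    def act(self, agent_id: str) -> Dict[str, object]:
        # Emit a directive the engine could use, here just a priority value.
        return {"agent_id": agent_id, "priority": self.score(agent_id)}


class AgentRuntime:
    """Hypothetical middle tier: fans engine events out to every registered policy."""

    def __init__(self, policies: List[Policy]) -> None:
        self.policies = policies

    def on_event(self, event: EngineEvent) -> None:
        for policy in self.policies:
            policy.observe(event)

    def rank(self, agent_ids: List[str]) -> List[str]:
        # Order waiting agents by the sum of policy scores, highest first.
        return sorted(
            agent_ids,
            key=lambda a: sum(p.score(a) for p in self.policies),
            reverse=True,
        )


if __name__ == "__main__":
    rt = AgentRuntime([TokenFairness()])
    rt.on_event(EngineEvent("planner", "decode", 500))
    rt.on_event(EngineEvent("coder", "decode", 120))
    print(rt.rank(["planner", "coder", "critic"]))  # ['critic', 'coder', 'planner']
```

The runtime itself stays policy-agnostic. Adding a second policy, for example one that scores prefix-cache affinity, would only mean appending it to the list passed to `AgentRuntime`.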
Key facts
- arXiv:2605.27744 proposes an agent runtime layer for multi-agent LLM serving.
- The layer sits between the agent framework and the serving engine.
- It exposes four primitives: observe, score, predict, act.
- Agent identity is the shared coordinate for policies.
- Nine concrete use cases are mapped.
- Current policies are implemented as one-off patches.
- The paper argues for an architectural change rather than point fixes.
- Policies include prefix caching, batch shaping, speculative execution, fairness, tool-result memoization, and safety enforcement.
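Tool-result memoization, one of the listed policies, shows why agent identity makes a useful key. The sketch below is a hypothetical illustration and does not reproduce the paper's design. `ToolResultMemo` and its `call` method are invented names. The reuse rule is an assumption made for illustration: a cached result may be returned only when the same agent issues the same tool with the same arguments.

```python
import hashlib
import json
from typing import Any, Callable, Dict, Tuple


class ToolResultMemo:
    """Hypothetical tool-result cache keyed on (agent_id, tool, canonical args).

    Assumption: including agent_id in the key keeps results from leaking
    between agents that may have different roles or permissions.
    """

    def __init__(self) -> None:
        self.cache: Dict[Tuple[str, str, str], Any] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(agent_id: str, tool: str, args: Dict[str, Any]) -> Tuple[str, str, str]:
        # Canonicalize args so that keyword order does not change the key.
        blob = json.dumps(args, sort_keys=True)
        return agent_id, tool, hashlib.sha256(blob.encode()).hexdigest()

    def call(self, agent_id: str, tool: str, args: Dict[str, Any],
             fn: Callable[..., Any]) -> Any:
        key = self._key(agent_id, tool, args)
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        # Cache miss: run the real tool once and remember its result.
        self.misses += 1
        result = fn(**args)
        self.cache[key] = result
        return result
```

Scoping the key per agent trades some hit rate for isolation. A deployment that trusts all agents equally could drop `agent_id` from the key, or replace it with a role, to share results more widely.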
Entities
Institutions
- arXiv