Agent Runtime Layer Bridges LLM Frameworks and Serving Engines

other · 2026-05-28

A new arXiv paper (2605.27744) proposes inserting an agent runtime layer between multi-agent LLM frameworks and serving engines to handle cross-cutting policies. The authors argue that current systems suffer from a disconnect: the agent framework knows agent identities, roles, schemas, and dispatch structure but never sees engine-level events, while the serving engine sees every event but knows nothing about the agents behind them. Policies that need both views, such as prefix caching, batch shaping, speculative execution, fairness, tool-result memoization, and safety enforcement, fall into this seam and are currently handled with one-off patches. The proposed solution adds a third tier that exposes four primitives (observe, score, predict, act) into which any agent-aware policy can plug, with agent identity as the shared coordinate. The paper maps nine concrete use cases onto the proposed layer to demonstrate the approach.
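
To make the four primitives concrete, here is a minimal Python sketch of how a policy might plug into such a layer. It is an illustration only, not the paper's API: the names EngineEvent, Policy, FairnessPolicy, and AgentRuntime, the method signatures, and the token-based fairness rule are all assumptions made for this example. The only ideas taken from the summary above are the four primitives and the use of agent identity as the key that ties engine events to policy decisions.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class EngineEvent:
    """Hypothetical engine-level event, tagged with the agent that caused it."""
    agent_id: str  # the shared coordinate between framework and engine
    kind: str      # e.g. "request_done" (illustrative label only)
    tokens: int    # tokens generated for this event


class Policy(Protocol):
    """Hypothetical plug-in interface built on the four primitives."""

    def observe(self, event: EngineEvent) -> None: ...
    def score(self, agent_id: str) -> float: ...
    def predict(self, agent_id: str) -> float: ...
    def act(self, pending: list[str]) -> list[str]: ...


class FairnessPolicy:
    """Toy fairness policy: agents that have used fewer tokens are served first."""

    def __init__(self) -> None:
        self.tokens_used: dict[str, int] = defaultdict(int)
        self.requests: dict[str, int] = defaultdict(int)

    def observe(self, event: EngineEvent) -> None:
        # Accumulate per-agent usage from engine events.
        self.tokens_used[event.agent_id] += event.tokens
        self.requests[event.agent_id] += 1

    def score(self, agent_id: str) -> float:
        # Higher score means higher priority; heavy users score lower.
        return -float(self.tokens_used[agent_id])

    def predict(self, agent_id: str) -> float:
        # Expected tokens for the agent's next request: running mean so far.
        n = self.requests[agent_id]
        return self.tokens_used[agent_id] / n if n else 0.0

    def act(self, pending: list[str]) -> list[str]:
        # Reorder pending agent requests by descending score.
        return sorted(pending, key=self.score, reverse=True)


class AgentRuntime:
    """Hypothetical third tier: forwards engine events to policies and applies them."""

    def __init__(self, policies: list[Policy]) -> None:
        self.policies = policies

    def on_engine_event(self, event: EngineEvent) -> None:
        for policy in self.policies:
            policy.observe(event)

    def schedule(self, pending: list[str]) -> list[str]:
        # Each policy may reorder the queue in turn.
        for policy in self.policies:
            pending = policy.act(pending)
        return pending


runtime = AgentRuntime([FairnessPolicy()])
runtime.on_engine_event(EngineEvent("planner", "request_done", 900))
runtime.on_engine_event(EngineEvent("critic", "request_done", 100))
order = runtime.schedule(["planner", "critic"])  # → ['critic', 'planner']
```

In this sketch the policy never touches the framework or the engine directly. It sees only events keyed by agent identity and returns an ordering, which is the kind of separation the proposed layer is meant to provide.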

Key facts

arXiv:2605.27744 proposes an agent runtime layer for multi-agent LLM serving.
The layer sits between the agent framework and the serving engine.
It exposes four primitives: observe, score, predict, act.
Agent identity is the shared coordinate for policies.
The paper maps nine concrete use cases onto the proposed layer.
Current policies are implemented as one-off patches.
The paper argues for an architectural change rather than point fixes.
Policies include prefix caching, batch shaping, speculative execution, fairness, tool-result memoization, and safety enforcement.
