ARTFEED — Contemporary Art Intelligence

Agent Runtime Layer Bridges LLM Frameworks and Serving Engines

other · 2026-05-28

A new arXiv paper (2605.27744) proposes inserting an agent runtime layer between multi-agent LLM frameworks and serving engines to handle cross-cutting policies. The authors argue that current systems suffer from a disconnect: the agent framework knows agent identities, roles, schemas, and dispatch structure but never sees engine-level events, while the serving engine sees every event but knows nothing about agents. Policies that need both views, such as prefix caching, batch shaping, speculative execution, fairness, tool-result memoization, and safety enforcement, fall into this seam and are currently handled with one-off patches. The proposed solution adds a third tier with four primitives (observe, score, predict, act) into which any agent-aware policy can plug, using agent identity as the shared coordinate. The paper maps nine concrete use cases to demonstrate the approach.
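To make the four primitives concrete, here is a minimal Python sketch of what such a third tier might look like. It is not taken from the paper: the `EngineEvent` and `AgentRuntime` classes, the event names, the 0.5 threshold, and the `pin_prefix` action are all hypothetical, chosen only to show observe, score, predict, and act sharing agent identity as the key.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class EngineEvent:
    """Hypothetical engine-level event tagged with the agent that caused it."""
    agent_id: str
    kind: str  # assumed event names: "prefix_cache_hit", "prefix_cache_miss"


class AgentRuntime:
    """Toy third tier between an agent framework and a serving engine."""

    def __init__(self):
        self.events = defaultdict(list)  # agent_id -> observed events
        self.actions = []                # (agent_id, action) pairs issued so far

    def observe(self, event):
        """Record an engine event under the agent's identity."""
        self.events[event.agent_id].append(event)

    def score(self, agent_id):
        """Return the fraction of this agent's cache events that were hits."""
        cache_events = [
            e for e in self.events[agent_id]
            if e.kind in ("prefix_cache_hit", "prefix_cache_miss")
        ]
        if not cache_events:
            return 0.0
        hits = sum(1 for e in cache_events if e.kind == "prefix_cache_hit")
        return hits / len(cache_events)

    def predict(self, agent_id):
        """Naive guess: the next request hits if the past hit rate is >= 0.5."""
        return self.score(agent_id) >= 0.5

    def act(self, agent_id):
        """Choose a (hypothetical) engine action based on the prediction."""
        action = "pin_prefix" if self.predict(agent_id) else "no_op"
        self.actions.append((agent_id, action))
        return action


runtime = AgentRuntime()
for kind in ("prefix_cache_hit", "prefix_cache_hit", "prefix_cache_miss"):
    runtime.observe(EngineEvent(agent_id="planner", kind=kind))
runtime.observe(EngineEvent(agent_id="critic", kind="prefix_cache_miss"))

planner_action = runtime.act("planner")
critic_action = runtime.act("critic")
```

In this toy run the planner agent, with two hits out of three cache events, gets its prefix pinned, while the critic agent does not. A real layer would presumably let many policies register against the same four calls rather than hard-coding one.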

Key facts

  • arXiv:2605.27744 proposes an agent runtime layer for multi-agent LLM serving.
  • The layer sits between the agent framework and the serving engine.
  • It exposes four primitives: observe, score, predict, act.
  • Agent identity is the shared coordinate for policies.
  • Nine concrete use cases are mapped.
  • Current policies are implemented as one-off patches.
  • The paper argues for an architectural change rather than point fixes.
  • Policies include prefix caching, batch shaping, speculative execution, fairness, tool-result memoization, and safety enforcement.
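As an illustration of one policy from the list above, the following Python sketch shows tool-result memoization keyed by agent identity. The summary does not describe how the paper implements this, so the `ToolMemo` class, the choice to scope the cache per agent, and the `lookup_price` tool are assumptions made purely for the example.

```python
import json


class ToolMemo:
    """Hypothetical memo table for tool results, scoped by agent identity."""

    def __init__(self):
        self._cache = {}
        self.hits = 0
        self.misses = 0

    def call(self, agent_id, tool_name, tool_fn, args):
        # Assumption: results are only reused within the same agent, so the
        # agent identity is part of the cache key alongside tool and arguments.
        key = (agent_id, tool_name, json.dumps(args, sort_keys=True))
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        self.misses += 1
        result = tool_fn(**args)
        self._cache[key] = result
        return result


def lookup_price(item):
    """Stand-in tool; a real tool would call an external service."""
    return {"item": item, "price": len(item) * 10}


memo = ToolMemo()
first = memo.call("researcher", "lookup_price", lookup_price, {"item": "widget"})
second = memo.call("researcher", "lookup_price", lookup_price, {"item": "widget"})
other = memo.call("reviewer", "lookup_price", lookup_price, {"item": "widget"})
```

Here the second call from the same agent is served from the memo table, while the call from a different agent runs the tool again. Whether results should be shared across agents is a policy decision that this sketch deliberately leaves as the conservative case.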

Entities

Institutions

  • arXiv
