ARTFEED — Contemporary Art Intelligence

Agent-X Accelerates On-Device AI Agents by 1.61x

ai-technology · 2026-05-12

Agent-X is a software-only framework that speeds up LLM-based AI agents on edge devices by up to 1.61x end to end with no accuracy loss. It targets both stages of inference: prompt rewriting makes prefix caching more effective during prefill, and LLM-free speculative decoding speeds up token generation during decode. The system is designed to integrate into existing on-device agents, where it addresses latency bottlenecks seen in real-world applications. The authors describe the work as the first systematic characterization of these bottlenecks.
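To make the prefill idea concrete, here is a minimal sketch of prompt rewriting for prefix caching. The summary does not say how Agent-X rewrites prompts, so everything here is an assumption for illustration: the `Segment` type, the rule of moving static segments ahead of dynamic ones, and the whitespace tokenizer are all hypothetical. The point it shows is that a prefix cache can only reuse the tokens two consecutive prompts share from the start, so pushing the parts that change (clock, user query) to the end lengthens the reusable prefix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    text: str
    dynamic: bool  # True if the text changes between agent steps


def rewrite_prompt(segments):
    """Stable reorder (assumed rule): static segments first, dynamic last."""
    static = [s for s in segments if not s.dynamic]
    dynamic = [s for s in segments if s.dynamic]
    return static + dynamic


def tokenize(segments):
    # Whitespace splitting stands in for a real tokenizer.
    return [tok for s in segments for tok in s.text.split()]


def shared_prefix_len(a, b):
    """Number of leading tokens two prompts have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def make_step(clock, query):
    # Hypothetical agent prompt layout with a timestamp at the front.
    return [
        Segment(f"time: {clock}", dynamic=True),
        Segment("system: you are a phone assistant", dynamic=False),
        Segment("tools: open_app send_sms set_alarm", dynamic=False),
        Segment(f"user: {query}", dynamic=True),
    ]


step1 = make_step("09:00", "set an alarm")
step2 = make_step("09:01", "text Sam")

# Tokens a prefix cache could reuse from step1 when serving step2.
naive_reuse = shared_prefix_len(tokenize(step1), tokenize(step2))
rewritten_reuse = shared_prefix_len(
    tokenize(rewrite_prompt(step1)), tokenize(rewrite_prompt(step2))
)
print(naive_reuse, rewritten_reuse)
```

In this toy layout the leading timestamp limits reuse to a single token, while the rewritten prompt shares the full system and tool text. Reordering segments can change model behavior in general, so a real system would need a rewrite that keeps outputs unchanged; this sketch does not attempt that.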

Key facts

  • Agent-X achieves up to 1.61x end-to-end speedup on representative agentic workloads.
  • It is a software-only, accuracy-preserving framework.
  • Accelerates both prefill and decode stages.
  • Uses prompt rewriting for prefix caching tailored to agent-specific input patterns.
  • Employs LLM-free speculative decoding for fast token generation.
  • Can be seamlessly integrated into existing on-device AI agents.
  • Described by its authors as the first work to systematically characterize and eliminate latency bottlenecks in on-device agents.
  • Targets edge devices, where on-device agents suffer from high end-to-end latency.
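The decode-side idea in the list above can be sketched as follows. This is not Agent-X's implementation, which the summary does not detail. It assumes one well-known LLM-free drafting strategy, copying the tokens that followed the most recent earlier occurrence of the current token, and it uses a scripted toy function in place of the target model. All function names are hypothetical. Every emitted token is the target model's own greedy choice, which is why this style of decoding does not change the output.

```python
EOS = "<eos>"


def draft_from_context(context, ngram=1, max_draft=4):
    """LLM-free draft: copy what followed the latest earlier match of the tail."""
    if len(context) < ngram:
        return []
    tail = context[-ngram:]
    for start in range(len(context) - ngram - 1, -1, -1):
        if context[start:start + ngram] == tail:
            return context[start + ngram:start + ngram + max_draft]
    return []


def speculative_decode(target_next, prompt, max_new):
    out = list(prompt)
    limit = len(prompt) + max_new
    passes = 0
    while len(out) < limit and out[-1] != EOS:
        draft = draft_from_context(out)
        # A real engine scores every draft position in ONE batched forward
        # pass; the toy calls target_next per position and counts one pass.
        passes += 1
        for tok in draft + [None]:  # None marks the bonus token after the draft
            expected = target_next(out)
            out.append(expected)  # always the target's own greedy token
            if expected == EOS or len(out) >= limit or tok != expected:
                break
    return out[len(prompt):], passes


def greedy_decode(target_next, prompt, max_new):
    out = list(prompt)
    passes = 0
    while len(out) < len(prompt) + max_new and out[-1] != EOS:
        out.append(target_next(out))
        passes += 1
    return out[len(prompt):], passes


# Toy stand-in for the target LLM: it replays a scripted continuation.
prompt = "tools : set_alarm ( time ) ; send_sms ( to ) ; action :".split()
script = prompt + "set_alarm ( time ) ;".split() + [EOS]


def target_next(context):
    return script[len(context)] if len(context) < len(script) else EOS


spec_tokens, spec_passes = speculative_decode(target_next, prompt, max_new=16)
base_tokens, base_passes = greedy_decode(target_next, prompt, max_new=16)
print(spec_tokens == base_tokens, spec_passes, base_passes)
```

Agent outputs often repeat text already in the prompt, such as tool names and call syntax, which is what makes copy-based drafts cheap and frequently correct. In this toy run the drafted tool call is accepted whole, so the same six tokens take two verification passes instead of six.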

Entities

Institutions

  • arXiv

Sources