Agent-X Accelerates On-Device AI Agents by 1.61x
Agent-X is a software-only framework that accelerates LLM-based AI agents on edge devices by up to 1.61x with no accuracy loss. It targets both inference stages: prompt rewriting makes agent inputs amenable to prefix caching, which shortens prefill, and LLM-free speculative decoding speeds up decode. The system is designed to integrate into existing on-device agents and addresses the latency bottlenecks they face in real-world applications. The authors describe the work as the first systematic characterization of these bottlenecks.
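The summary does not say how Agent-X rewrites prompts, so the following is only a minimal sketch of the general idea, under stated assumptions: agent prompts mix segments that repeat across turns (system prompt, tool list) with segments that change (clock, user request), and a prefix cache can only reuse the tokens up to the first difference. The `Segment` type, the `make_turn` helper, the example strings, and the whitespace "tokenizer" are all hypothetical stand-ins, not part of Agent-X.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    text: str
    static: bool  # assumption: True if the text is identical on every turn


def flatten(segments: List[Segment]) -> List[str]:
    """Tokenize segments in their given order (whitespace split as a stand-in)."""
    return [tok for seg in segments for tok in seg.text.split()]


def rewrite_prompt(segments: List[Segment]) -> List[str]:
    """Move static segments ahead of dynamic ones, keeping relative order."""
    # sorted() is stable, so segments with the same flag keep their order.
    ordered = sorted(segments, key=lambda seg: not seg.static)
    return flatten(ordered)


def shared_prefix_len(a: List[str], b: List[str]) -> int:
    """Number of leading tokens two prompts share (what a prefix cache can reuse)."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def make_turn(clock: str, request: str) -> List[Segment]:
    """Hypothetical agent prompt whose first segment changes on every turn."""
    return [
        Segment(f"{clock} now", static=False),
        Segment("System: you are a phone assistant", static=True),
        Segment("Tools: call sms alarm", static=True),
        Segment(f"User: {request}", static=False),
    ]


if __name__ == "__main__":
    t1 = make_turn("10:01", "set an alarm")
    t2 = make_turn("10:02", "text Bob")
    print("reusable tokens, original order:", shared_prefix_len(flatten(t1), flatten(t2)))
    print("reusable tokens, rewritten:     ", shared_prefix_len(rewrite_prompt(t1), rewrite_prompt(t2)))
```

In this toy case the original ordering shares no leading tokens between turns because the clock comes first, while the rewritten ordering shares the whole static block. Whether a given reordering preserves model accuracy is a separate question that this sketch does not address.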
Key facts
- Agent-X achieves up to 1.61x end-to-end speedup on representative agentic workloads.
- It is a software-only, accuracy-preserving framework.
- Accelerates both prefill and decode stages.
- Uses prompt rewriting for prefix caching tailored to agent-specific input patterns.
- Employs LLM-free speculative decoding for fast token generation.
- Can be seamlessly integrated into existing on-device AI agents.
- First to systematically characterize and eliminate latency bottlenecks in on-device agents.
- Targets edge devices, where on-device agents suffer from high end-to-end latency.
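The bullet on LLM-free speculative decoding can be illustrated with a small sketch. The summary does not specify Agent-X's drafting method, so this example swaps in one well-known LLM-free approach as an assumption: propose draft tokens by copying whatever followed the most recent earlier occurrence of the current n-gram in the context, then keep only the tokens the target model would have produced anyway. All names here (`draft_from_context`, `speculative_step`, `scripted_target`) are hypothetical, and the "target model" is a scripted function so the example runs without an LLM.

```python
from typing import Callable, List, Sequence, Tuple

Token = str
NextToken = Callable[[Sequence[Token]], Token]  # greedy next-token function


def draft_from_context(context: Sequence[Token], ngram: int = 2,
                       max_draft: int = 4) -> List[Token]:
    """Draft tokens without a draft model by copying from earlier context."""
    if len(context) < ngram:
        return []
    tail = list(context[-ngram:])
    # Scan backwards for the most recent earlier occurrence of the tail n-gram.
    for start in range(len(context) - ngram - 1, -1, -1):
        if list(context[start:start + ngram]) == tail:
            follow = start + ngram
            return list(context[follow:follow + max_draft])
    return []


def speculative_step(context: Sequence[Token], target_next: NextToken) -> List[Token]:
    """Accept the longest draft prefix the target agrees with, plus one target token."""
    accepted: List[Token] = []
    for tok in draft_from_context(context):
        # A real engine would score all draft positions in one batched forward
        # pass; calling the target per position here is only for clarity.
        if target_next(list(context) + accepted) != tok:
            break
        accepted.append(tok)
    accepted.append(target_next(list(context) + accepted))
    return accepted


def generate(prompt: Sequence[Token], target_next: NextToken,
             max_new: int) -> Tuple[List[Token], int]:
    """Generate max_new tokens; also return how many decode steps were needed."""
    out = list(prompt)
    steps = 0
    while len(out) - len(prompt) < max_new:
        out.extend(speculative_step(out, target_next))
        steps += 1
    return out[len(prompt):len(prompt) + max_new], steps


def scripted_target(prompt_len: int, reply: Sequence[Token]) -> NextToken:
    """Toy stand-in for an LLM: always continues with a fixed reply."""
    def next_token(context: Sequence[Token]) -> Token:
        i = len(context) - prompt_len
        return reply[i] if i < len(reply) else "<eos>"
    return next_token


if __name__ == "__main__":
    prompt = 'tool call schema : { "name" : "set_alarm" , "time" : "07:00" }'.split()
    reply = '{ "name" : "set_alarm" , "time" : "07:00" }'.split()
    tokens, steps = generate(prompt, scripted_target(len(prompt), reply), len(reply))
    print(tokens == reply, f"{steps} steps for {len(reply)} tokens")
```

Because every accepted token is checked against the target's own greedy choice, the output is identical to plain decoding, which is the sense in which this style of speculation preserves accuracy. The benefit depends on how often the output repeats earlier context, as structured tool calls in agent workloads tend to do.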
Entities
Institutions
- arXiv