Optimizing Latency, Reliability, and Cost in LLM-Enabled Agentic Workflows
A recent study published on arXiv investigates the tradeoffs among latency, reliability, and cost in AI systems composed of multiple interacting agents, including agents driven by large language models (LLMs) and traditional computational units. The research presents performance models for both LLM and non-LLM agents that relate computational effort to output quality. For LLM agents, it uses a parametric exponential reliability function that accounts for both reasoning tokens and output tokens. The authors study the design of sequential workflows subject to latency and cost constraints, deriving a water-filling token allocation policy and characterizing optimal workflow reliability via shadow prices, that is, the marginal value of relaxing each constraint. The stated aim is to improve the reliability of LLM-integrated agent workflows.
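The idea of a water-filling allocation priced by a shadow price can be illustrated with a minimal sketch. It is not the paper's actual model; everything specific in it is an assumption made for illustration: each agent's reliability is taken to be 1 - exp(-rate * tokens), workflow reliability is taken to be the product of the agents' reliabilities, token counts are treated as continuous, and there is a single cost budget rather than separate latency and cost limits. The names `reliability`, `tokens_at_price`, and `allocate_tokens` are hypothetical.

```python
import math


def reliability(tokens, rate):
    """Assumed exponential form: reliability rises toward 1 as tokens grow."""
    return 1.0 - math.exp(-rate * tokens)


def tokens_at_price(price, rate, cost):
    """Tokens at which the marginal gain in log-reliability equals price * cost.

    Under the assumed form, d/dt log(1 - exp(-rate*t)) = rate / (exp(rate*t) - 1).
    Setting that equal to price * cost and solving for t gives the line below.
    """
    return math.log1p(rate / (price * cost)) / rate


def allocate_tokens(rates, costs, budget, iters=200):
    """Search for the shadow price at which total spend meets the budget.

    Spend decreases as the price rises, so bisection on the price works.
    Returns (tokens per agent, shadow price).
    """
    lo, hi = 1e-12, 1e12
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # bisect on a log scale
        spend = sum(c * tokens_at_price(mid, a, c) for a, c in zip(rates, costs))
        if spend > budget:
            lo = mid  # over budget: the price must go up
        else:
            hi = mid  # within budget: try a lower price
    tokens = [tokens_at_price(hi, a, c) for a, c in zip(rates, costs)]
    return tokens, hi


def workflow_reliability(tokens, rates):
    """Assumed sequential workflow: every stage must succeed."""
    return math.prod(reliability(t, a) for t, a in zip(tokens, rates))


if __name__ == "__main__":
    rates = [0.01, 0.02]   # hypothetical per-agent reliability rates
    costs = [1.0, 1.0]     # hypothetical cost per token
    tokens, price = allocate_tokens(rates, costs, budget=500.0)
    print(tokens, price, workflow_reliability(tokens, rates))
```

In this sketch the single price plays the role of the water level: every agent receives tokens until its marginal reliability gain per unit cost falls to the same value, so agents that improve slowly per token end up with more tokens than agents that saturate quickly.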
Key facts
- Paper analyzes latency-reliability-cost tradeoffs in LLM-enabled agentic workflows.
- Introduces performance models for LLM and non-LLM agents.
- Uses parametric exponential reliability function for LLM agents.
- Derives water-filling token allocation policy.
- Characterizes optimal workflow reliability via shadow prices.
- Published on arXiv.
- Focuses on sequential workflows under constraints.
- Aims to improve reliability of multi-agent AI systems.
Entities
Institutions
- arXiv