PRAXIS: AI-Driven Tool Diagnoses Cloud Incidents Up to 6.3x More Accurately
Researchers introduced PRAXIS, an orchestrator that diagnoses cloud incidents through LLM-driven structured traversal over service dependency graphs and program dependence graphs. Compared with ReAct baselines, it improves root-cause analysis (RCA) accuracy by up to 6.3x while reducing token consumption by 5.3x. The system is demonstrated on 30 real-world incidents that are being compiled into an RCA benchmark.
Key facts
- Unresolved production cloud incidents cost an average of over $2M per hour.
- PRAXIS is an orchestrator that manages and deploys an agentic workflow for diagnosing code- and configuration-caused cloud incidents.
- PRAXIS employs an LLM-driven structured traversal over two types of graph: a service dependency graph (SDG) and a hammock-block program dependence graph (PDG).
- The SDG captures microservice-level dependencies.
- The PDG captures code-level dependencies for each microservice.
- Compared to state-of-the-art ReAct baselines, PRAXIS improves RCA accuracy by up to 6.3x.
- Against the same baselines, PRAXIS reduces token consumption by 5.3x.
- PRAXIS is demonstrated on a set of 30 comprehensive real-world incidents being compiled into an RCA benchmark.
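To make the two-level traversal in the facts above more concrete, here is a minimal Python sketch. It is not the PRAXIS implementation: the summary does not describe the actual algorithm, so the pruned breadth-first walk, the function names (traverse, diagnose), the toy graphs, and the judge callbacks that stand in for LLM calls are all assumptions made for illustration. PDG nodes are treated as opaque block names, and hammock-block construction is not modeled.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

Graph = Dict[str, List[str]]  # node -> nodes it depends on


def traverse(graph: Graph, start: str, is_suspicious: Callable[[str], bool]) -> List[str]:
    """Breadth-first walk that expands only nodes the judge flags as suspicious."""
    seen = {start}
    queue = deque([start])
    flagged: List[str] = []
    while queue:
        node = queue.popleft()
        if not is_suspicious(node):
            continue  # prune: skip the dependencies of nodes judged healthy
        flagged.append(node)
        for dep in graph.get(node, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return flagged


def diagnose(
    sdg: Graph,
    pdgs: Dict[str, Graph],
    entry_points: Dict[str, str],
    alerting_service: str,
    service_judge: Callable[[str], bool],
    block_judge: Callable[[str, str], bool],
) -> List[Tuple[str, str]]:
    """Walk the SDG first, then the PDG of each flagged service."""
    candidates: List[Tuple[str, str]] = []
    for service in traverse(sdg, alerting_service, service_judge):
        entry = entry_points.get(service)
        if entry is None:
            continue  # no code-level graph available for this service
        pdg = pdgs.get(service, {})
        for block in traverse(pdg, entry, lambda b, s=service: block_judge(s, b)):
            candidates.append((service, block))
    return candidates


# Hypothetical toy data; a real system would query an LLM inside the judges.
sdg = {"frontend": ["cart", "catalog"], "cart": ["redis"], "catalog": [], "redis": []}
pdgs = {
    "frontend": {"handle_request": ["call_cart"], "call_cart": []},
    "cart": {"add_item": ["load_config", "write_cache"], "load_config": [], "write_cache": []},
}
entry_points = {"frontend": "handle_request", "cart": "add_item"}
bad_services = {"frontend", "cart"}
bad_blocks = {
    ("frontend", "handle_request"),
    ("frontend", "call_cart"),
    ("cart", "add_item"),
    ("cart", "load_config"),
}

candidates = diagnose(
    sdg,
    pdgs,
    entry_points,
    "frontend",
    service_judge=lambda s: s in bad_services,
    block_judge=lambda s, b: (s, b) in bad_blocks,
)
print(candidates)
```

In this sketch the judges are consulted only for nodes reachable through already-flagged nodes, so healthy branches such as catalog are checked once and never expanded. That kind of pruning is one plausible way a structured traversal could use fewer tokens than free-form exploration, though the summary does not say how PRAXIS achieves its savings.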
Entities
Institutions
- arXiv