STLGT: Scalable Linear Graph Transformer for Microservice Tail Latency
STLGT (Scalable Trace-based Linear Graph Transformer) is a per-API predictor for multi-step forecasting of p95 tail latency in microservice architectures. It encodes traces as span graphs and applies a structure-aware linear graph Transformer to model cross-service dependencies, so that inference time scales linearly with span graph size. A decoupled temporal module captures workload dynamics. Evaluated on a personalized education microservice, DeathStarBench, and Alibaba traces, STLGT improves forecasting accuracy over PERT-GNN by 8.5% MAPE on average and achieves up to 12x faster CPU inference at N=32, which matches the maximum span graph size after preprocessing the Alibaba traces. A component analysis confirms its effectiveness.
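To make the linear-scaling claim concrete, the sketch below shows a generic kernelized linear attention over N node embeddings. It is an illustration only, not the STLGT layer: the `elu(x) + 1` feature map, the function names, and the omission of any structural encoding are all assumptions made for this example. The idea is that the key/value summary is computed once in a single pass over the N nodes and then reused for every query, which avoids the N-by-N score matrix of standard attention.

```python
import math


def feature_map(x):
    # elu(x) + 1: a positive feature map (assumed here; the summary does not
    # say which kernel STLGT uses).
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def linear_attention(queries, keys, values):
    """Kernel attention in O(N): summarize keys/values once, reuse per query."""
    d_k, d_v = len(keys[0]), len(values[0])
    kv = [[0.0] * d_v for _ in range(d_k)]  # sum_j phi(k_j) v_j^T
    z = [0.0] * d_k                         # sum_j phi(k_j)
    for k, v in zip(keys, values):
        fk = feature_map(k)
        for a in range(d_k):
            z[a] += fk[a]
            for b in range(d_v):
                kv[a][b] += fk[a] * v[b]
    out = []
    for q in queries:
        fq = feature_map(q)
        norm = sum(fq[a] * z[a] for a in range(d_k))
        out.append(
            [sum(fq[a] * kv[a][b] for a in range(d_k)) / norm for b in range(d_v)]
        )
    return out


def quadratic_attention(queries, keys, values):
    """Reference O(N^2) form of the same kernel attention, for comparison."""
    d_v = len(values[0])
    out = []
    for q in queries:
        fq = feature_map(q)
        weights = [sum(a * b for a, b in zip(fq, feature_map(k))) for k in keys]
        total = sum(weights)
        out.append(
            [sum(w * v[b] for w, v in zip(weights, values)) / total for b in range(d_v)]
        )
    return out
```

Both functions return the same values up to floating-point error; only the cost differs. The linear form touches each node a constant number of times, so doubling the span graph size roughly doubles the work.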
Key facts
- STLGT is a per-API predictor for p95 tail-latency forecasting.
- It encodes traces as span graphs.
- Uses a structure-aware linear graph Transformer, so inference time scales linearly with span graph size.
- Includes a decoupled temporal module for workload dynamics.
- Evaluated on a personalized education microservice, DeathStarBench, and Alibaba traces.
- Improves forecasting accuracy over PERT-GNN by 8.5% MAPE on average.
- Achieves up to 12x faster CPU inference at span graph size N=32.
- N=32 matches maximum span graph size after preprocessing Alibaba traces.
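
The two metrics named above can be sketched as follows. This is a minimal illustration, not the paper's evaluation code: the nearest-rank percentile definition, the function names, and the sample values in the usage note are assumptions chosen for the example.

```python
def p95(latencies):
    """95th-percentile latency using the nearest-rank method (assumed definition)."""
    ordered = sorted(latencies)
    n = len(ordered)
    rank = (95 * n + 99) // 100  # ceil(0.95 * n) in integer arithmetic
    return ordered[rank - 1]


def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Actual values must be nonzero."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)
```

For instance, with hypothetical p95 targets of 100 ms and 200 ms and forecasts of 110 ms and 180 ms, each forecast is off by 10% of its target, so `mape([100, 200], [110, 180])` evaluates to approximately 10.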