STLGT: Scalable Linear Graph Transformer for Microservice Tail Latency

other · 2026-04-30

STLGT (Scalable Trace-based Linear Graph Transformer) is a per-API predictor for multi-step forecasting of p95 tail latency in microservice architectures. It encodes traces as span graphs and applies a structure-aware linear graph Transformer to model cross-service dependencies while keeping inference time linear in the span graph size. A decoupled temporal module captures workload dynamics. Evaluated on a personalized education microservice, DeathStarBench, and Alibaba traces, STLGT improves forecasting accuracy over PERT-GNN by 8.5% MAPE on average and runs CPU inference up to 12x faster at N=32, the maximum span graph size after preprocessing the Alibaba traces. A component analysis supports the effectiveness of the design.
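The linear-scaling claim can be illustrated with generic kernelized linear attention. This is a minimal sketch, not STLGT's actual layer: the summary does not describe the attention kernel or how graph structure is injected, so the `elu(x) + 1` feature map, the function names, and the toy inputs are all assumptions, and the structure-aware part is left out. The point is only that key/value statistics can be accumulated once over the N span-graph nodes and reused for every query, which avoids the N x N attention matrix.

```python
import math

def feature_map(x):
    """Positive feature map elu(x) + 1, elementwise (an assumed kernel choice)."""
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]

def linear_attention(queries, keys, values):
    """Kernelized attention over N nodes; cost grows linearly with N."""
    d, d_v = len(keys[0]), len(values[0])
    kv_sum = [[0.0] * d_v for _ in range(d)]  # running sum of phi(k_n) v_n^T
    k_sum = [0.0] * d                         # running sum of phi(k_n)
    for k, v in zip(keys, values):            # single pass over the nodes
        phi_k = feature_map(k)
        for i in range(d):
            k_sum[i] += phi_k[i]
            for j in range(d_v):
                kv_sum[i][j] += phi_k[i] * v[j]
    outputs = []
    for q in queries:                         # fixed work per query node
        phi_q = feature_map(q)
        norm = sum(phi_q[i] * k_sum[i] for i in range(d))
        outputs.append([
            sum(phi_q[i] * kv_sum[i][j] for i in range(d)) / norm
            for j in range(d_v)
        ])
    return outputs

def quadratic_attention(queries, keys, values):
    """Same kernel evaluated pairwise (cost grows with N^2), for comparison."""
    outputs = []
    for q in queries:
        phi_q = feature_map(q)
        weights = [sum(a * b for a, b in zip(phi_q, feature_map(k))) for k in keys]
        total = sum(weights)
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))
        ])
    return outputs

# Toy span graph with 4 nodes and feature width 2 (arbitrary values).
Q = [[0.1, -0.3], [0.5, 0.2], [-0.7, 0.4], [0.0, 0.9]]
K = [[0.2, 0.1], [-0.4, 0.6], [0.3, -0.2], [0.8, 0.0]]
V = [[1.0, 0.0], [0.0, 1.0], [2.0, 1.0], [1.0, 3.0]]
fast = linear_attention(Q, K, V)
slow = quadratic_attention(Q, K, V)
```

Both functions compute the same outputs up to floating-point rounding; only the order of the sums differs. The code is plain Python so that the single pass over nodes stays visible; a practical implementation would use a tensor library.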

Key facts

STLGT is a per-API predictor for p95 tail-latency forecasting.
It encodes traces as span graphs.
It uses a structure-aware linear graph Transformer, so inference time scales linearly with span graph size.
Includes a decoupled temporal module for workload dynamics.
Tested on a personalized education microservice, DeathStarBench, and Alibaba traces.
Improves forecasting accuracy over PERT-GNN by 8.5% MAPE on average.
Achieves up to 12x faster CPU inference at N=32.
N=32 matches the maximum span graph size after preprocessing the Alibaba traces.
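The two quantities behind these facts, p95 latency and MAPE, can be sketched as follows. This is an illustration under assumptions: the summary does not say which percentile convention or windowing the authors use, nor whether the 8.5% figure is an absolute or a relative difference in MAPE. The nearest-rank percentile, the function names, and the sample numbers are hypothetical.

```python
def p95(latencies_ms):
    """95th percentile by the nearest-rank method (an assumed convention)."""
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer arithmetic
    return ordered[rank - 1]                # rank is 1-based

def mape(actual, predicted):
    """Mean absolute percentage error, expressed in percent."""
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)

# One window of 100 request latencies, 1..100 ms (made-up data).
window = list(range(1, 101))
tail = p95(window)

# Made-up p95 targets and forecasts for two future steps.
score = mape([100.0, 200.0], [110.0, 180.0])
```

Here `tail` is the latency that 95% of the requests in the window do not exceed, and `score` averages the per-step relative forecast errors.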

Sources

arXiv cs.AI — 2026-04-30