Dooly: Configuration-Agnostic Profiling for LLM Inference Simulation
Dooly, a system described in an arXiv preprint (2605.07985), addresses the high cost of profiling large language model (LLM) inference configurations. Traditional profile-based simulators re-profile every operation from scratch for each configuration, which makes exploring the configuration space expensive. Dooly exploits two structural observations: each input dimension of an operation is either fixed by the model configuration or dependent on the request, and many configuration values (e.g., head size, layer count) recur across models. By performing a single inference pass and labeling the operations it observes, Dooly achieves configuration-agnostic, redundancy-aware profiling, enabling efficient simulation across hardware, serving engines, attention backends, and model architectures.
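The redundancy argument can be illustrated with a small Python sketch. It is not Dooly's implementation: the names `OpSignature`, `ProfileCache`, `signature`, and `fake_measure`, the example dimension values, and the cost formula are all hypothetical, chosen only to show how keying measurements on config-fixed dimensions lets two models that share a value (here, the same attention head size and head count) reuse one profile.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class OpSignature:
    """Hypothetical cache key: operation name plus its config-fixed dimensions."""
    name: str
    fixed_dims: Tuple[Tuple[str, int], ...]


def signature(name: str, **fixed_dims: int) -> OpSignature:
    # Sort the dimensions so the key does not depend on argument order.
    return OpSignature(name, tuple(sorted(fixed_dims.items())))


class ProfileCache:
    """Measures each (signature, request size) pair at most once."""

    def __init__(self, measure: Callable[[OpSignature, int], float]) -> None:
        self._measure = measure
        self._store: Dict[Tuple[OpSignature, int], float] = {}
        self.measurements = 0  # number of times the measure function actually ran

    def latency(self, sig: OpSignature, num_tokens: int) -> float:
        key = (sig, num_tokens)  # num_tokens is the request-dependent dimension
        if key not in self._store:
            self._store[key] = self._measure(sig, num_tokens)
            self.measurements += 1
        return self._store[key]


def fake_measure(sig: OpSignature, num_tokens: int) -> float:
    # Stand-in cost model (assumption): a real profiler would time the kernel.
    work = num_tokens
    for _, value in sig.fixed_dims:
        work *= value
    return work * 1e-9


cache = ProfileCache(fake_measure)

# Two hypothetical models that share the attention shape but differ in the MLP.
model_a = [
    signature("attention", head_size=128, num_heads=32),
    signature("mlp", hidden=4096, intermediate=11008),
]
model_b = [
    signature("attention", head_size=128, num_heads=32),
    signature("mlp", hidden=4096, intermediate=14336),
]

for ops in (model_a, model_b):
    for sig in ops:
        cache.latency(sig, num_tokens=256)

print(cache.measurements)
```

Four operations are requested across the two models, but the shared attention signature is measured only once, so the measure function runs three times. A per-configuration profiler would have measured all four.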
Key facts
- Dooly is a configuration-agnostic, redundancy-aware profiling system for LLM inference simulation.
- It is described in arXiv preprint 2605.07985.
- Traditional profile-based simulators hardcode operation sets and re-profile every operation from scratch for each configuration.
- Dooly performs a single inference pass and labels operations.
- It exploits the fact that many model-configuration values recur across models.
- Dooly enables efficient exploration of hardware, serving engines, attention backends, and model architectures.
Entities
Institutions
- arXiv