LAWS: Self-Certifying Cache Architecture for Neural Inference
Learning from Actual Workloads Symbolically (LAWS) is a framework for self-certifying inference caching. It accumulates expert functions from real deployment traffic, assigning each expert to a region of input space defined by a node in a Probabilistic Language Trie (PLT). A self-certification theorem bounds the approximation error of each cached expert. LAWS generalizes both Mixture-of-Experts routing and KV prefix caching, adapting its expert set to the observed workload rather than fixing it in advance, and it establishes a theoretical result on monotone hit rates as the trie grows.
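The paper does not specify the PLT's data layout; as a hypothetical sketch, one natural reading is a trie keyed by token prefixes, where each node may hold a cached expert and a lookup returns the expert at the deepest matching node. All names here (`PLTNode`, `PLT`, `insert`, `lookup`) are illustrative assumptions, not the paper's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

@dataclass
class PLTNode:
    # token -> child node
    children: Dict[str, "PLTNode"] = field(default_factory=dict)
    # expert cached for the input-space region this prefix defines (if any)
    expert: Optional[Callable] = None
    # usage statistic for this prefix, e.g. for eviction or hit-rate tracking
    hits: int = 0

class PLT:
    """Hypothetical Probabilistic Language Trie: prefix-indexed expert cache."""

    def __init__(self) -> None:
        self.root = PLTNode()

    def insert(self, prefix: List[str], expert: Callable) -> None:
        """Attach an expert to the node reached by walking `prefix`."""
        node = self.root
        for tok in prefix:
            node = node.children.setdefault(tok, PLTNode())
        node.expert = expert

    def lookup(self, tokens: List[str]) -> Optional[Callable]:
        """Return the expert at the deepest node matching a prefix of `tokens`."""
        node, best = self.root, self.root.expert
        for tok in tokens:
            node = node.children.get(tok)
            if node is None:
                break
            node.hits += 1
            if node.expert is not None:
                best = node.expert
        return best
```

Under this reading, more specific prefixes shadow more general ones, which is what lets the cache specialize as workload data accumulates:

```python
plt = PLT()
plt.insert(["summarize"], lambda x: "generic summary")
plt.insert(["summarize", "code"], lambda x: "code summary")
plt.lookup(["summarize", "code", "python"])  # deepest matching expert wins
```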
Key facts
- LAWS stands for Learning from Actual Workloads Symbolically.
- It is a self-certifying inference caching architecture.
- Each expert covers a region defined by a node in the Probabilistic Language Trie (PLT).
- The self-certification theorem bounds error by epsilon_fit + 2*Lambda(W)*C_E.
- Lambda(W) is the model Lipschitz constant.
- C_E is the maximum embedding diameter.
- LAWS generalizes Mixture-of-Experts and KV prefix caching.
- It is strictly more expressive than any fixed-K MoE or finite cache.
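The certification bound listed above is mechanical to evaluate once the three quantities are measured. A minimal sketch, assuming per-expert measurements of `eps_fit` (empirical fitting error), `lam` (the model Lipschitz constant Lambda(W)), and `diam` (the maximum embedding diameter C_E); the function names and the `tol` threshold are illustrative, not from the paper:

```python
def certified_error_bound(eps_fit: float, lam: float, diam: float) -> float:
    """Self-certification bound: eps_fit + 2 * Lambda(W) * C_E.

    eps_fit: empirical fitting error of the expert on its PLT region
    lam:     Lipschitz constant Lambda(W) of the underlying model
    diam:    maximum embedding diameter C_E of the expert's region
    """
    return eps_fit + 2.0 * lam * diam

def self_certifies(eps_fit: float, lam: float, diam: float, tol: float) -> bool:
    """An expert is admitted to the cache only if its bound meets the tolerance."""
    return certified_error_bound(eps_fit, lam, diam) <= tol
```

One consequence worth noting: since the bound depends on the region's embedding diameter, deeper (more specific) PLT nodes with tighter regions certify more easily than broad ones.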
Entities
Institutions
- arXiv