GRIEF Fuzzer Uncovers 15 Vulnerabilities in LLM Serving Systems
A team of researchers has introduced GRIEF, a greybox fuzzer for LLM inference engines that targets vulnerabilities arising from shared-state behavior in the serving layer. Whereas conventional evaluations focus on model safety or API correctness, GRIEF treats timed multi-request traces as first-class inputs and uses lightweight oracles to detect crashes, hangs, performance pathologies, and silent output corruption. Reproducible failures are confirmed through controlled replay with log-probability checks. In early campaigns against vLLM and SGLang, GRIEF uncovered 15 vulnerabilities; engine developers confirmed 10 of them, 2 of which were assigned CVEs. The flaws involve KV-cache management, batching, prefix sharing, speculative decoding, adapters, and multi-tenant scheduling, underscoring that LLM serving infrastructure is security-critical under realistic concurrent workloads.
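To make "traces as first-class inputs" and "replay with log-probability checks" more concrete, here is a minimal Python sketch. It is not GRIEF's implementation: `TimedRequest`, `TokenResult`, `mutate_timing`, and `first_divergence` are hypothetical names, and the log-probability tolerance is an assumed value (batched GPU inference can introduce small floating-point differences even when output is correct). The sketch shows how a fuzzer might jitter request arrival times so requests overlap differently, and how a replay check might compare one request's tokens and log-probabilities from a concurrent run against an isolated reference run.

```python
import random
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class TimedRequest:
    # Hypothetical trace entry: when the request is sent and what it asks for.
    arrival_ms: int
    prompt: str
    max_tokens: int


@dataclass(frozen=True)
class TokenResult:
    # Generated token ids and their per-token log-probabilities.
    token_ids: List[int]
    logprobs: List[float]


def mutate_timing(trace: List[TimedRequest], rng: random.Random,
                  max_shift_ms: int = 50) -> List[TimedRequest]:
    """Jitter arrival times so requests overlap differently in the scheduler."""
    mutated = [
        replace(req, arrival_ms=max(0, req.arrival_ms + rng.randint(-max_shift_ms, max_shift_ms)))
        for req in trace
    ]
    # Keep the trace ordered by send time.
    return sorted(mutated, key=lambda r: r.arrival_ms)


def first_divergence(reference: TokenResult, candidate: TokenResult,
                     tol: float = 1e-3) -> Optional[int]:
    """Return the first token position where the concurrent run differs from
    the isolated reference run, or None if they agree within `tol`."""
    n = min(len(reference.token_ids), len(candidate.token_ids))
    for i in range(n):
        if reference.token_ids[i] != candidate.token_ids[i]:
            return i
        if abs(reference.logprobs[i] - candidate.logprobs[i]) > tol:
            return i
    if len(reference.token_ids) != len(candidate.token_ids):
        return n  # one output is a strict prefix of the other
    return None


# Two requests sharing a prompt prefix, sent 5 ms apart.
trace = [TimedRequest(0, "Hello", 16), TimedRequest(5, "Hello there", 16)]
mutated = mutate_timing(trace, random.Random(0))

ref = TokenResult([11, 42, 7], [-0.1, -0.5, -1.2])
got = TokenResult([11, 42, 9], [-0.1, -0.5, -0.9])
print(first_divergence(ref, got))  # 2
```

Comparing log-probabilities, not just token ids, lets a check of this kind flag corruption that has not yet changed the sampled text.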
Key facts
- GRIEF is a greybox fuzzer for LLM inference engines.
- It targets vulnerabilities in the serving layer, not model behavior.
- It uses timed multi-request traces as first-class inputs.
- Lightweight oracles detect crashes, hangs, performance pathologies, and silent output corruption.
- Controlled replay with log-probability checks confirms reproducible failures.
- Early campaigns on vLLM and SGLang discovered 15 vulnerabilities.
- Engine developers confirmed 10 of the vulnerabilities.
- 2 of the confirmed vulnerabilities were assigned CVEs.
- Vulnerabilities span KV-cache, batching, prefix sharing, speculative decoding, adapters, and multi-tenant scheduling.
- The work underscores the security-critical nature of LLM serving systems.
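The four oracle outcomes in the list can be pictured as a triage step over what was observed during one trace run. The following Python sketch is an illustration under stated assumptions, not GRIEF's actual oracle: the `Observation` fields, the 10x slowdown threshold, and the order in which verdicts are checked are all hypothetical choices.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    OK = "ok"
    CRASH = "crash"
    HANG = "hang"
    SILENT_CORRUPTION = "silent output corruption"
    PERF_ANOMALY = "performance anomaly"


@dataclass(frozen=True)
class Observation:
    # Hypothetical summary of one trace run against a serving engine.
    process_alive: bool        # did the serving process survive the trace?
    completed: bool            # did every request finish before the timeout?
    latency_s: float           # wall-clock time for the whole trace
    baseline_latency_s: float  # time for the same requests replayed in isolation
    outputs_match: bool        # result of a replay / log-probability comparison


def classify(obs: Observation, slowdown_factor: float = 10.0) -> Verdict:
    """Map an observation to a single verdict.

    Checks run in an assumed priority order: crash, hang, silent
    corruption, then performance anomaly.
    """
    if not obs.process_alive:
        return Verdict.CRASH
    if not obs.completed:
        return Verdict.HANG
    if not obs.outputs_match:
        return Verdict.SILENT_CORRUPTION
    if obs.latency_s > slowdown_factor * obs.baseline_latency_s:
        return Verdict.PERF_ANOMALY
    return Verdict.OK


print(classify(Observation(True, True, 20.0, 1.5, True)).value)  # performance anomaly
```

Ranking silent corruption above slowdowns reflects the sketch's assumption that wrong output served to a tenant is the more serious finding; a real fuzzer might report both.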
Entities
Institutions
- arXiv