ObjectCache: KV Cache in S3-Compatible Object Storage for LLMs

other · 2026-05-25

ObjectCache, described in arXiv:2605.22850, stores large language model (LLM) KV caches in S3-compatible object storage rather than in costly remote DRAM pools. The goal is to reduce the size and cost of serving clusters while keeping the impact on time to first token (TTFT) minimal. To do this, it co-designs the storage protocol and the transfer schedule so that KV cache data arrives in the order the GPU consumes it, which lets data transfer overlap with computation across concurrent requests. A prototype was built on a 100 Gbps RoCE cluster using NIXL, an inference transfer library that abstracts over memory and storage backends. The paper positions this as an alternative to existing prefix KV caching techniques, which rely on remote DRAM because GPU memory and local DRAM are limited in capacity.
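The idea of delivering KV data in consumption order while transfer overlaps with compute can be illustrated with a small producer/consumer sketch. This is not the paper's implementation: it assumes consumption order means layer order, stands in a Python dict for the object store, and uses hypothetical names (`fetch_layer`, `prefetch_in_order`, `run_prefill`) and a hypothetical (request, layer) key layout.

```python
import queue
import threading


def fetch_layer(store, request_id, layer):
    """Hypothetical stand-in for one object GET of a layer's KV block."""
    return store[(request_id, layer)]


def prefetch_in_order(store, request_id, num_layers, out_q):
    # Issue fetches in the order the consumer will need them (layer 0 first).
    for layer in range(num_layers):
        out_q.put((layer, fetch_layer(store, request_id, layer)))
    out_q.put(None)  # sentinel: no more layers


def run_prefill(store, request_id, num_layers, depth=2):
    # Bounded queue: the fetcher runs at most `depth` layers ahead of compute,
    # so transfer of later layers overlaps with work on earlier ones.
    q = queue.Queue(maxsize=depth)
    fetcher = threading.Thread(
        target=prefetch_in_order, args=(store, request_id, num_layers, q)
    )
    fetcher.start()
    consumed = []
    while True:
        item = q.get()
        if item is None:
            break
        layer, kv_block = item
        # Stand-in for the GPU attention step that uses this layer's KV data.
        consumed.append((layer, len(kv_block)))
    fetcher.join()
    return consumed


# Toy store: three layers of 4-byte KV blocks for one request.
toy_store = {("req-1", layer): bytes(4) for layer in range(3)}
order = run_prefill(toy_store, "req-1", num_layers=3)
```

In this sketch the bounded queue depth is the only tuning knob; a real system would also have to schedule fetches across many concurrent requests, which the sketch does not attempt.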

Key facts

ObjectCache stores KV cache in S3-compatible object storage
Aims to reduce serving-cluster size and cost
Minimizes impact on time to first token (TTFT)
Co-designs storage protocol and transfer schedule
Delivers KV cache data in GPU consumption order
Overlaps data transfer with compute across concurrent requests
Prototype built on 100 Gbps RoCE cluster with NIXL
Paper published on arXiv with ID 2605.22850
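
To give a sense of scale for the 100 Gbps link, the following back-of-the-envelope sketch estimates KV cache size and raw transfer time. The model shape (32 layers, 8 KV heads, head dimension 128, fp16, 4096 cached tokens) is a hypothetical example, not a figure from the paper, and the estimate ignores protocol overhead and storage latency.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, num_tokens, bytes_per_elem=2):
    # Factor of 2 accounts for storing both keys and values per layer.
    return 2 * num_layers * num_kv_heads * head_dim * num_tokens * bytes_per_elem


def transfer_seconds(num_bytes, link_gbps):
    # Convert link rate from gigabits per second to bytes per second.
    bytes_per_second = link_gbps * 1e9 / 8
    return num_bytes / bytes_per_second


# Hypothetical model shape, assumed for illustration only.
size = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128, num_tokens=4096)
seconds = transfer_seconds(size, link_gbps=100)
print(size)               # 536870912 bytes (0.5 GiB)
print(round(seconds, 4))  # 0.0429 seconds at line rate
```

Under these assumptions a 4096-token prefix takes roughly 43 ms to move at line rate, which is why overlapping transfer with compute matters for TTFT.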
