ARTFEED — Contemporary Art Intelligence

OCTOPUS: Optimized KV Cache Compression via Octahedral Parametrization

other · 2026-05-22

A novel technique named OCTOPUS enhances the compression of key-value (KV) caches for transformers during long-context autoregressive inference. This approach builds upon earlier rotation-preconditioned codecs like TurboQuant and PolarQuant by jointly quantizing rotated coordinate triplets. The direction of each triplet is represented as a square through octahedral parameterization, followed by Lloyd-Max quantization of two coordinates along with the triplet norm. This process allows for non-uniform bit allocation that relies solely on key dimensionality, achieving optimal squared error. The codec operates in a data-oblivious, online, and deterministic manner. The findings are available in a paper on arXiv (2605.21226).

Key facts

  • OCTOPUS optimizes KV cache compression for transformers.
  • It uses octahedral parameterization to map triplet directions to a square.
  • Lloyd-Max quantization is applied to two coordinates and the triplet norm.
  • Bit allocation is non-uniform and depends only on key dimensionality.
  • The codec is data-oblivious, online, and deterministic.
  • It builds on TurboQuant and PolarQuant methods.
  • The paper is on arXiv with ID 2605.21226.
  • It targets long-context autoregressive inference.

Entities

Institutions

  • arXiv

Sources