PIPO: Unifying Latent Compression and Multi-Token Prediction for Efficient LLM Decoding

ai-technology · 2026-05-27

Researchers have introduced Pair-In, Pair-Out (PIPO), a technique that combines latent compression with multi-token prediction (MTP) to lower the inference cost of autoregressive decoding in large language models. The latent compressor and the MTP head act as mirror-image operations: the compressor folds two input tokens into a single latent representation, while the MTP head unfolds one hidden state into an additional output token. Instead of running a costly verifier pass, PIPO uses a lightweight confidence head to decide whether each draft token is accepted. The method thereby joins input-side and output-side techniques, which had previously been developed independently, into a single approach to efficient LLM inference.
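
The fold, unfold, and accept steps described above can be illustrated with a toy decode loop. This is only a sketch under stated assumptions, not the paper's implementation: the class `ToyPIPO`, the function `generate`, the random weight matrices, the single-matrix "backbone", the sigmoid confidence score, the `threshold` parameter, and the use of a `PAD` token to complete a pair after a rejected draft are all hypothetical choices made for illustration.

```python
import math
import random

VOCAB, DIM, PAD = 16, 8, 0  # toy sizes; PAD fills a pair after a rejected draft (assumption)


def make_matrix(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


class ToyPIPO:
    """Randomly initialised stand-in; every layer here is hypothetical."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.embed = make_matrix(VOCAB, DIM, rng)
        self.compress = make_matrix(DIM, 2 * DIM, rng)  # folds two embeddings into one latent
        self.backbone = make_matrix(DIM, DIM, rng)      # stand-in for the transformer
        self.next_head = make_matrix(VOCAB, DIM, rng)   # ordinary next-token head
        self.mtp_head = make_matrix(VOCAB, DIM, rng)    # unfolds one extra draft token
        self.conf_head = [rng.uniform(-1, 1) for _ in range(DIM)]

    def step(self, tok_a, tok_b, state, threshold):
        pair = self.embed[tok_a] + self.embed[tok_b]    # list concatenation, length 2 * DIM
        latent = matvec(self.compress, pair)            # pair in: one latent per two tokens
        mixed = matvec(self.backbone, latent)
        hidden = [math.tanh(m + s) for m, s in zip(mixed, state)]
        first = argmax(matvec(self.next_head, hidden))
        draft = argmax(matvec(self.mtp_head, hidden))   # pair out: one additional token
        score = sum(w * h for w, h in zip(self.conf_head, hidden))
        confidence = 1.0 / (1.0 + math.exp(-score))
        # The confidence head alone decides acceptance; no verifier pass is run.
        return hidden, first, (draft if confidence >= threshold else None)


def generate(model, prompt, max_new, threshold=0.5):
    """Return (new_tokens, decode_passes) for an even-length prompt."""
    assert len(prompt) >= 2 and len(prompt) % 2 == 0
    state = [0.0] * DIM
    for i in range(0, len(prompt) - 2, 2):              # prefill all but the last pair
        state, _, _ = model.step(prompt[i], prompt[i + 1], state, threshold)
    pair = (prompt[-2], prompt[-1])
    new_tokens, passes = [], 0
    while len(new_tokens) < max_new:
        state, first, draft = model.step(pair[0], pair[1], state, threshold)
        passes += 1
        if draft is None:                               # draft rejected: one token this pass
            new_tokens.append(first)
            pair = (first, PAD)
        else:                                           # draft accepted: two tokens this pass
            new_tokens.extend([first, draft])
            pair = (first, draft)
    return new_tokens[:max_new], passes
```

With `threshold=0.0` every draft is accepted and the loop needs half as many decode passes as tokens; with a threshold above 1.0 every draft is rejected and it falls back to one token per pass. The generated tokens are meaningless because the weights are random; only the control flow is of interest.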

Key facts

PIPO unifies latent compression and multi-token prediction.
Compressor folds two input tokens into one latent representation.
MTP head unfolds one hidden state into one additional output token.
Lightweight confidence head replaces expensive verifier pass.
Method targets autoregressive decoding inference cost.
Proposed in arXiv paper 2605.27255.
Addresses independent development of input and output side methods.
On-Po observation mentioned but not detailed.
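
As a rough back-of-envelope reading of the cost claim in the list above, the helper below estimates how many decode passes are needed when each pass emits one token plus a draft token that is kept with some probability. The function names and the assumption of a fixed, independent acceptance probability are illustrative only; the source gives no acceptance rates or speedup figures, and the estimate ignores any savings from the shorter compressed input sequence.

```python
def tokens_per_pass(accept_prob: float) -> float:
    """Expected tokens per pass: one guaranteed token plus one draft kept with accept_prob."""
    if not 0.0 <= accept_prob <= 1.0:
        raise ValueError("accept_prob must lie in [0, 1]")
    return 1.0 + accept_prob


def approx_passes(n_tokens: int, accept_prob: float) -> float:
    """Approximate number of decode passes needed to emit n_tokens (hypothetical model)."""
    return n_tokens / tokens_per_pass(accept_prob)
```

Under this simplification the pass count ranges from `n_tokens` (no draft ever accepted) down to `n_tokens / 2` (every draft accepted).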
