ARTFEED — Contemporary Art Intelligence

PAD-Rec: Accelerating LLM-Based Generative Recommendation

other · 2026-05-01

A new method called PAD-Rec (Position-Aware Drafting for generative Recommendation) speeds up inference in large language model (LLM)-based generative list-wise recommendation. It addresses a limitation of standard speculative decoding (SD), in which a small draft model proposes multiple tokens and the target LLM verifies them in parallel. In recommendation tasks, items are represented as runs of semantic-ID tokens delimited by separators, so a token's meaning depends on its position within the item slot, and draft uncertainty grows with speculation depth. PAD-Rec augments the draft model with position-aware signals that capture these factors, achieving larger speedups while leaving the target distribution unchanged. The work is published on arXiv under ID 2604.27747.
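
For context, the verification step of standard speculative decoding, which PAD-Rec builds on, can be sketched as follows. The accept/reject rule is what guarantees the output matches the target distribution exactly; the function name and NumPy implementation here are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def speculative_accept(draft_probs, target_probs, drafted_tokens):
    """Verify a run of drafted tokens against the target model.

    draft_probs[i] / target_probs[i]: vocab distributions at step i
    (target_probs has one extra entry for the bonus token).
    drafted_tokens[i]: token the draft model proposed at step i.
    Accepting token t with prob min(1, p(t)/q(t)) and resampling from
    the renormalized residual max(p - q, 0) on rejection reproduces
    the target distribution exactly; draft quality only affects speed.
    Assumes q(t) > 0 for every drafted token t.
    """
    accepted = []
    for i, tok in enumerate(drafted_tokens):
        p = target_probs[i][tok]   # target probability of drafted token
        q = draft_probs[i][tok]    # draft probability of the same token
        if rng.random() < min(1.0, p / q):
            accepted.append(tok)   # accept: target agrees often enough
        else:
            # reject: resample from the residual distribution
            residual = np.maximum(target_probs[i] - draft_probs[i], 0.0)
            residual /= residual.sum()
            accepted.append(int(rng.choice(len(residual), p=residual)))
            break
    else:
        # every drafted token accepted: sample one bonus target token
        bonus = target_probs[len(drafted_tokens)]
        accepted.append(int(rng.choice(len(bonus), p=bonus)))
    return accepted
```

Each call thus yields between one and k+1 tokens for a single batched target-model pass, which is where the latency reduction comes from.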

Key facts

  • PAD-Rec is a position-aware drafting module for generative recommendation.
  • It accelerates inference in LLM-based list-wise recommendation.
  • Standard speculative decoding treats tokens uniformly, ignoring position-dependent semantics.
  • PAD-Rec models token slot position and uncertainty growth with depth.
  • The method does not change the target distribution.
  • It is designed for generative recommendation using semantic-ID tokens.
  • The paper is available on arXiv with ID 2604.27747.
  • The approach aims to reduce latency in sequential decoding.
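
As an illustration of the position signal the bullets above describe, the intra-slot index of each semantic-ID token can be derived from the separator layout of the generated sequence. The separator id and this particular featurization are hypothetical sketches, not the paper's actual module.

```python
SEP = -1  # hypothetical separator token id marking item boundaries

def slot_positions(tokens, sep=SEP):
    """Map each token to its position inside the current item slot.

    In semantic-ID generative recommendation, an item is a short run of
    codebook tokens closed by a separator, so a token's semantics depend
    on where it sits in that run. A position-aware draft model could take
    this intra-slot index (alongside speculation depth) as an extra
    conditioning input; the exact signal used by PAD-Rec may differ.
    """
    positions, pos = [], 0
    for t in tokens:
        positions.append(pos)
        if t == sep:
            pos = 0       # next token starts a fresh item slot
        else:
            pos += 1
    return positions
```

For example, a sequence of two three-token items separated by `SEP` yields indices that reset at each item boundary, giving the drafter a cue about which codebook level it is predicting.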

Entities

Institutions

  • arXiv

Sources