Prologue Method Bridges Reconstruction-Generation Gap in AR Image Generation
Researchers have introduced Prologue, a method for autoregressive (AR) image generation that decouples reconstruction from generation by prepending a short sequence of prologue tokens to the visual token sequence. The prologue tokens are trained solely with the AR cross-entropy loss, while the visual tokens remain dedicated to reconstruction. On ImageNet 256x256, Prologue-Base lowers gFID from 21.01 to 10.75 without classifier-free guidance while keeping reconstruction quality essentially unchanged. Prologue-Large reaches an rFID of 0.99 and a gFID of 1.46 using a standard AR model with no auxiliary semantic supervision. The approach is formalized from an ELBO perspective.
Key facts
- Prologue is proposed to bridge the reconstruction-generation gap in autoregressive image generation.
- Prologue generates a small set of prologue tokens prepended to the visual token sequence.
- Prologue tokens are trained exclusively with AR cross-entropy loss.
- Visual tokens remain dedicated to reconstruction.
- On ImageNet 256x256, Prologue-Base reduces gFID from 21.01 to 10.75 without classifier-free guidance.
- Prologue-Large achieves rFID of 0.99 and gFID of 1.46 using a standard AR model.
- The approach is formalized from an ELBO perspective.
- No auxiliary semantic supervision is used for Prologue-Large.
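The loss split described in the bullets above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the sizes, the random stand-in logits, and the helper names are all assumptions, and the visual tokens' reconstruction objective is only noted in a comment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen for illustration only.
num_prologue = 4   # short prologue prepended to the sequence
num_visual = 12    # visual tokens from the image tokenizer
vocab = 16         # codebook size

# Token sequence: prologue tokens first, then visual tokens.
prologue_ids = rng.integers(0, vocab, size=num_prologue)
visual_ids = rng.integers(0, vocab, size=num_visual)
sequence = np.concatenate([prologue_ids, visual_ids])

# Stand-in for the AR model's next-token logits at every position.
logits = rng.normal(size=(len(sequence), vocab))

def cross_entropy(logits, targets):
    """Per-position cross-entropy (log-softmax + negative log-likelihood)."""
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets]

# Restrict the AR cross-entropy loss to the prologue positions,
# mirroring the split: prologue tokens carry the generation loss,
# while visual tokens are trained by a reconstruction objective
# (not shown here).
per_token = cross_entropy(logits, sequence)
prologue_mask = np.arange(len(sequence)) < num_prologue
ar_loss = per_token[prologue_mask].mean()
```

The point of the mask is that gradients from the generation objective reach only the prologue positions, so the visual tokens are free to optimize reconstruction.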