ARTFEED — Contemporary Art Intelligence

CLIP Embeddings Drive Memorization in Stable Diffusion

ai-technology · 2026-05-07

A new arXiv paper reveals that memorization in Stable Diffusion is unexpectedly driven by CLIP text embeddings. The researchers categorized input tokens into four groups: start-of-text, prompt-related, end-of-text, and padding. They found that padding embeddings, which structurally duplicate the end-of-text embedding, amplify its influence, causing the model to over-rely on the end-of-text embedding and driving memorization. In memorized cases, prompt-related embeddings contribute only minimally.
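The four-way token taxonomy can be sketched in a few lines. The token ids and the pad-with-end-of-text convention below reflect how Stable Diffusion's CLIP tokenizer is commonly configured; they are illustrative assumptions, not details taken from the paper:

```python
# Sketch: label each of CLIP's 77 token positions with the paper's categories.
# Assumed ids: in Stable Diffusion's CLIP tokenizer, start-of-text is 49406,
# end-of-text is 49407, and sequences are padded to 77 positions by
# repeating the end-of-text id -- the structural duplication the paper flags.

SOT_ID = 49406   # start-of-text token id (assumed)
EOT_ID = 49407   # end-of-text token id, reused for padding (assumed)
MAX_LEN = 77     # CLIP text context length

def categorize(token_ids):
    """Return a label ('start', 'prompt', 'end', 'pad') per position."""
    labels = []
    seen_eot = False
    for tid in token_ids:
        if tid == SOT_ID:
            labels.append("start")
        elif tid == EOT_ID and not seen_eot:
            labels.append("end")   # first end-of-text closes the prompt
            seen_eot = True
        elif tid == EOT_ID:
            labels.append("pad")   # later copies are structural duplicates
        else:
            labels.append("prompt")
    return labels

# Toy example: a 3-token prompt padded out to the full 77 positions.
ids = [SOT_ID, 320, 1125, 2368, EOT_ID] + [EOT_ID] * (MAX_LEN - 5)
labels = categorize(ids)
print(labels[:6])           # ['start', 'prompt', 'prompt', 'prompt', 'end', 'pad']
print(labels.count("pad"))  # 72
```

For a short prompt, the padding positions vastly outnumber the prompt positions, which illustrates why duplicated end-of-text embeddings can dominate the conditioning signal.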

Key facts

  • Memorization in Stable Diffusion is driven by CLIP embeddings.
  • Input tokens are categorized as start-of-text, prompt-related, end-of-text, and padding.
  • Padding embeddings duplicate end-of-text embeddings structurally.
  • This duplication amplifies the influence of end-of-text embeddings.
  • Prompt-related embeddings contribute minimally to memorized cases.
  • The paper is from arXiv:2605.02908.
  • The research focuses on text-to-image diffusion models.
  • The findings have implications for interpretability and safety.

Entities

Institutions

  • arXiv

Sources