Multi-Agent Framework for Concept Awakening in Diffusion Models
A recent arXiv paper (2605.18150) presents a surrogate-guided multi-agent framework for concept awakening in diffusion models under black-box constraints. The researchers analyze denoising from a trajectory perspective and find that concept erasure mainly disrupts early-stage text-semantic alignment but does not fully block the propagation of semantic information through the rest of the denoising process. As generation advances, the model relies increasingly on the evolving noisy state rather than on the text condition. The work targets concept awakening in black-box settings, an area that has received little attention compared with white-box methods that depend on optimization or inversion. It also highlights a weakness in current concept erasure strategies: they often suppress target concepts rather than eliminate them, which leaves models vulnerable to awakening attacks.
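The trajectory view described above can be illustrated with a toy numerical sketch. This is not the paper's method or model: the scalar "concept content", the exponential decay in `text_reliance`, and the parameters `sharpness`, `rate`, and `erasure_strength` are all hypothetical stand-ins chosen for illustration. The sketch assumes that erasure only attenuates the text-driven signal, and that the text condition matters most in early steps while later steps mostly carry the current state forward.

```python
import math


def text_reliance(step, num_steps, sharpness=5.0):
    """Hypothetical weight of the text condition: 1.0 at step 0, decaying afterwards."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    progress = step / (num_steps - 1)
    return math.exp(-sharpness * progress)


def run_trajectory(num_steps=50, concept=1.0, erasure_strength=0.0, rate=0.3):
    """Return the per-step history of a scalar 'concept content' held by the noisy state."""
    # Assumption: erasure attenuates the text-driven signal but does not zero it.
    text_signal = concept * (1.0 - erasure_strength)
    x = 0.0  # the state starts with no concept content
    history = []
    for step in range(num_steps):
        w = text_reliance(step, num_steps)
        # Early steps (large w) pull the state toward the text signal;
        # late steps (small w) mostly preserve whatever the state already carries.
        x += rate * w * (text_signal - x)
        history.append(x)
    return history


clean = run_trajectory(erasure_strength=0.0)
erased = run_trajectory(erasure_strength=0.8)
print(f"final concept content, no erasure: {clean[-1]:.3f}")
print(f"final concept content, erased:     {erased[-1]:.3f}")
```

In this toy setting the erased run ends with less concept content than the clean run, but not zero. That mirrors the summary's point that erasure suppresses a concept rather than eliminating it.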
Key facts
- arXiv paper 2605.18150 proposes a surrogate-guided multi-agent framework for concept awakening.
- The framework operates under black-box constraints.
- Concept erasure disrupts early-stage text-semantic alignment but does not fully block semantic propagation during denoising.
- As generation progresses, denoising relies increasingly on the evolving noisy state rather than on the text condition.
- Existing concept erasure methods suppress rather than eliminate target concepts.
- The work addresses a gap in black-box concept awakening research.
- The paper was announced as a new submission on arXiv.
- The approach contrasts with white-box optimization or inversion methods.
Entities
Institutions
- arXiv