RaPD: Resolution-Agnostic Pixel Diffusion via Semantics-Enriched Implicit Representations
RaPD (Resolution-agnostic Pixel Diffusion) is a generative model that performs diffusion in a continuous Neural Image Field (NIF) latent space, so image synthesis does not depend on a fixed output resolution. Whereas earlier techniques introduce continuity only at the decoding stage, RaPD builds it into the entire generative framework. It uses Semantic Representation Guidance for generation-aware latent learning and a Coordinate-Queried Attention Renderer for scale-aware rendering. Changing the query coordinates lets a single denoised latent be rendered at any resolution while the diffusion cost stays constant. The reported experiments indicate improved generation quality and resolution scalability. The paper is available on arXiv in the Computer Vision and Pattern Recognition category.
Key facts
- RaPD performs diffusion in a continuous Neural Image Field (NIF) latent space.
- It uses Semantic Representation Guidance for generation-aware latent learning.
- It uses a Coordinate-Queried Attention Renderer for coordinate-conditioned, scale-aware rendering.
- A single denoised latent can be rendered at arbitrary resolutions by changing query coordinates.
- Diffusion cost remains fixed regardless of output resolution.
- The reported experiments show improved generation quality and resolution scalability.
- The paper is categorized under Computer Science > Computer Vision and Pattern Recognition.
- The paper is available on arXiv with ID 2605.15908.
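The idea that one latent can be rendered at several resolutions by changing only the query coordinates can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the class name ToyCoordinateRenderer, the Fourier coordinate embedding, the single-head attention, the random untrained weights, and the random list standing in for a denoised latent. It shows only the mechanism: the latent is fixed, and the output size is decided by how many coordinates are queried.

```python
import math
import random


def make_grid(height, width):
    """Pixel-center coordinates in [-1, 1] for an image of the given size."""
    ys = [2 * (i + 0.5) / height - 1 for i in range(height)]
    xs = [2 * (j + 0.5) / width - 1 for j in range(width)]
    return [(y, x) for y in ys for x in xs]


def fourier_features(coord, num_bands):
    """Embed a 2D coordinate with sines and cosines at doubling frequencies."""
    feats = []
    for value in coord:
        for band in range(num_bands):
            angle = math.pi * (2 ** band) * value
            feats.extend((math.sin(angle), math.cos(angle)))
    return feats


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def random_matrix(rng, rows, cols):
    """Random (untrained) weights, scaled by 1/sqrt(cols)."""
    scale = 1 / math.sqrt(cols)
    return [[rng.gauss(0, scale) for _ in range(cols)] for _ in range(rows)]


class ToyCoordinateRenderer:
    """Hypothetical single-head cross-attention from coordinates to latent tokens."""

    def __init__(self, latent_dim, num_bands=4, seed=0):
        rng = random.Random(seed)
        coord_dim = 4 * num_bands  # 2 axes x num_bands x (sin, cos)
        self.num_bands = num_bands
        self.w_query = random_matrix(rng, latent_dim, coord_dim)
        self.w_key = random_matrix(rng, latent_dim, latent_dim)
        self.w_value = random_matrix(rng, 3, latent_dim)  # project to RGB

    def render(self, latent_tokens, height, width):
        # Keys and values depend only on the latent, not on the resolution.
        keys = [matvec(self.w_key, token) for token in latent_tokens]
        values = [matvec(self.w_value, token) for token in latent_tokens]
        scale = 1 / math.sqrt(len(keys[0]))
        pixels = []
        for coord in make_grid(height, width):
            query = matvec(self.w_query, fourier_features(coord, self.num_bands))
            scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
            top = max(scores)  # subtract the max for a numerically stable softmax
            weights = [math.exp(s - top) for s in scores]
            total = sum(weights)
            pixels.append(
                [sum(w * v[c] for w, v in zip(weights, values)) / total for c in range(3)]
            )
        return [pixels[r * width:(r + 1) * width] for r in range(height)]


rng = random.Random(1)
# Stand-in for a denoised latent: 16 tokens of dimension 8 (assumed sizes).
latent = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(16)]
renderer = ToyCoordinateRenderer(latent_dim=8)

low = renderer.render(latent, 8, 8)     # same latent, 8x8 query grid
high = renderer.render(latent, 16, 16)  # same latent, 16x16 query grid
print(len(low), len(low[0]), len(high), len(high[0]))
```

In this sketch the latent is computed once and only the rendering loop grows with the number of queried pixels, which mirrors the claim that the diffusion cost is independent of output resolution. A trained renderer would replace the random weights, and the actual architecture may differ substantially.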
Entities
Institutions
- arXiv