ARTFEED — Contemporary Art Intelligence

VQ-SAD: Neuro-Symbolic Diffusion Model for Molecule Generation

other · 2026-05-04

A new method called VQ-SAD (Vector Quantized Structure Aware Diffusion) addresses limitations in how diffusion-based molecule generators represent atoms and bonds by incorporating symbolic information. Existing approaches rely on one-hot encodings or Morgan fingerprints, which the authors argue suffer from information loss and, in the case of fingerprints, hash collisions. VQ-SAD instead trains a VQ-VAE whose discrete atom and bond codes serve as latent variables; the pretrained codebooks are then frozen and used as tokenizers for the diffusion process. The result is a neuro-symbolic model that combines symbolic and neural structural information and uses a learnable forward process. According to the authors, the large discrete code space yields a more balanced distribution over atom and bond types, which improves denoising. The paper is available on arXiv (2605.00354).
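As a rough illustration of the tokenization step, the sketch below maps continuous per-atom feature vectors to the index of their nearest entry in a frozen codebook (standard VQ-VAE nearest-neighbour quantization), then applies one step of a fixed uniform-transition forward noising process to the resulting tokens. The toy 2-D codebook, the feature vectors, and the function names `quantize` and `uniform_forward_step` are hypothetical. In particular, the paper describes a learnable forward process; the fixed uniform kernel here is a simple stand-in in the style of discrete diffusion models such as D3PM, not VQ-SAD's method.

```python
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def quantize(features: List[Vector], codebook: List[Vector]) -> List[int]:
    """Return, for each feature vector, the index of its nearest codebook
    entry under squared Euclidean distance. The codebook is only read,
    mirroring a frozen, pretrained tokenizer."""
    codes = []
    for x in features:
        best_idx, best_dist = 0, math.inf
        for k, c in enumerate(codebook):
            dist = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
            if dist < best_dist:
                best_idx, best_dist = k, dist
        codes.append(best_idx)
    return codes


def uniform_forward_step(codes: List[int], num_codes: int, beta: float,
                         rng: random.Random) -> List[int]:
    """One step of a fixed uniform-transition forward (noising) process:
    each token is resampled uniformly from the vocabulary with
    probability beta and kept otherwise. (Illustrative stand-in; the
    paper's forward process is learnable.)"""
    return [rng.randrange(num_codes) if rng.random() < beta else c
            for c in codes]


if __name__ == "__main__":
    # Hypothetical frozen codebook of four 2-D code vectors.
    codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    # Hypothetical encoder outputs for four atoms.
    atom_features = [(0.1, -0.05), (0.9, 0.2), (0.05, 0.95), (1.1, 0.8)]
    tokens = quantize(atom_features, codebook)
    print(tokens)  # [0, 1, 2, 3]
    noisy = uniform_forward_step(tokens, len(codebook), beta=0.3,
                                 rng=random.Random(0))
    print(noisy)
```

Because the codebook is frozen, the same encoder output always receives the same token, so the diffusion model can treat the codes as a fixed discrete vocabulary.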

Key facts

  • VQ-SAD uses VQ-VAE for atom and bond codes as latent variables
  • Frozen pretrained VQ-VAE codebooks serve as tokenizers
  • Neuro-symbolic model combining symbolic and neural information
  • Learnable forward process in diffusion model
  • Large discrete code space improves denoising
  • Addresses hash collisions and information loss in Morgan fingerprints
  • Paper published on arXiv with ID 2605.00354
  • Announced on arXiv as a cross-list
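To make the hash-collision point concrete, the sketch below folds hypothetical substructure identifiers into a fixed-length bit vector using a hash-then-modulo step, the same kind of folding used when fingerprints are compressed to a fixed length. SHA-256, the `env_i` identifier strings, and the helper names `fold_index` and `find_collisions` are illustrative stand-ins; this is not RDKit's Morgan implementation and not code from the paper.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def fold_index(identifier: str, n_bits: int) -> int:
    """Hash a substructure identifier to a stable integer, then fold it
    into a bit position of a fixed-length vector via modulo n_bits.
    (SHA-256 is an illustrative stand-in for a fingerprint hash.)"""
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_bits


def find_collisions(identifiers: List[str], n_bits: int) -> Dict[int, List[str]]:
    """Return every bit position that more than one distinct identifier
    maps to; those identifiers become indistinguishable after folding."""
    buckets: Dict[int, List[str]] = defaultdict(list)
    for ident in identifiers:
        buckets[fold_index(ident, n_bits)].append(ident)
    return {bit: ids for bit, ids in buckets.items() if len(ids) > 1}


if __name__ == "__main__":
    # Hypothetical substructure identifiers. With 20 items and only 16 bits,
    # at most 16 positions can be occupied, so at least 4 identifiers are
    # merged into a bit already set by another (pigeonhole principle).
    envs = [f"env_{i}" for i in range(20)]
    collisions = find_collisions(envs, n_bits=16)
    merged = sum(len(ids) - 1 for ids in collisions.values())
    print(f"{len(collisions)} shared bits, {merged} identifiers merged away")
```

A larger `n_bits` makes collisions rarer but cannot rule them out, since distinct identifiers can still hash to the same position; VQ-SAD replaces this fixed hashing with discrete codes learned by a VQ-VAE.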

Entities

Institutions

  • arXiv
