VQ-SAD: Neuro-Symbolic Diffusion Model for Molecule Generation
VQ-SAD (Vector Quantized Structure Aware Diffusion) is a new method that addresses limitations in diffusion-based molecule generation by incorporating symbolic information. Existing approaches rely on one-hot representations or Morgan fingerprints; the latter suffer from hash collisions and information loss. VQ-SAD instead uses a VQ-VAE, treats the resulting atom and bond codes as latent variables, and uses the frozen pretrained codebooks as tokenizers for the diffusion process. The resulting neuro-symbolic model combines symbolic and neural structural information with a learnable forward process. The large discrete code space is said to yield a more balanced distribution over atom and bond types, which aids denoising. The paper is available on arXiv (2605.00354).
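The tokenizer role of a frozen codebook can be illustrated with a minimal sketch. It is not the paper's implementation: the codebook values, the two-dimensional embeddings, and the names `ATOM_CODEBOOK`, `quantize`, and `tokenize` are all hypothetical. It shows only the generic vector-quantization step, in which each continuous embedding is replaced by the index of its nearest codebook vector.

```python
import math

# Hypothetical frozen codebook: each row stands in for a code vector that a
# pretrained VQ-VAE would have learned. The values are made up for illustration.
ATOM_CODEBOOK = [
    [0.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
]


def quantize(embedding, codebook):
    """Return the index of the codebook vector nearest to `embedding` (Euclidean)."""
    distances = [math.dist(embedding, code) for code in codebook]
    return distances.index(min(distances))


def tokenize(embeddings, codebook):
    """Map continuous embeddings to discrete code indices; the codebook is never updated."""
    return [quantize(e, codebook) for e in embeddings]


# Assumed encoder outputs for three atoms (illustrative values only).
atom_embeddings = [[0.1, -0.2], [0.9, 0.8], [0.2, 1.1]]
atom_tokens = tokenize(atom_embeddings, ATOM_CODEBOOK)
print(atom_tokens)  # [0, 3, 2]
```

In a setup like the one summarized above, such indices would be the discrete tokens that the diffusion process noises and denoises; a separate codebook would presumably play the same role for bonds.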
Key facts
- VQ-SAD uses a VQ-VAE and treats atom and bond codes as latent variables
- Frozen pretrained VQ-VAE codebooks serve as tokenizers
- Neuro-symbolic model combining symbolic and neural information
- Learnable forward process in diffusion model
- Large discrete code space improves denoising
- Addresses hash collisions and information loss in Morgan fingerprints
- Paper published on arXiv with ID 2605.00354
- arXiv announcement type: cross (cross-listed)
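The hash-collision problem noted in the key facts can be shown with a toy example. This is a simplified stand-in, not the Morgan algorithm itself: real Morgan fingerprints hash circular atom environments, whereas the hypothetical `fold_to_bits` below simply folds made-up integer identifiers into a short bit vector with a modulo. The effect is the same in kind: distinct substructure sets can map to identical fingerprints, so information is lost.

```python
def fold_to_bits(identifiers, n_bits):
    """Fold integer substructure identifiers into a fixed-length bit vector."""
    bits = [0] * n_bits
    for ident in identifiers:
        bits[ident % n_bits] = 1  # different identifiers may land on the same bit
    return bits


# Hypothetical substructure identifiers for two different molecules.
ids_a = [3, 21]   # 3 % 8 == 3, 21 % 8 == 5
ids_b = [11, 13]  # 11 % 8 == 3, 13 % 8 == 5

fp_a = fold_to_bits(ids_a, 8)
fp_b = fold_to_bits(ids_b, 8)

# The two identifier sets are disjoint, yet the folded fingerprints are identical.
collide = fp_a == fp_b
print(collide)  # True
```

A learned discrete codebook avoids this particular failure mode because each code index is an explicit entry, not the result of folding a hash into a fixed number of bits.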
Entities
Institutions
- arXiv