Neural MI Estimation Improves Masked Diffusion Model Decoding
A neural estimator predicts pairwise conditional mutual information (MI) directly from the hidden states of a pretrained masked diffusion model (MDM). The estimator is supervised with ground-truth MI computed from the model's own conditional distributions, so it captures the model's internal beliefs about dependencies and predicts the full MI matrix in a single forward pass. This enables MI-guided parallel decoding by identifying subsets of conditionally independent variables that can be decoded together. On Sudoku and on protein sequence generation with ESM-C, the MI maps recover known structural constraints and reduce the number of inference-time forward passes by a factor of 3 to 5.
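To make the supervision target concrete, the sketch below computes MI for one pair of discrete variables from a joint probability table. It is a minimal illustration of the standard MI definition, not the paper's pipeline. The helper `joint_from_conditionals` is hypothetical and assumes the joint for a pair of masked positions is assembled by the chain rule from the model's marginal for one position and its conditional for the other; the summary does not say how the ground-truth MI is actually obtained.

```python
import math


def joint_from_conditionals(p_i, p_j_given_i):
    """Assumed construction: p(a, b) = p_i(a) * p_j(b | a) via the chain rule.

    p_i[a] is the probability of value a at position i (given the context);
    p_j_given_i[a][b] is the probability of value b at position j once
    position i is fixed to a. Both are hypothetical inputs for illustration.
    """
    return [[p_i[a] * p_b for p_b in p_j_given_i[a]] for a in range(len(p_i))]


def mutual_information(joint):
    """MI in nats between two discrete variables, given their joint table."""
    px = [sum(row) for row in joint]        # marginal of the row variable
    py = [sum(col) for col in zip(*joint)]  # marginal of the column variable
    mi = 0.0
    for a, row in enumerate(joint):
        for b, p in enumerate(row):
            if p > 0.0:  # zero-probability cells contribute nothing
                mi += p * math.log(p / (px[a] * py[b]))
    return mi


# Two toy pairs: one independent, one perfectly coupled.
independent = joint_from_conditionals([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]])
coupled = joint_from_conditionals([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]])

mi_independent = mutual_information(independent)
mi_coupled = mutual_information(coupled)
```

For the independent pair the MI is zero, and for the perfectly coupled binary pair it equals log 2 nats. Computing this for every pair of positions would be costly for long sequences, which is the motivation for predicting the whole matrix in one pass.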
Key facts
- Proposes neural estimator for pairwise conditional MI from MDM hidden states
- Uses ground-truth MI from model's conditional distributions for supervision
- Predicts full MI matrix in single forward pass
- Enables MI-guided parallel decoding via conditionally independent subsets
- Tested on Sudoku and protein generation with ESM-C
- MI maps recover known structural constraints
- Reduces inference-time forward passes by a factor of 3 to 5
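One way to picture MI-guided parallel decoding is a greedy pass over the masked positions that keeps a position only if its MI with every position already kept is below a threshold. The sketch below illustrates that idea under stated assumptions: the function name `select_parallel_subset`, the greedy rule, the threshold, and the toy MI matrix are all hypothetical, and the summary does not specify the selection procedure actually used.

```python
def select_parallel_subset(mi, candidates, threshold):
    """Greedily pick positions whose pairwise MI stays at or below threshold.

    mi is a symmetric matrix (list of lists) of pairwise conditional MI
    values; candidates is the ordered list of masked positions to consider.
    Positions in the returned list are treated as approximately
    conditionally independent and could be decoded in the same step.
    """
    chosen = []
    for pos in candidates:
        if all(mi[pos][other] <= threshold for other in chosen):
            chosen.append(pos)
    return chosen


# Toy 4-position MI matrix: positions 0 and 1 are strongly coupled,
# as are positions 1 and 3; every other pair is nearly independent.
toy_mi = [
    [0.00, 0.90, 0.01, 0.02],
    [0.90, 0.00, 0.03, 0.80],
    [0.01, 0.03, 0.00, 0.02],
    [0.02, 0.80, 0.02, 0.00],
]

subset = select_parallel_subset(toy_mi, [0, 1, 2, 3], threshold=0.05)
```

In this toy case position 1 is deferred to a later step because it depends strongly on position 0, while positions 0, 2, and 3 can be filled together. The result depends on candidate order, so a real decoder would likely rank candidates first (for example by model confidence, which is also an assumption here).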
Entities
—