DiHAL: Geometry-Guided Diffusion in Language Models
DiHAL is a new method that inserts a diffusion process into specific layers of pretrained language models, guided by geometric analysis of their hidden states. The approach selects a diffusion-friendly interface within the transformer, replaces the layers below that interface with a diffusion bridge, and keeps the upper layers and the LM head intact. Experiments on 8B-scale backbones show improved hidden-state recovery over continuous diffusion baselines.
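The structural split can be pictured with a small self-contained sketch. Everything below is an illustrative assumption rather than DiHAL's published code: `Block` stands in for one pretrained transformer layer, `DiffusionBridge` for the learned denoiser that replaces the lower prefix, `split_layer` for the geometry-selected interface, and a single denoising step stands in for a full noise schedule.

```python
import torch
import torch.nn as nn


class Block(nn.Module):
    """Stand-in for one pretrained transformer layer."""
    def __init__(self, d):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))
        self.ln1, self.ln2 = nn.LayerNorm(d), nn.LayerNorm(d)

    def forward(self, x):
        h = self.ln1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.mlp(self.ln2(x))


class DiffusionBridge(nn.Module):
    """Denoiser that replaces the lower prefix: maps a noised hidden state
    and a diffusion time to an estimate of the clean interface hidden state."""
    def __init__(self, d):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d + 1, 4 * d), nn.SiLU(), nn.Linear(4 * d, d))

    def forward(self, x_noisy, t):
        # x_noisy: (batch, seq, d); t: (batch, 1) diffusion time in [0, 1].
        t = t[:, None, :].expand(-1, x_noisy.size(1), -1)   # broadcast over sequence
        return self.net(torch.cat([x_noisy, t], dim=-1))


class HybridLM(nn.Module):
    """Keeps the upper pretrained layers and the original LM head;
    the lower prefix is replaced by the diffusion bridge."""
    def __init__(self, blocks, lm_head, split_layer, d):
        super().__init__()
        self.bridge = DiffusionBridge(d)                    # replaces blocks[:split_layer]
        self.upper = nn.ModuleList(blocks[split_layer:])    # retained pretrained suffix
        self.lm_head = lm_head                              # retained original head
        for p in list(self.upper.parameters()) + list(self.lm_head.parameters()):
            p.requires_grad_(False)                         # keep pretrained weights frozen

    def forward(self, x_noisy, t):
        h = self.bridge(x_noisy, t)        # recover interface hidden states
        for blk in self.upper:
            h = blk(h)
        return self.lm_head(h)             # logits from the original head


d, vocab, n_layers, split = 64, 1000, 8, 3
blocks = [Block(d) for _ in range(n_layers)]
model = HybridLM(blocks, nn.Linear(d, vocab), split_layer=split, d=d)

x_noisy = torch.randn(2, 16, d)            # noised hidden states at the interface
t = torch.rand(2, 1)                       # per-example diffusion times
logits = model(x_noisy, t)                 # (2, 16, vocab)

# Training signal (assumption): supervise the bridge on clean interface hidden
# states (MSE shown), not on discrete tokens decoded from continuous vectors.
x_clean = torch.randn(2, 16, d)            # placeholder for teacher hidden states
loss = nn.functional.mse_loss(model.bridge(x_noisy, t), x_clean)
```

Supervising in hidden-state space, as in the last two lines, is consistent with the key facts below: the bridge targets hidden-state recovery at the interface rather than direct continuous-to-discrete token recovery.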
Key facts
- DiHAL is a geometry-guided diffusion-transformer hybrid.
- It selects a hidden-state interface using geometry-based proxies.
- Lower transformer prefix is replaced with a diffusion bridge.
- Upper layers and original LM head are retained.
- Experiments conducted on 8B-scale backbones.
- Geometry score predicts effective shallow insertion layers (see the proxy sketch after this list).
- Hidden-state recovery improves over continuous diffusion baselines.
- Method avoids direct continuous-to-discrete token recovery.
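The summary does not specify which geometry score DiHAL uses, so the sketch below substitutes one plausible proxy, the effective rank of per-layer hidden states (exponential of the singular-value entropy), purely to illustrate the select-the-best-shallow-layer workflow; `effective_rank` and `pick_interface` are hypothetical names, not part of the method.

```python
import torch


def effective_rank(h: torch.Tensor) -> float:
    """h: (num_tokens, d) hidden states collected at one candidate layer."""
    h = h - h.mean(dim=0, keepdim=True)            # center before measuring spread
    s = torch.linalg.svdvals(h)                    # singular values of the point cloud
    p = s / s.sum()                                # normalize to a distribution
    entropy = -(p * (p + 1e-12).log()).sum()
    return entropy.exp().item()                    # effective number of directions


def pick_interface(hiddens_per_layer):
    """hiddens_per_layer: list of (num_tokens, d) tensors, one per shallow layer."""
    scores = [effective_rank(h) for h in hiddens_per_layer]
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best, scores


# Toy demo with synthetic hidden states for 6 candidate shallow layers; in
# practice these would be collected by running the pretrained model on
# calibration text and caching each layer's outputs.
torch.manual_seed(0)
hiddens = [torch.randn(512, 64) @ torch.randn(64, 64) for _ in range(6)]
layer, scores = pick_interface(hiddens)
print(f"chosen interface layer: {layer}")
```

Any per-layer scalar that captures how diffusion-friendly the hidden-state geometry is could be dropped into the same selection loop.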
Entities
—