Dynamic Latent Routing: A New Post-Training Method for Language Models
Researchers have introduced Dynamic Latent Routing (DLR), a post-training method for language models that jointly learns discrete latent codes, routing policies, and model parameters in a single training phase. The approach draws inspiration from General Dijkstra Search (GDS), which shows that globally optimal goal-reaching policies can be obtained by temporally composing optimal sub-policies in Markov Decision Processes whose reward functions vary over time. In low-data fine-tuning settings, DLR matches or outperforms supervised fine-tuning (SFT) across six models and four datasets, with an average gain of +6.6 percentage points, whereas prior discrete-latent methods consistently underperform SFT. Mechanistic analyses show that DLR learns structured routing behaviors with distinct causal mechanisms. The paper is available on arXiv under identifier 2605.14323.
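To make the core idea concrete: routing over discrete latent codes can be pictured as a learned policy that picks one code from a small codebook per input and conditions the model on it. The sketch below is a minimal illustration of that idea, not the authors' implementation; the codebook size, the Gumbel-softmax relaxation, the straight-through trick, and all function names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_softmax(logits, tau=1.0):
    """Sample a relaxed one-hot routing vector (Gumbel-softmax trick)."""
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))
    y = (logits + g) / tau
    e = np.exp(y - y.max())
    return e / e.sum()

def route(h, codebook, W_router, tau=1.0):
    """Pick a discrete latent code for hidden state h and condition on it.

    h:        (d,) hidden state of the language model
    codebook: (K, d) learnable discrete latent codes
    W_router: (K, d) routing-policy parameters
    (All shapes and names here are illustrative assumptions.)
    """
    logits = W_router @ h                      # routing scores over K codes
    soft = gumbel_softmax(logits, tau)         # relaxed one-hot over codes
    hard = np.eye(len(logits))[soft.argmax()]  # discrete choice in the forward pass
    # Straight-through idea: use the hard code forward, soft probs for gradients.
    code = hard @ codebook
    return h + code, hard                      # condition the model on the chosen code

d, K = 16, 4
h = rng.normal(size=d)
codebook = rng.normal(size=(K, d))
W_router = rng.normal(size=(K, d))
h_out, choice = route(h, codebook, W_router)
```

Because the routing decision is exactly one-hot in the forward pass, the codes stay genuinely discrete while the relaxation keeps everything differentiable for joint training.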
Key facts
- DLR jointly learns discrete latent codes, routing policies, and model parameters.
- The method is inspired by General Dijkstra Search (GDS).
- GDS shows that globally optimal goal-reaching policies arise from the temporal composition of optimal sub-policies.
- DLR matches or outperforms SFT in low-data settings.
- Mean gain of +6.6 percentage points over SFT.
- Tested on four datasets and six models.
- Prior discrete-latent baselines underperform SFT.
- DLR learns structured routing behaviors with distinct causal mechanisms.
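The GDS idea cited above, reaching a goal optimally by composing optimal sub-policies over time, can be sketched as shortest-path search over a subgoal graph. The graph, costs, and function names below are illustrative assumptions, not the paper's construction: each edge stands for an optimal sub-policy with a known cost, and Dijkstra's algorithm picks the globally cheapest composition.

```python
import heapq

# Toy subgoal graph: each edge is an optimal sub-policy with a known cost
# (e.g., expected steps between subgoals). Composing sub-policies along a
# shortest path yields a globally optimal goal-reaching plan.
SUBPOLICY_COST = {
    "start": {"a": 2.0, "b": 5.0},
    "a": {"b": 1.0, "goal": 6.0},
    "b": {"goal": 2.0},
    "goal": {},
}

def compose_subpolicies(graph, source, target):
    """Dijkstra over sub-policies: returns (total cost, subgoal sequence)."""
    frontier = [(0.0, source, [source])]
    settled = {}
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node in settled:
            continue
        settled[node] = cost
        if node == target:
            return cost, path
        for nxt, c in graph[node].items():
            if nxt not in settled:
                heapq.heappush(frontier, (cost + c, nxt, path + [nxt]))
    return float("inf"), []

cost, plan = compose_subpolicies(SUBPOLICY_COST, "start", "goal")
# start -> a (2) -> b (1) -> goal (2), total cost 5.0
```

The direct edge start -> b -> goal costs 7.0, so the search correctly prefers chaining three cheaper sub-policies, which is the compositional property GDS generalizes to time-varying rewards.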