OLLM: Options-based Large Language Models Introduce Discrete Latent Variables for Next-Token Prediction
Options LLM (OLLM) replaces the conventional single next-token prediction in large language models with a set of learned options indexed by a discrete latent variable. Rather than relying on temperature or sampling heuristics for diversity, OLLM models variation explicitly: a compact latent space parametrizes several plausible next-token choices, which a downstream policy can select among or explore. Architecturally, OLLM is a lightweight "plug-in" that inserts an encoder and a decoder before the output head, so almost any pretrained LLM can be converted with minimal additional parameters. Applied to a 1.7B-parameter backbone trained on OpenMathReasoning and evaluated on OmniMath, only 1.56% of parameters were trainable. State-of-the-art LoRA-adapted baselines peak at 51% final-answer accuracy, whereas OLLM's option set reaches roughly 70% under optimal late selection. The paper detailing OLLM is accessible on arXiv under the identifier 2604.19087v1.
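The plug-in described above can be sketched in a few lines. This is a hypothetical illustration under stated assumptions, not the paper's implementation: the class name `OptionPlugIn` and the weights `W_enc` (encoder producing a distribution over the discrete latent), `option_emb` (decoder code per latent value), and `W_out` (the frozen backbone output head) are all invented for the example, and the real method would train the encoder and decoder jointly against the backbone.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class OptionPlugIn:
    """Sketch of an OLLM-style head: a discrete latent z indexes K options,
    each yielding its own next-token distribution from the frozen output head.
    All weight names and shapes here are illustrative assumptions."""

    def __init__(self, d_model, n_options, vocab_size, seed=0):
        rng = np.random.default_rng(seed)
        # Encoder: scores the K discrete latent values given the hidden state.
        self.W_enc = rng.standard_normal((d_model, n_options)) * 0.02
        # Decoder: one learned code per latent value, added to the hidden state.
        self.option_emb = rng.standard_normal((n_options, d_model)) * 0.02
        # Frozen pretrained output head (stands in for the backbone's unembedding).
        self.W_out = rng.standard_normal((d_model, vocab_size)) * 0.02

    def forward(self, h):
        # h: (d_model,) backbone hidden state at the current position.
        prior = softmax(h @ self.W_enc)                   # p(z | h), shape (n_options,)
        h_z = h[None, :] + self.option_emb                # per-option decoded hidden states
        token_dists = softmax(h_z @ self.W_out, axis=-1)  # (n_options, vocab_size)
        return prior, token_dists
```

A downstream policy could then sample z from `prior` (or enumerate all K options) and decode from the corresponding row of `token_dists`, which is what makes the option set explorable rather than a single collapsed distribution.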
Key facts
- OLLM replaces single next-token prediction with learned options indexed by a discrete latent variable
- The method models variation explicitly through a small latent space parametrizing multiple plausible next-token options
- OLLM is architecturally a lightweight plug-in inserting encoder and decoder layers before the output head
- The approach allows almost any pretrained LLM to be converted with minimal additional parameters
- Applied to a 1.7B-parameter backbone with only 1.56% trainable parameters
- Trained on OpenMathReasoning and evaluated on OmniMath
- LoRA-adapted baselines peak at 51% final answer correctness
- OLLM enables up to approximately 70% correctness under optimal late selection
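The "optimal late selection" number in the last bullet is an oracle-style metric: a problem counts as solved if any of the K option-generated answers is correct. A minimal sketch of that scoring rule, with the function name and data layout invented for illustration:

```python
def late_selection_accuracy(option_answers, gold):
    """Oracle ('optimal late selection') accuracy: each problem contributes
    a hit if any of its K option-generated final answers matches the gold
    answer. `option_answers` is a list of per-problem answer lists."""
    hits = sum(any(a == g for a in opts)
               for opts, g in zip(option_answers, gold))
    return hits / len(gold)

def first_option_accuracy(option_answers, gold):
    """Baseline that keeps only one answer per problem, for comparison."""
    return sum(opts[0] == g for opts, g in zip(option_answers, gold)) / len(gold)
```

Under this metric, the gap between the 51% LoRA baseline and OLLM's roughly 70% reflects how often at least one option in the learned set hits the correct final answer, rather than the accuracy of any single fixed decoding.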