UG-TTT: Epistemic Uncertainty for Test-Time Discovery in LLMs
UG-TTT targets reward stagnation in automated scientific discovery with large language models. Standard reinforcement learning penalizes high-variance mutations, so the search drifts toward known patterns and the maximum reward plateaus. UG-TTT instead maintains a compact ensemble of low-rank adapters over a frozen base model and scores per-token disagreement as the mutual information between the ensemble's predictions and its weight hypotheses. This isolates epistemic uncertainty, separating regions that are merely unexplored from problems that are intrinsically difficult.
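This disagreement score is the standard mutual-information decomposition I[y; w | x] = H[E_w p(y | x, w)] - E_w[H[p(y | x, w)]]: the entropy of the ensemble-averaged prediction minus the average entropy of the individual members. Below is a minimal sketch of that per-token computation; the function name and the assumption that the ensemble's logits arrive as a single (K, T, V) tensor are illustrative, not from the paper.

```python
import torch

def per_token_epistemic_uncertainty(member_logits: torch.Tensor) -> torch.Tensor:
    """BALD-style per-token disagreement for an ensemble of K members.

    member_logits: (K, T, V) next-token logits from K adapter members over
    T token positions and a vocabulary of size V. Returns a (T,) tensor of
    mutual-information scores in nats: total predictive entropy minus the
    mean per-member entropy, which is high only where the members disagree.
    """
    log_probs = torch.log_softmax(member_logits, dim=-1)       # (K, T, V)
    probs = log_probs.exp()

    # Total uncertainty: entropy of the ensemble-averaged distribution.
    mean_probs = probs.mean(dim=0)                             # (T, V)
    total_entropy = -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(-1)

    # Aleatoric part: average entropy of each member's own distribution.
    # The clamp guards the 0 * -inf edge case for near-zero probabilities.
    member_entropy = -(probs * log_probs.clamp_min(-30.0)).sum(-1).mean(dim=0)

    # Epistemic part: what remains once intrinsic difficulty is subtracted out.
    return total_entropy - member_entropy
```

Under this decomposition, a high score marks a token where the members disagree (an unexplored region in UG-TTT's framing), while high member entropy with a near-zero score marks a token that is intrinsically hard for every weight hypothesis.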
Key facts
- UG-TTT addresses epistemic uncertainty for test-time discovery.
- Standard RL penalizes high-variance mutations, causing a reward plateau.
- UG-TTT uses a small ensemble of low-rank adapters over a frozen base model (see the sketch after this list).
- Per-token disagreement is quantified as the mutual information between ensemble predictions and weight hypotheses.
- The method isolates epistemic uncertainty to identify unexplored regions.
- The approach distinguishes unexplored regions from intrinsically difficult problems.
- The paper is on arXiv with ID 2605.11328.
- The announcement type is cross (a cross-listing).
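As a concrete reading of the adapter-ensemble fact above, the sketch below shows one way K LoRA members over a single frozen base model might be realized with Hugging Face peft. The base model, adapter names, LoRA hyperparameters, and ensemble size are all assumptions for illustration, not details from the paper.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

K = 4  # assumed ensemble size; the paper only says "small"

# One shared base model; peft freezes the base weights when adapters attach.
base = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
lora_cfg = LoraConfig(r=8, lora_alpha=16, target_modules=["c_attn"])

# Register K named LoRA adapters over the same frozen weights.
model = get_peft_model(base, lora_cfg, adapter_name="member_0")
for k in range(1, K):
    model.add_adapter(f"member_{k}", lora_cfg)
model.eval()

# NOTE: fresh LoRA adapters start as the identity map, so real members must
# be trained (or perturbed) independently before disagreement is meaningful.

inputs = tokenizer("An unexplored conjecture:", return_tensors="pt")

# Collect each member's token logits; only the active adapter changes per pass.
member_logits = []
with torch.no_grad():
    for k in range(K):
        model.set_adapter(f"member_{k}")
        member_logits.append(model(**inputs).logits)   # (1, T, V)

logits = torch.cat(member_logits, dim=0)               # (K, T, V)
mi = per_token_epistemic_uncertainty(logits)           # sketch shown earlier
```

Because only the low-rank adapter weights differ between members, the ensemble's memory and compute overhead stays small relative to running K full copies of the model.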
Entities
Institutions
- arXiv