Universal Reasoner: A Plug-and-Play Reasoning Module for Frozen LLMs
Researchers propose Universal Reasoner (UniR), a plug-and-play reasoning module that can be attached to frozen large language models (LLMs) without retraining them. UniR decomposes the reward signal into token-level guidance, adding specialized reasoning ability while preserving the backbone's generalization. The module is trained in a decoupled fashion with verifiable rewards; at inference, its output logits are added to those of the frozen LLM. This avoids the high cost and architecture dependence of traditional fine-tuning methods.
Key facts
- UniR is a modular, composable, plug-and-play reasoning module.
- It works with frozen LLMs without retraining.
- Reward is decomposed into token-level guidance.
- Training is decoupled and relies on verifiable rewards.
- At inference, UniR's output logits are added to those of the frozen LLM.
- By contrast, Parameter-Efficient Fine-Tuning (PEFT) methods require retraining for each backbone.
- UniR aims to enhance reasoning without compromising generalization.
- The approach reduces computational resource demands.
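The inference-time combination described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy four-token vocabulary, and the `alpha` scaling weight are assumptions made for illustration, and the sketch assumes the frozen LLM and the reasoning module share one vocabulary so their logits can be added element-wise.

```python
import math


def softmax(logits):
    """Convert logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combine_logits(backbone_logits, reasoner_logits, alpha=1.0):
    """Add the reasoning module's logits to the frozen backbone's logits.

    `alpha` is a hypothetical weight on the module's guidance;
    alpha=1.0 is plain addition.
    """
    if len(backbone_logits) != len(reasoner_logits):
        raise ValueError("both models must score the same vocabulary")
    return [b + alpha * r for b, r in zip(backbone_logits, reasoner_logits)]


def greedy_next_token(backbone_logits, reasoner_logits, alpha=1.0):
    """Pick the most likely next token under the combined distribution."""
    probs = softmax(combine_logits(backbone_logits, reasoner_logits, alpha))
    token = max(range(len(probs)), key=probs.__getitem__)
    return token, probs


# Toy next-token logits over a 4-token vocabulary (made-up numbers).
backbone = [2.0, 1.0, 0.5, -1.0]   # frozen LLM prefers token 0
reasoner = [-0.5, 1.5, 0.0, 0.0]   # reasoning module favors token 1

token, probs = greedy_next_token(backbone, reasoner)
print(token, [round(p, 3) for p in probs])
```

With these toy numbers the backbone alone would pick token 0, while the combined logits pick token 1. Because the sum happens in logit space, it amounts to multiplying the two models' next-token distributions and renormalizing, and the backbone's weights are never touched.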