Universal Reasoner: A Plug-and-Play Reasoning Module for Frozen LLMs

ai-technology · 2026-05-22

Researchers propose Universal Reasoner (UniR), a modular reasoning module that can be attached to a frozen large language model (LLM) without retraining that model. UniR decomposes reward signals into token-level guidance, aiming to add specialized reasoning ability while preserving the backbone's general capabilities. The module is trained separately from the backbone using verifiable rewards. At inference, its output logits are added to those of the frozen LLM. The approach targets the high computational cost and backbone-specific dependencies of conventional fine-tuning methods.
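To make the inference-time step concrete, here is a minimal sketch of combining two models by adding their logits for a single decoding step. It assumes the frozen backbone and the reasoning module share one vocabulary, so their logit vectors line up index by index. The function names and the `alpha` weight are hypothetical; the summary only says the logits are added, which corresponds to the default `alpha=1.0`.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one vocabulary-sized logit vector."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combine_logits(
    base_logits: Sequence[float],
    reasoner_logits: Sequence[float],
    alpha: float = 1.0,  # hypothetical weight; 1.0 means plain addition
) -> List[float]:
    """Add the reasoning module's logits to the frozen backbone's logits."""
    if len(base_logits) != len(reasoner_logits):
        raise ValueError("backbone and reasoner must share a vocabulary")
    return [b + alpha * r for b, r in zip(base_logits, reasoner_logits)]


def greedy_next_token(
    base_logits: Sequence[float],
    reasoner_logits: Sequence[float],
    alpha: float = 1.0,
) -> int:
    """Pick the most likely next token under the combined distribution."""
    probs = softmax(combine_logits(base_logits, reasoner_logits, alpha))
    # softmax is monotonic, so this argmax equals the argmax of the raw sums.
    return max(range(len(probs)), key=probs.__getitem__)
```

For example, with backbone logits `[2.0, 1.0, 0.5]` alone the greedy choice is token 0, but adding reasoner logits `[-1.0, 1.5, 0.0]` gives `[1.0, 2.5, 0.5]` and shifts the choice to token 1. Because only logits are exchanged, the backbone's weights are never touched, and stacking several modules would amount to summing further logit vectors (an assumption about how "composable" might be realized, not a detail from the source).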

Key facts

UniR is a modular, composable, plug-and-play reasoning module.
It works with frozen LLMs without retraining.
Reward is decomposed into token-level guidance.
Training is decoupled from the backbone and uses verifiable rewards.
At inference, UniR's output logits are added to those of the frozen LLM.
By contrast, Parameter-Efficient Fine-Tuning (PEFT) methods must be retrained for each backbone model.
UniR aims to enhance reasoning without compromising generalization.
The approach is designed to reduce computational resource demands compared with retraining the backbone.
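The facts above say UniR turns a verifiable, sequence-level reward into token-level guidance while the backbone stays frozen. The summary does not give the exact decomposition, so the sketch below swaps in a generic policy-gradient (REINFORCE-style) view as an assumption: because a sequence's log-probability is the sum of its per-token log-probabilities, the objective R * log p(y | x) splits into one term per generated token. The exact-match reward, the function names (`verifiable_reward`, `log_softmax`, `token_level_terms`) and the use of summed logits as the training-time policy are all hypothetical, not UniR's published method.

```python
import math
from typing import List, Sequence


def verifiable_reward(predicted: str, reference: str) -> float:
    """Hypothetical binary reward: 1.0 if the final answer matches exactly."""
    return 1.0 if predicted.strip() == reference.strip() else 0.0


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary-sized logit vector."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_norm for x in logits]


def token_level_terms(
    base_logits_seq: Sequence[Sequence[float]],
    reasoner_logits_seq: Sequence[Sequence[float]],
    token_ids: Sequence[int],
    reward: float,
) -> List[float]:
    """Split reward * log p(y | x) into one term per generated token.

    p is assumed to be the distribution from summed logits (frozen backbone
    plus reasoner). In a real trainer only the reasoner's parameters would
    receive gradients; this sketch only computes the objective's terms.
    """
    if not (len(base_logits_seq) == len(reasoner_logits_seq) == len(token_ids)):
        raise ValueError("one logit vector per generated token is required")
    terms = []
    for base, reasoner, tok in zip(base_logits_seq, reasoner_logits_seq, token_ids):
        combined = [b + r for b, r in zip(base, reasoner)]
        terms.append(reward * log_softmax(combined)[tok])
    return terms
```

Summing the returned terms recovers reward * log p(y | x) for the whole sequence, and a reward of 0.0 zeroes every term, so only answers that pass the verifier contribute a learning signal in this simplified scheme.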


Sources

arXiv cs.AI — 2026-05-21