AtManRL Method Uses Differentiable Attention to Improve LLM Reasoning Faithfulness
AtManRL is a research method that addresses whether chain-of-thought reasoning in large language models genuinely influences their final answers, rather than merely accompanying them. The method uses reinforcement learning to train an additive attention mask that pinpoints essential tokens within reasoning paths, yielding a saliency reward signal that encourages models to produce reasoning that actually drives their predictions. AtManRL combines this saliency reward with outcome-based rewards in the GRPO (Group Relative Policy Optimization) framework, optimizing for both accuracy and interpretability. Experiments evaluated the Llama-3.2-3B-Instruct model on the GSM8K and MMLU benchmarks. By exploiting differentiable attention manipulation, the approach aims to make reasoning traces faithful to the mechanisms that generate answers. The research was published on arXiv under the identifier 2604.16158v1.
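To make the core idea concrete, here is a minimal sketch of what an additive attention mask and a blended reward could look like. This is not the paper's implementation: the function names, the form of the mask (logits added to attention scores before softmax), and the `alpha` weighting between saliency and outcome rewards are all illustrative assumptions.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def masked_attention(scores, mask_logits):
    """Hypothetical additive attention mask: learnable logits are added
    to raw attention scores before the softmax, so tokens assigned large
    negative logits are attenuated. The mask stays differentiable, which
    is what allows training it end to end."""
    return softmax(scores + mask_logits)

def combined_reward(outcome_reward, saliency, alpha=0.5):
    """Illustrative blend of an outcome-based reward (e.g. answer
    correctness) with a saliency score measuring how much the reasoning
    tokens influence the prediction. The weighting alpha is an assumed
    hyperparameter, not a value from the paper."""
    return (1.0 - alpha) * outcome_reward + alpha * saliency
```

In this sketch, strongly negative mask logits effectively remove a token's influence on the attention distribution, which is one way a saliency signal could be read off: tokens whose removal changes the answer are the ones the reasoning genuinely depends on.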
Key facts
- AtManRL is a method for improving reasoning faithfulness in large language models
- It uses differentiable attention manipulation through reinforcement learning
- The approach trains an additive attention mask to identify crucial reasoning tokens
- Generates a saliency reward signal to encourage genuinely influential reasoning
- Integrates with outcome-based rewards in the GRPO framework
- Experiments conducted on GSM8K and MMLU benchmarks
- Tested with Llama-3.2-3B-Instruct model
- Research announced on arXiv with identifier 2604.16158v1
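Since the method integrates its saliency reward with outcome rewards inside GRPO, the sketch below shows how GRPO-style group-relative advantages are typically computed from a group of sampled completions. The per-completion reward blending and the standard-deviation normalization are standard GRPO practice, but their exact form in AtManRL is an assumption here.

```python
import numpy as np

def grpo_advantages(rewards):
    """GRPO-style group-relative advantage: each sampled completion's
    reward is normalized by the mean and standard deviation of its
    group, so completions are scored relative to their siblings."""
    r = np.asarray(rewards, dtype=float)
    std = r.std()
    # Guard against a zero-variance group (all rewards identical).
    return (r - r.mean()) / (std if std > 0 else 1.0)

# Hypothetical usage: blend outcome and saliency rewards per completion,
# then normalize within the sampled group.
outcome = [1.0, 0.0, 1.0, 0.0]   # e.g. answer correctness
saliency = [0.8, 0.6, 0.2, 0.1]  # e.g. reasoning-influence scores
blended = [0.5 * o + 0.5 * s for o, s in zip(outcome, saliency)]
advantages = grpo_advantages(blended)
```

Completions whose blended reward exceeds the group mean receive positive advantages and are reinforced, which is how a saliency term can steer the policy toward reasoning that measurably shapes the answer.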
Entities
Institutions
- arXiv