MENTAT: A New Method for Reasoning-Intensive Regression with LLMs
AI researchers have identified a challenging task called reasoning-intensive regression (RiR), where large language models (LLMs) must deduce subtle numerical scores from text. Unlike standard regression tasks such as sentiment analysis, RiR requires deep contextual analysis and appears in applications such as rubric-based scoring, dense reward modeling, and domain-specific retrieval. The researchers established a benchmark with four realistic problems and found that both prompting frozen LLMs and fine-tuning Transformer encoders often struggle with RiR. They propose MENTAT, a lightweight method combining batch-reflective prompt optimization with neural ensemble learning, which outperforms both baseline approaches on the benchmark.
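The batch-reflective prompt-optimization idea can be illustrated with a minimal sketch. Everything below is an assumption for illustration, not the paper's implementation: the function names (`optimize_prompt`, `mock_reflect`) and the mock "LLM" are hypothetical stand-ins. Each round scores a batch with the current prompt, summarizes the errors, and lets a reflection step propose a revised prompt.

```python
# Hedged sketch of a batch-reflective prompt-optimization loop.
# The mock LLM and reflection rule below are illustrative assumptions,
# not MENTAT's actual prompts or scoring model.

def mock_llm_score(prompt, text):
    """Stand-in for an LLM call: score driven by a prompt cue and text length."""
    bias = prompt.count("precise")  # a richer prompt shifts the score upward
    return bias + len(text) / 10

def mock_reflect(prompt, batch_errors):
    """Stand-in reflection: if scores run low on average, enrich the prompt."""
    if sum(batch_errors) / len(batch_errors) > 0.5:
        return prompt + " Be precise."
    return prompt

def optimize_prompt(prompt, batch, rounds=5):
    """Iterate: score the whole batch, then reflect on the batch's errors."""
    for _ in range(rounds):
        errors = [target - mock_llm_score(prompt, text)
                  for text, target in batch]
        prompt = mock_reflect(prompt, errors)
    return prompt

batch = [("short answer", 4.0), ("a much longer answer", 5.0)]
final = optimize_prompt("Score this answer.", batch)
print(final)  # the prompt grows until batch error stops exceeding the threshold
```

The key design point the sketch captures is that reflection operates on a batch of errors at once, rather than on one example at a time, which gives the optimizer a coarse gradient-like signal about systematic bias.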
Key facts
- Reasoning-intensive regression (RiR) involves deducing subtle numerical scores from text.
- RiR appears in rubric-based scoring, dense reward modeling, and domain-specific retrieval.
- A benchmark of four realistic RiR tasks was established.
- Prompting frozen LLMs and fine-tuning Transformer encoders often struggle with RiR.
- MENTAT combines batch-reflective prompt optimization with neural ensemble learning.
- MENTAT is a simple and lightweight method.
- The research is published on arXiv with ID 2508.21762.
- The arXiv listing is a replace-cross announcement (a revised submission cross-listed from another category).
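The neural-ensemble half of the method can likewise be sketched in miniature. The sketch below is an assumption, not the paper's architecture: it trains a few simple linear heads on bootstrap resamples of mock per-prompt scores and averages their predictions, standing in for MENTAT's ensemble over LLM-derived features.

```python
# Hedged sketch of the ensemble-learning step: linear heads trained by
# plain gradient descent on bootstrap resamples, predictions averaged.
# The data and model choices here are illustrative assumptions.
import random

def train_linear_head(features, targets, lr=0.01, epochs=2000):
    """Fit y ~ w*x + b on 1-D features via gradient descent on MSE."""
    w, b = 0.0, 0.0
    n = len(features)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(features, targets):
            err = (w * x + b) - y
            gw += 2 * err * x / n
            gb += 2 * err / n
        w -= lr * gw
        b -= lr * gb
    return w, b

def ensemble_predict(heads, x):
    """Average the predictions of all trained heads."""
    return sum(w * x + b for w, b in heads) / len(heads)

# Mock data standing in for scores produced by optimized prompts;
# the true relation is y = 2x + 1 plus small noise.
random.seed(0)
xs = [i / 10 for i in range(50)]
ys = [2 * x + 1 + random.gauss(0, 0.05) for x in xs]

# Train a small ensemble on bootstrap resamples of the data.
heads = []
for _ in range(5):
    idx = [random.randrange(len(xs)) for _ in range(len(xs))]
    heads.append(train_linear_head([xs[i] for i in idx], [ys[i] for i in idx]))

print(round(ensemble_predict(heads, 2.0), 1))  # close to 2*2 + 1 = 5.0
```

Bootstrap resampling makes the heads disagree slightly, so averaging them reduces variance, which is the usual motivation for pairing an ensemble with noisy LLM-derived scores.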
Entities
Institutions
- arXiv