RSAT: Structured Attribution Improves Table Reasoning in Small Language Models
Researchers have introduced RSAT, a method that trains small language models (SLMs) of 1-8 billion parameters to produce step-by-step reasoning with cell-level citations when answering questions over tables. The approach has two phases: Phase 1 uses supervised fine-tuning (SFT) to teach a structured JSON output format from verified reasoning traces, and Phase 2 applies group relative policy optimization (GRPO) with a composite reward centered on NLI-based faithfulness, citation validity, and parsimony. Tested across six models from two families, Qwen 2.5 (1.5B, 3B, 7B) and Llama 3 (1B, 3B, 8B), RSAT improved faithfulness 3.7× over SFT alone (from 0.224 to 0.826) while reaching near-perfect citation validity of 0.992. Post-hoc attribution methods collapsed to below 13% format success, indicating that attribution must be integrated into the reasoning process rather than retrofitted. Ablations showed the faithfulness reward is essential: removing it dropped faithfulness from 0.97 to 0.03.
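To make the structured format concrete, the sketch below shows what an output with cell-level citations might look like. The paper's exact JSON schema is not reproduced in this summary, so the field names (`steps`, `reasoning`, `citations`, `row`, `col`) and the `citations_valid` helper are hypothetical illustrations of the described shape: reasoning steps paired with citations that can be checked against the table's bounds.

```python
import json

# Hypothetical example of RSAT-style structured output (field names are
# assumptions for illustration; the paper's exact schema may differ).
answer = {
    "steps": [
        {
            "reasoning": "Revenue for 2023 appears in the 'Revenue' column of the 2023 row.",
            "citations": [{"row": 2, "col": 1}],  # cell-level citation
        },
        {
            "reasoning": "Revenue for 2022 is in the row above.",
            "citations": [{"row": 1, "col": 1}],
        },
    ],
    "answer": "Revenue grew from 2022 to 2023.",
}

def citations_valid(output: dict, n_rows: int, n_cols: int) -> bool:
    """Check that every cited cell actually exists in the table."""
    return all(
        0 <= c["row"] < n_rows and 0 <= c["col"] < n_cols
        for step in output["steps"]
        for c in step["citations"]
    )

print(json.dumps(answer, indent=2))
print("citations valid:", citations_valid(answer, n_rows=3, n_cols=2))
```

A bounds check like this is presumably what makes citation validity directly measurable (and rewardable) during training, in contrast to free-text attributions.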
Key facts
- RSAT trains small language models (1-8B parameters) for table reasoning with cell-level citations.
- Phase 1: SFT teaches structured JSON output from verified reasoning traces.
- Phase 2: GRPO optimizes a composite reward including NLI-based faithfulness, citation validity, and parsimony (a sketch follows this list).
- Tested on Qwen 2.5 (1.5B, 3B, 7B) and Llama 3 (1B, 3B, 8B) models.
- Faithfulness improved 3.7× over SFT alone (0.224 to 0.826).
- Citation validity reached near-perfect 0.992.
- Post-hoc attribution methods had less than 13% format success.
- Removing faithfulness reward dropped faithfulness from 0.97 to 0.03.
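For concreteness, here is a minimal sketch of how the Phase 2 composite reward might be computed per sampled completion (in GRPO, each completion in a group is scored and its advantage is taken relative to the group's mean reward). The weights, the `nli_entailment_score` stub, and the one-citation-per-step parsimony baseline are illustrative assumptions, not the authors' implementation.

```python
def nli_entailment_score(premise: str, hypothesis: str) -> float:
    """Stand-in for an NLI model's entailment probability in [0, 1].
    In practice this would call a trained NLI classifier on the cited
    cells (premise) vs. the reasoning step (hypothesis)."""
    return 1.0  # placeholder

def composite_reward(steps, table, n_rows, n_cols,
                     w_faith=0.6, w_cite=0.3, w_parsimony=0.1):
    """Weighted sum of the three reward components named in the paper.
    Weights are hypothetical; the paper may combine terms differently."""
    # Faithfulness: mean entailment of each step by its (valid) cited cells.
    faith = sum(
        nli_entailment_score(
            premise=" ".join(
                table[c["row"]][c["col"]]
                for c in s["citations"]
                if 0 <= c["row"] < n_rows and 0 <= c["col"] < n_cols
            ),
            hypothesis=s["reasoning"],
        )
        for s in steps
    ) / max(len(steps), 1)

    # Citation validity: fraction of citations pointing at real cells.
    cites = [c for s in steps for c in s["citations"]]
    valid = sum(
        0 <= c["row"] < n_rows and 0 <= c["col"] < n_cols for c in cites
    ) / max(len(cites), 1)

    # Parsimony: mild penalty for citing more cells than steps require
    # (assuming one citation per step as the baseline).
    parsimony = 1.0 / (1.0 + max(len(cites) - len(steps), 0))

    return w_faith * faith + w_cite * valid + w_parsimony * parsimony

table = [["Year", "Revenue"], ["2022", "10"], ["2023", "14"]]
steps = [{"reasoning": "2023 revenue is 14.", "citations": [{"row": 2, "col": 1}]}]
print(composite_reward(steps, table, n_rows=3, n_cols=2))  # 1.0 with the stub
```

The ablation result reported above (faithfulness falling from 0.97 to 0.03 without the faithfulness term) suggests the NLI component carries most of the optimization pressure, with the validity and parsimony terms acting as constraints.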
Entities
Model families
- Qwen 2.5
- Llama 3