TaNOS: Self-Supervised Framework Boosts Numerical Reasoning in Tables
A team of researchers has unveiled TaNOS, a continual pre-training framework designed to enhance numerical reasoning over expert-domain tables. The framework tackles a prevalent failure mode in which models rely on header-operation shortcuts, which hampers their adaptability under domain shift. TaNOS consists of three key elements: header anonymization to minimize lexical memorization, operation sketches that provide limited structural hints, and self-supervised pretraining that generates correctness-guaranteed program-question pairs from existing tables using a program-first approach. By separating domain semantics from the structure of numerical operations, TaNOS improves transferability. When applied to an 8B instruction-tuned model, it achieved 80.13% execution accuracy on FinQA using only 10% of the training data, surpassing both the supervised fine-tuning baseline of 73.97%, which was trained on the full data, and proprietary models. The research paper is accessible on arXiv under ID 2604.21495.
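To make the pipeline concrete, the sketch below illustrates the general idea of header anonymization and program-first pair synthesis. It is a minimal, hypothetical rendering rather than the authors' implementation: the function names (anonymize_headers, synthesize_pair), the table schema, and the templated question are all assumptions introduced for illustration.

```python
"""Illustrative sketch only; not code from the TaNOS paper."""
import random


def anonymize_headers(table: dict) -> tuple[dict, dict]:
    """Replace column names with neutral placeholders (COL_0, COL_1, ...)
    so the model cannot rely on header-operation lexical shortcuts."""
    mapping = {name: f"COL_{i}" for i, name in enumerate(table["columns"])}
    anon = {"columns": list(mapping.values()), "rows": table["rows"]}
    return anon, mapping


def synthesize_pair(table: dict) -> dict:
    """Program-first generation: sample an executable program over the table,
    run it to obtain a correctness-guaranteed answer, then render a templated
    question. Only a coarse operation sketch is exposed as a structural hint."""
    cols = table["columns"]
    col = random.choice(cols)
    values = [row[cols.index(col)] for row in table["rows"]]
    program = f"divide(sum({col}), count({col}))"
    answer = sum(values) / len(values)  # executing the program guarantees correctness
    question = f"What is the average value of {col}?"
    return {"question": question, "program": program,
            "sketch": "aggregate -> divide", "answer": answer}


if __name__ == "__main__":
    table = {"columns": ["revenue", "cost"],
             "rows": [[120.0, 80.0], [150.0, 90.0]]}
    anon_table, _ = anonymize_headers(table)
    print(anon_table["columns"])        # ['COL_0', 'COL_1']
    print(synthesize_pair(anon_table))  # self-supervised (question, program) pair
```

Under these assumptions, the generated questions reference anonymized columns and coarse operation sketches rather than domain-specific headers, which is the decoupling of semantics from operation structure that the paragraph above describes.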
Key facts
- TaNOS is a continual pre-training framework for numerical reasoning over expert-domain tables.
- It addresses domain shift and reliance on header-operation shortcuts.
- Components: header anonymization, operation sketches, self-supervised pretraining.
- Achieved 80.13% execution accuracy on FinQA with 10% training data.
- Outperforms the SFT baseline (73.97%), which was trained on the full training data.
- Applied to an 8B instruction-tuned model.
- Paper ID: arXiv:2604.21495.