FinRAG-12B: Banking LLM Achieves 12% Refusal Rate with 143M Tokens
Researchers introduced FinRAG-12B, a 12-billion-parameter language model for grounded question answering in banking. The model uses a data-efficient pipeline of only 143 million tokens, combining LLM-as-a-Judge filtering, citation annotation, and curriculum learning. It outperforms GPT-4.1 on citation grounding while maintaining high answer quality. A calibrated refusal mechanism, trained on a dataset containing 22% unanswerable examples, yields a 12% 'I don't know' rate, improving on the base model's unsafely low 4.3% rate, which answered too many questions it could not ground. The work addresses banking industry demands for accuracy, regulatory compliance, and verifiable responses.
Key facts
- FinRAG-12B is a 12B parameter LLM for banking question answering.
- Training uses only 143M tokens with LLM-as-a-Judge filtering.
- Outperforms GPT-4.1 on citation grounding.
- Calibrated refusal mechanism yields 12% 'I don't know' rate.
- The base model had an unsafely low 4.3% refusal rate.
- Refusal training data included 22% unanswerable examples.
- Addresses banking industry demands for accuracy and compliance.
- Uses curriculum learning and citation annotation.
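The calibrated refusal behavior above comes from controlling the share of unanswerable examples in the training mix. The paper's exact pipeline is not shown here; the following is a minimal sketch of one way such a mix could be assembled, where the function name, data shapes, and the literal 'I don't know.' target string are all illustrative assumptions, not the authors' implementation.

```python
import random

def build_refusal_mix(answerable, unanswerable, unanswerable_frac=0.22, seed=0):
    """Assemble a training set in which roughly `unanswerable_frac` of the
    examples are unanswerable questions paired with an explicit refusal
    target. (Hypothetical helper; the paper's actual pipeline may differ.)

    answerable:   list of (question, grounded_answer) pairs
    unanswerable: list of questions with no supporting evidence
    """
    rng = random.Random(seed)
    # Number of unanswerable items needed so they form `unanswerable_frac`
    # of the final set: n_u / (n_a + n_u) = frac  =>  n_u = n_a * f / (1 - f)
    n_unans = round(len(answerable) * unanswerable_frac / (1 - unanswerable_frac))
    picked = rng.sample(unanswerable, min(n_unans, len(unanswerable)))

    data = [{"question": q, "target": a} for q, a in answerable]
    # Unanswerable questions are mapped to a fixed refusal string so the
    # model learns when to abstain rather than hallucinate an answer.
    data += [{"question": q, "target": "I don't know."} for q in picked]
    rng.shuffle(data)
    return data
```

With 78 answerable and at least 22 unanswerable candidates, the resulting 100-example set contains exactly 22% refusal targets, matching the ratio reported for FinRAG-12B's refusal training.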
Entities
Institutions
- arXiv