SEAT Method Preserves Epistemic Abstention in LLM Knowledge Adaptation
A new fine-tuning technique called SEAT tackles a persistent problem in injecting new knowledge into large language models: standard fine-tuning tends to erode the model's capacity for epistemic abstention, its ability to recognize the limits of its own knowledge. This capacity is especially important in high-stakes settings, where abstention acts as a safeguard against hallucination. SEAT combines sparse tuning, which constrains global activation drift, with entity-perturbed KL regularization, which preserves local epistemic boundaries and curbs knowledge spillover. The method requires no alignment data, explicit boundary probing, or post-hoc re-alignment, making it suitable for lightweight, privacy-conscious use. Across multiple models and datasets, SEAT improved human-evaluated abstention on unknown queries by 18% to 101% over the best baseline while maintaining knowledge acquisition. The work was announced on arXiv as a replacement under the identifier arXiv:2506.14387v3. Overall, SEAT balances robust knowledge acquisition with the essential ability to abstain when uncertain.
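The summary does not give SEAT's exact training objective, so the sketch below only illustrates the two ingredients it names: a sparse update mask that limits global activation drift, and a KL penalty that pins the tuned model to the frozen base model on entity-perturbed variants of the training queries. The PyTorch/Hugging Face-style interface, the helper names seat_style_loss and apply_sparse_gradient_mask, the kl_weight hyperparameter, and the assumption that batches carry labels for a causal-LM loss are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def seat_style_loss(model, base_model, batch, perturbed_batch, kl_weight=1.0):
    """Illustrative combined objective: a standard fine-tuning loss on the
    new-knowledge batch, plus a KL term that keeps the tuned model close to
    the frozen base model on entity-perturbed variants of the same queries
    (facts the model should still treat as unknown)."""
    # Task loss on the new knowledge (assumes `batch` includes `labels`).
    task_loss = model(**batch).loss

    # Entity-perturbed KL regularization against the frozen base model.
    with torch.no_grad():
        base_logits = base_model(**perturbed_batch).logits
    tuned_logits = model(**perturbed_batch).logits
    kl = F.kl_div(
        F.log_softmax(tuned_logits, dim=-1),
        F.softmax(base_logits, dim=-1),
        reduction="batchmean",
    )
    return task_loss + kl_weight * kl

def apply_sparse_gradient_mask(model, masks):
    """Illustrative sparse tuning step: zero out gradients outside a chosen
    sparse subset of weights so only that subset is updated, limiting global
    activation drift. `masks` maps parameter names to 0/1 tensors of the
    same shape; unmasked parameters are frozen."""
    for name, param in model.named_parameters():
        if param.grad is None:
            continue
        mask = masks.get(name)
        param.grad.mul_(mask if mask is not None else 0.0)
```

In this sketch, a training step would compute seat_style_loss, backpropagate, call apply_sparse_gradient_mask, and then step the optimizer, so that only the selected sparse weights move while the entity-perturbed KL term discourages the model from answering queries it should still abstain on.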
Key facts
- SEAT is a preventive fine-tuning method for LLMs
- It preserves epistemic abstention while maintaining knowledge acquisition
- Standard fine-tuning often erodes aligned epistemic abstention
- Epistemic abstention is critical in high-stakes settings as a safeguard against hallucination
- SEAT combines sparse tuning with entity-perturbed KL regularization
- The method requires no alignment data, explicit boundary probing, or post-hoc re-alignment
- SEAT improved human-evaluated abstention on unknown queries by 18%-101% over the best baseline
- Research was announced on arXiv under identifier arXiv:2506.14387v3
Entities
Institutions
- arXiv