Semantic Loss Fine-Tuning Prevents Model Collapse in Causal Reasoning
A recent study published on arXiv (2605.05438) reports that conventional fine-tuning of small transformer models such as Gemma 270M on causal reasoning tasks leads to severe model collapse, with models degenerating into trivial behavior such as answering "Yes" (or "No") to every question. When fine-tuned on transitivity and d-separation tasks without semantic loss, models collapsed in 100% of runs while still posting a deceptively high 73.9% accuracy, without learning any genuine causal reasoning. The authors propose a semantic loss function that integrates graph-based logical constraints with dynamic lambda scheduling; it prevents collapse and reaches 70.4% accuracy on transitivity and 68.6% on d-separation, a 42.7% improvement over collapsed models. On an adversarial evaluation of 1,000 structural reasoning samples, semantic models held steady at 67-70% accuracy, while collapsed models were erratic, swinging between 43% and 71%.
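The summary gives no implementation details beyond "graph-based logical constraints" and "dynamic lambda scheduling," so the following PyTorch sketch is only a plausible reading of the idea: the linear warmup schedule, the `lambda_schedule` and `semantic_loss` names, and the per-example `constraint_violations` term are all assumptions for illustration, not the authors' formulation.

```python
import torch
import torch.nn.functional as F

def lambda_schedule(step: int, warmup_steps: int = 1000, lam_max: float = 0.5) -> float:
    """Dynamic lambda scheduling (assumed here to be a linear warmup):
    the constraint weight ramps up from 0 so that early training is
    dominated by the ordinary task loss."""
    return lam_max * min(1.0, step / warmup_steps)

def semantic_loss(logits: torch.Tensor,
                  labels: torch.Tensor,
                  constraint_violations: torch.Tensor,
                  step: int) -> torch.Tensor:
    """Combine standard cross-entropy with a graph-based logical penalty.

    constraint_violations: a hypothetical per-example score in [0, 1]
    measuring how strongly the predicted answer contradicts transitivity
    or d-separation facts derived from the causal graph (how this score
    is computed is an assumption, not specified in the summary).
    """
    ce = F.cross_entropy(logits, labels)
    penalty = constraint_violations.mean()
    return ce + lambda_schedule(step) * penalty
```

Ramping lambda up from zero would let the model first fit the task distribution before the logical penalty starts steering it away from degenerate all-"Yes" solutions.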
Key facts
- Standard fine-tuning on causal reasoning leads to catastrophic model collapse.
- Gemma 270M fine-tuned on transitivity and d-separation tasks without semantic loss had a 100% collapse rate (a toy transitivity query is sketched after this list).
- Collapsed models achieved 73.9% accuracy but learned no causal reasoning.
- Proposed semantic loss function uses graph-based logical constraints and dynamic lambda scheduling.
- Semantic model achieved 70.4% accuracy on transitivity and 68.6% on d-separation.
- Improvement of 42.7% over collapsed baselines.
- Adversarial evaluation on 1,000 structural reasoning samples: semantic models stable at 67-70% accuracy; collapsed models erratic at 43-71%.
- Study published on arXiv with ID 2605.05438.
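For concreteness, here is what the simpler of the two task types reduces to: a transitivity query on a toy causal DAG, answered by directed reachability. The graph, the `causes` helper, and the queries are illustrative assumptions rather than the paper's benchmark; d-separation queries are analogous but additionally condition on a blocking set of variables.

```python
def causes(graph: dict[str, list[str]], x: str, y: str) -> bool:
    """Return True if y is reachable from x along directed edges,
    i.e. the transitive closure of the direct-cause relation."""
    stack, seen = [x], set()
    while stack:
        node = stack.pop()
        if node == y:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return False

# Hypothetical chain A -> B -> C: transitivity implies A causes C.
dag = {"A": ["B"], "B": ["C"]}
assert causes(dag, "A", "C")      # correct answer: "Yes"
assert not causes(dag, "C", "A")  # correct answer: "No"
# A collapsed model that answers "Yes" to every query gets the first
# right and the second wrong: accuracy looks decent on imbalanced data
# while no causal reasoning is happening.
```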
Entities
Institutions
- arXiv