Hypernetwork-Based LLM Adaptation Fails on Knowledge Conflicts
A recent study reports that hypernetwork-based adaptation techniques such as Doc-to-LoRA, which compile a document into the weights of a large language model (LLM) in a single forward pass, fail systematically when the document contradicts knowledge acquired during pretraining: accuracy falls to 46.4% on the deepest facts. The study attributes this failure to magnitude rather than representation: the logit margin contributed by the hypernetwork's adapter stays roughly constant, while the margin favoring the pretrained answer grows with how frequently the fact appeared in training, so sufficiently strong priors win the conflict. Across 194 tested conflicts, baseline accuracy dropped from 68% on weak-prior questions to just 16% on strong-prior ones, a 52 percentage-point gap. The authors propose Selective Layer Boosting and Conflict-Aware Internalization as remedies.
Key facts
- Hypernetwork-based methods like Doc-to-LoRA fail systematically on knowledge conflicts.
- Accuracy drops to 46.4% on the deepest facts when the document contradicts pretraining.
- The failure is a magnitude problem, not a representational one.
- Adapter margin is constant while pretrained margin grows with training frequency.
- Baseline accuracy falls from 68% on weak-prior questions to 16% on strong-prior ones.
- 52 percentage-point gap between weak and strong prior questions.
- Selective Layer Boosting and Conflict-Aware Internalization are proposed as remedies.
- Study published on arXiv with ID 2604.23750.
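The margin mechanism in the facts above can be illustrated with a minimal toy model. Everything here is a hypothetical sketch, not the study's method: the constant `ADAPTER_MARGIN`, the log-frequency scaling, and all numeric values are illustrative assumptions.

```python
import math

# Hypothetical fixed logit boost the hypernetwork adapter gives the
# document's answer; per the study's claim, it does not scale with the prior.
ADAPTER_MARGIN = 6.0

def pretrained_margin(freq: int, slope: float = 1.5) -> float:
    # Toy assumption: the logit margin favoring the pretrained answer grows
    # with the log of how often the fact appeared in pretraining.
    return slope * math.log1p(freq)

def document_fact_wins(freq: int) -> bool:
    # The injected fact is recalled only while the adapter's fixed margin
    # exceeds the frequency-dependent pretrained margin.
    return ADAPTER_MARGIN > pretrained_margin(freq)

if __name__ == "__main__":
    for freq in (10, 1_000, 100_000):
        print(freq, document_fact_wins(freq))
```

Under these assumed numbers, weak priors (low frequency) are overridden by the adapter, while strong priors eventually dominate, mirroring the reported 68%-to-16% drop from weak- to strong-prior questions.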
Entities
Institutions
- arXiv