Nightly Weight Consolidation Outperforms Cascading Compaction in LLM Memory Retention
A new study on arXiv (2605.24657) compares two ways of retaining user-specific knowledge in large language models (LLMs): inference-only context management and weight-based consolidation. The inference-only baseline, cascading compaction, retains only 36.8% of knowledge after three cycles. Nightly consolidation, which combines reflection, synthesis, and Low-Rank Adaptation (LoRA) fine-tuning on a single consumer GPU, retains 80.4%, a gain of 43.6 percentage points. The study evaluated ten realistic software development conversations with 1,146 test questions across three memory types, and the largest gains came on procedural corrections (36.3%). The results suggest that weight-based updates can help work around context-window limits and improve long-term user adaptation.
Key facts
- arXiv paper 2605.24657 compares inference-only memory management with weight-based consolidation for LLMs.
- Cascading compaction retains 36.8 ± 3.0% of knowledge after three cycles.
- Nightly consolidation retains 80.4 ± 1.3% of knowledge.
- The gain is 43.6 percentage points (paired t(9)=14.8, p<0.001).
- The experiment used ten software development conversations (n=10) with 1,146 test questions.
- Consolidation uses LoRA fine-tuning on a single consumer GPU.
- Largest gains were on procedural corrections (36.3%).
- The no-context floor was 11.8%; the full-context ceiling was 90.1%.
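
The figures above can be cross-checked with a little arithmetic. The following is a minimal sketch, not a computation taken from the paper: it recomputes the percentage-point gain, expresses each method's retention as a fraction of the floor-to-ceiling range, and back-solves the standard deviation of the per-conversation differences implied by the reported paired t statistic. The helper names (`gain_pp`, `normalized`, `implied_sd_of_differences`) and the floor-to-ceiling normalization are assumptions introduced here for illustration.

```python
import math

# Figures as reported in the key facts above (percent retained).
FLOOR = 11.8          # no-context floor
CEILING = 90.1        # full-context ceiling
COMPACTION = 36.8     # cascading compaction after three cycles
CONSOLIDATION = 80.4  # nightly consolidation
N = 10                # paired conversations
T_STAT = 14.8         # reported paired t(9)


def gain_pp(baseline: float, treatment: float) -> float:
    """Absolute gain in percentage points."""
    return treatment - baseline


def normalized(score: float, floor: float = FLOOR, ceiling: float = CEILING) -> float:
    """Share of the floor-to-ceiling range recovered.

    Hypothetical helper: this normalization is an illustration,
    not a metric stated in the source.
    """
    return (score - floor) / (ceiling - floor)


def implied_sd_of_differences(mean_diff: float, t_stat: float, n: int) -> float:
    """Back-solve the SD of paired differences.

    Paired t = mean_diff / (sd / sqrt(n)), so sd = mean_diff * sqrt(n) / t.
    """
    return mean_diff * math.sqrt(n) / t_stat


gain = gain_pp(COMPACTION, CONSOLIDATION)
print(f"gain: {gain:.1f} percentage points")
print(f"compaction recovers {normalized(COMPACTION):.2f} of the range")
print(f"consolidation recovers {normalized(CONSOLIDATION):.2f} of the range")
print(f"implied SD of differences: {implied_sd_of_differences(gain, T_STAT, N):.1f}")
```

Under these assumptions, cascading compaction recovers roughly 0.32 of the gap between the no-context floor and the full-context ceiling, while nightly consolidation recovers roughly 0.88. The implied standard deviation of the paired differences is about 9.3 percentage points, which only holds if the reported t statistic was computed on the ten per-conversation differences.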
Entities
Institutions
- arXiv