Nightly Weight Consolidation Outperforms Cascading Compaction in LLM Memory Retention

ai-technology · 2026-05-26

A new study on arXiv (2605.24657) compares two ways of keeping user-specific knowledge in large language models (LLMs): an inference-only method and a weight-based one. The inference-only baseline, cascading compaction, retains only 36.8% of knowledge after three cycles. The weight-based alternative, nightly consolidation, combines reflection, synthesis, and Low-Rank Adaptation (LoRA) fine-tuning on a single consumer GPU and retains 80.4%, a gain of 43.6 percentage points. The study evaluated ten realistic software development conversations with 1,146 test questions across three memory types; the largest gains came on procedural corrections (36.3%). The results suggest that weight-based updates can work around context-window limits and improve long-term user adaptation.

Key facts

arXiv paper 2605.24657 compares inference-only vs. weight-based consolidation for LLMs.
Cascading compaction retains 36.8 ± 3.0% of knowledge after three cycles.
Nightly consolidation retains 80.4 ± 1.3% of knowledge.
The gain is 43.6 percentage points (paired t(9)=14.8, p<0.001).
Experiment used ten software development conversations (n=10, 1,146 test questions).
Consolidation uses LoRA fine-tuning on a single consumer GPU.
Largest gains were on procedural corrections (36.3%).
No-context floor was 11.8%, full-context ceiling was 90.1%.
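
The figures above can be cross-checked with a few lines of arithmetic. The sketch below recomputes the reported gain, places both methods on the range between the no-context floor and the full-context ceiling, and backs out the spread of per-conversation gains implied by the paired t-statistic. The floor-to-ceiling normalization and the helper name `normalized` are illustrative assumptions, not metrics taken from the paper.

```python
import math

# Figures quoted in the key facts above (percent of knowledge retained).
compaction = 36.8      # cascading compaction after three cycles
consolidation = 80.4   # nightly consolidation
floor = 11.8           # no-context baseline
ceiling = 90.1         # full-context baseline
n = 10                 # number of conversations
t_stat = 14.8          # reported paired t(9)

# Gain in percentage points between the two methods.
gain_pp = consolidation - compaction


def normalized(score, lo=floor, hi=ceiling):
    """Share of the floor-to-ceiling range recovered.

    Hypothetical helper for illustration; the paper may not report this metric.
    """
    return (score - lo) / (hi - lo)


# For a paired t-test, t = mean_diff / (sd_diff / sqrt(n)), so the standard
# deviation of the per-conversation differences implied by the reported
# numbers is mean_diff * sqrt(n) / t.
implied_sd = gain_pp * math.sqrt(n) / t_stat

print(f"gain: {gain_pp:.1f} percentage points")
print(f"compaction, share of range recovered: {normalized(compaction):.3f}")
print(f"consolidation, share of range recovered: {normalized(consolidation):.3f}")
print(f"implied SD of per-conversation gains: {implied_sd:.1f} points")
```

Under these assumptions, cascading compaction recovers roughly a third of the gap between having no context and having the full context, while nightly consolidation recovers close to nine tenths of it.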
