WiCER: Iterative Knowledge Compilation for LLM Wiki Systems
A recent study on arXiv (2605.07068) presents WiCER (Wiki-memory Compile, Evaluate, Refine), an iterative algorithm that aims to close the compilation gap in LLM Wiki systems. The LLM Wiki pattern compiles domain-specific knowledge into a persistent artifact that is served through KV cache inference, giving sub-second latency with no retrieval failures. Naive compilation, however, often drops essential facts.
The study measures this problem across 17 RepLiQA domains (6,800 questions). On curated knowledge, full context KV cache inference outperforms RAG (4.38 vs. 4.08 out of 5, with a 7.3x faster time to first token, TTFT), but it degrades at scale because of attention dilution. Blind compilation performs poorly by comparison (2.14 to 2.32 vs. 3.46, with a 53-60% catastrophic failure rate). WiCER, which draws on counterexample-guided abstraction refinement (CEGAR), iteratively evaluates and refines compiled wikis to narrow this gap. The authors' institutional affiliation is not disclosed; the paper is available on arXiv.
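The compile, evaluate, refine cycle can be illustrated with a toy loop. This is only a minimal sketch under stated assumptions, not the paper's implementation: the wiki is modeled as a plain list of fact strings, "blind" compilation is simulated by truncation, and the names `Probe`, `blind_compile`, `evaluate`, `refine`, `fixes_per_round`, and `compile_evaluate_refine` are all hypothetical. The CEGAR-style idea it tries to show is that failed probes act as counterexamples that drive the next refinement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    """Hypothetical evaluation item: a question and the fact needed to answer it."""
    question: str
    required_fact: str


def blind_compile(source_facts: list[str], budget: int) -> list[str]:
    # Toy stand-in for blind compilation: keep the first `budget` facts, drop the rest.
    return source_facts[:budget]


def evaluate(wiki: list[str], probes: list[Probe]) -> list[Probe]:
    # Return the probes whose required fact is missing from the wiki (the counterexamples).
    present = set(wiki)
    return [p for p in probes if p.required_fact not in present]


def refine(wiki: list[str], failures: list[Probe], fixes_per_round: int) -> list[str]:
    # Restore at most `fixes_per_round` of the facts that failed probes depend on.
    refined = list(wiki)
    for probe in failures[:fixes_per_round]:
        if probe.required_fact not in refined:
            refined.append(probe.required_fact)
    return refined


def compile_evaluate_refine(source_facts, probes, budget, fixes_per_round=1, max_rounds=10):
    # Compile once, then alternate evaluation and refinement until no probe fails
    # or the round limit is reached. Returns the wiki and the number of refinements.
    wiki = blind_compile(source_facts, budget)
    refinements = 0
    for _ in range(max_rounds):
        failures = evaluate(wiki, probes)
        if not failures:
            break
        wiki = refine(wiki, failures, fixes_per_round)
        refinements += 1
    return wiki, refinements


facts = ["fact-a", "fact-b", "fact-c", "fact-d"]
probes = [Probe("q1", "fact-a"), Probe("q2", "fact-c"), Probe("q3", "fact-d")]
wiki, refinements = compile_evaluate_refine(facts, probes, budget=2)
print(wiki, refinements)
```

In this toy setting the truncated wiki initially fails two probes, and each round restores one missing fact. A real system would presumably replace the set-membership check with model-graded answers and the append step with a rewrite of the compiled wiki; those details are not described in this summary.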
Key facts
- WiCER addresses the compilation gap in LLM Wiki systems.
- The LLM Wiki pattern uses KV cache inference for sub-second latency.
- The study uses 17 RepLiQA domains with 6,800 questions.
- Full context KV cache inference outperforms RAG on curated knowledge (4.38 vs. 4.08 out of 5).
- Blind compilation has a 53-60% catastrophic failure rate.
- WiCER is inspired by CEGAR (counterexample-guided abstraction refinement).
- The paper is available on arXiv with ID 2605.07068.
- Time to first token (TTFT) is 7.3 times faster for full context KV cache inference than for RAG.
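To put the reported scores in proportion, the short calculation below derives a few ratios from the figures listed above. It is a back-of-the-envelope sketch: the helper `relative_gain` is a hypothetical name, and the per-domain question count assumes the 6,800 questions are split evenly across the 17 domains, which the summary does not state.

```python
def relative_gain(score: float, baseline: float) -> float:
    # Fractional improvement of `score` over `baseline`.
    return (score - baseline) / baseline


# Full context KV cache (4.38) vs. RAG (4.08) on curated knowledge.
kv_vs_rag = relative_gain(4.38, 4.08)

# Blind compilation (2.14 to 2.32) as a fraction of the 3.46 comparison score.
blind_low = 2.14 / 3.46
blind_high = 2.32 / 3.46

# Assumption: questions are distributed evenly across domains.
questions_per_domain = 6800 / 17

print(f"KV cache gain over RAG: {kv_vs_rag:.1%}")
print(f"Blind compilation keeps {blind_low:.0%} to {blind_high:.0%} of the comparison score")
print(f"Questions per domain (if evenly split): {questions_per_domain:.0f}")
```

Read this way, the quality gap between full context KV cache inference and RAG is modest next to the 7.3x TTFT difference, while blind compilation loses roughly a third of the comparison score.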
Entities
Institutions
- arXiv