SELFCI: A Self-Distillation Framework for Privacy in LLMs
A new framework called SELFCI (Self-Distillation for Contextual Integrity) aims to improve privacy in large language models by decoupling information suppression from task resolution. Proposed in an arXiv paper (2605.20258), SELFCI uses complementary self-distillation to optimize two independent reverse KL divergences: one preserves task-relevant information for utility, and the other enforces minimal, appropriate disclosure. Together they define a Product-of-Experts target, which the authors present as balancing privacy and performance without degrading task accuracy. The approach targets Contextual Integrity (CI), the principle that information should flow only in ways that respect the norms of the context in which it was shared. CI has become a pressing concern as LLMs are deployed as personal agents that handle sensitive workflows.
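Why two reverse KL terms produce a Product-of-Experts target follows from a standard identity. The sketch below assumes a student distribution $q$, a utility teacher $p_u$, a disclosure (CI) teacher $p_c$, and non-negative weights $\lambda_u, \lambda_c$; these symbols and the weighting are illustrative assumptions, not notation taken from the paper.

```latex
\begin{aligned}
\mathcal{L}(q)
&= \lambda_u\,\mathrm{KL}(q \,\|\, p_u) + \lambda_c\,\mathrm{KL}(q \,\|\, p_c) \\
&= \Lambda \sum_y q(y)\log q(y) - \sum_y q(y)\bigl[\lambda_u \log p_u(y) + \lambda_c \log p_c(y)\bigr] \\
&= \Lambda \sum_y q(y)\,\log\frac{q(y)}{Z\,p^\star(y)}
 = \Lambda\bigl[\mathrm{KL}(q \,\|\, p^\star) - \log Z\bigr],
\end{aligned}
\qquad \Lambda = \lambda_u + \lambda_c,

p^\star(y) = \frac{p_u(y)^{\lambda_u/\Lambda}\, p_c(y)^{\lambda_c/\Lambda}}{Z},
\qquad
Z = \sum_y p_u(y)^{\lambda_u/\Lambda}\, p_c(y)^{\lambda_c/\Lambda}.
```

Because $Z$ does not depend on $q$ and $\mathrm{KL}(q\,\|\,p^\star) \ge 0$, the objective is minimized exactly at $q = p^\star$. The student is therefore pulled toward a normalized product of the two teachers, so any output that either teacher assigns near-zero probability is also suppressed in the target.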
Key facts
- SELFCI stands for Self-Distillation for Contextual Integrity
- It decouples information suppression from task resolution
- Uses two independent reverse KL divergences
- One divergence preserves task-relevant information
- The other enforces minimal and appropriate disclosure
- Creates a Product-of-Experts (PoE) target
- Aims to overcome the privacy-utility trade-off
- Paper published on arXiv with ID 2605.20258
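The key facts above describe two reverse KL terms whose combination yields a Product-of-Experts target. The Python sketch below checks that idea numerically on a toy four-token vocabulary. The teacher distributions `p_utility` and `p_privacy`, the weights, and all function names are hypothetical illustrations; the paper's actual teachers, weighting, and training loop are not described here.

```python
import math

# Toy illustration only: distributions, weights, and function names are
# hypothetical, not taken from the SELFCI paper.

def normalize(weights):
    """Rescale non-negative weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

def reverse_kl(q, p):
    """Reverse KL from the student's view: KL(q || p) = sum_i q_i * log(q_i / p_i).

    Terms with q_i == 0 contribute nothing (0 * log 0 is taken as 0).
    """
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def product_of_experts(p_utility, p_privacy, w_utility=1.0, w_privacy=1.0):
    """Renormalized weighted geometric mean of two teacher distributions.

    This is the closed-form minimizer over q of
    w_utility * KL(q || p_utility) + w_privacy * KL(q || p_privacy).
    """
    total = w_utility + w_privacy
    a, b = w_utility / total, w_privacy / total
    unnormalized = [pu ** a * pp ** b for pu, pp in zip(p_utility, p_privacy)]
    return normalize(unnormalized)

def objective(q, p_utility, p_privacy, w_utility=1.0, w_privacy=1.0):
    """Weighted sum of the two reverse KL terms for a candidate student q."""
    return (w_utility * reverse_kl(q, p_utility)
            + w_privacy * reverse_kl(q, p_privacy))

# Hypothetical next-token distributions over a 4-token vocabulary:
# index 0 = task answer, 1 = generic phrasing, 2 = private name, 3 = other.
p_utility = [0.5, 0.1, 0.3, 0.1]   # utility teacher: favors answering, tolerates leaking
p_privacy = [0.4, 0.4, 0.1, 0.1]   # privacy teacher: suppresses the private token

q_star = product_of_experts(p_utility, p_privacy)
print([round(x, 3) for x in q_star])  # [0.486, 0.217, 0.188, 0.109]
print(objective(q_star, p_utility, p_privacy))
```

Raising `w_privacy` tilts `q_star` toward the privacy teacher. Because the target is a geometric mean, any token that either teacher assigns zero probability gets zero mass in `q_star`, which is the "veto" behavior characteristic of a product of experts.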
Entities
Institutions
- arXiv