Canopy Entropy: A New Measure for Fine-Tuning's Effect on LLM Information Conveyance
A new study proposes Canopy Entropy (CE*), a metric that measures how fine-tuning affects information conveyance in large language models. Previous analyses overlook output length. CE* instead views generation as a tree of possible rollouts and quantifies the effective size of the generation space. It jointly captures uncertainty in the output length N and in the sequence Y_{1:N}, conditioned on the input X, and equals the total Shannon entropy H(N, Y_{1:N} | X). The metric also yields interpretable components, such as the length-entropy correlation ρ(N, r_N). The research, published on arXiv (2605.30844), addresses a gap in understanding how fine-tuning distributes uncertainty across entire generations.
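The identity CE* = H(N, Y_{1:N} | X) can be checked on a small example. The sketch below is illustrative only and is not the paper's code. The toy tree `TREE`, the `<eos>` marker and the helper names (`rollouts`, `entropy`, `tree_entropy`, `chain_rule_entropy`) are all hypothetical. The sketch computes the entropy over complete rollouts in three equivalent ways:

- directly over the leaves;
- as a sum of per-node next-token entropies, each weighted by the probability of reaching that node (the tree view);
- via the chain rule H(N | X) + H(Y_{1:N} | N, X).

Reading the "effective size" of the generation space as 2^H, a perplexity-style count of equally likely rollouts, is our assumption.

```python
import math
from collections import defaultdict

EOS = "<eos>"  # hypothetical end-of-sequence marker

# Hypothetical toy generation tree for one input X. Each node maps a next token
# to (probability, child node); choosing EOS ends the rollout (child is None).
TREE = {
    "a": (0.5, {EOS: (1.0, None)}),
    "b": (0.3, {"c": (0.6, {EOS: (1.0, None)}), EOS: (0.4, None)}),
    EOS: (0.2, None),
}


def rollouts(node, prefix=(), prob=1.0):
    """Yield every complete rollout Y_{1:N} (EOS excluded) with its probability."""
    for token, (p, child) in node.items():
        if token == EOS:
            yield prefix, prob * p
        else:
            yield from rollouts(child, prefix + (token,), prob * p)


def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def tree_entropy(node, reach=1.0):
    """Tree view: sum over nodes of P(reach node) * entropy of its next-token choice."""
    total = reach * entropy(p for p, _ in node.values())
    for p, child in node.values():
        if child is not None:
            total += tree_entropy(child, reach * p)
    return total


def chain_rule_entropy(leaves):
    """Return (H(N | X), H(Y_{1:N} | N, X)) computed from the leaf distribution."""
    p_len = defaultdict(float)
    for seq, p in leaves:
        p_len[len(seq)] += p
    h_len = entropy(p_len.values())
    # -sum p(N, Y) * log2 p(Y | N); N is len(Y), so p(N, Y) is just p(Y).
    h_seq = -sum(p * math.log2(p / p_len[len(seq)]) for seq, p in leaves)
    return h_len, h_seq


leaves = list(rollouts(TREE))
h_joint = entropy(p for _, p in leaves)
h_len, h_seq = chain_rule_entropy(leaves)
print(f"H(N, Y | X) over leaves: {h_joint:.4f} bits")
print(f"tree decomposition:      {tree_entropy(TREE):.4f} bits")
print(f"H(N | X) + H(Y | N, X):  {h_len:.4f} + {h_seq:.4f}")
print(f"effective size 2^H:      {2 ** h_joint:.2f} rollouts")
```

Ending the sequence is one of the branches at each node. As a result, the tree sum counts length uncertainty automatically, while the chain-rule split separates that length term out explicitly.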
Key facts
- Fine-tuning is believed to reduce uncertainty and diversity in LLMs.
- Existing analyses overlook output length as a confounder.
- Canopy Entropy (CE*) views generation from a tree perspective.
- CE* quantifies the effective size of the generation space.
- CE* equals the total Shannon entropy H(N, Y_{1:N} | X).
- CE* yields interpretable metrics such as the length-entropy correlation.
- The paper is on arXiv with ID 2605.30844.
- The research focuses on information conveyance in language models.
Entities
Institutions
- arXiv