Canopy Entropy: A New Measure for Fine-Tuning's Effect on LLM Information Conveyance

publication · 2026-06-01

A new study proposes Canopy Entropy (CE*), a metric for how fine-tuning changes information conveyance in large language models. Whereas earlier analyses overlook output length, CE* treats generation as a tree of possible rollouts and quantifies the effective size of the generation space. It jointly captures uncertainty in the output length N and the sequence Y_{1:N}, and equals the total Shannon entropy H(N, Y_{1:N}|X). The metric also yields interpretable components, such as the length-entropy correlation ρ(N, r_N). The paper, posted on arXiv (2605.30844), addresses a gap in understanding how fine-tuning distributes uncertainty across entire generations.
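
To make the joint quantity concrete, the standard chain rule of Shannon entropy splits it into a length term and a content term. This is a textbook identity written here for a single fixed prompt x (an assumption made for readability). It is not necessarily the decomposition the paper itself uses.

```latex
\begin{align}
H(N, Y_{1:N} \mid X = x)
  &= H(N \mid X = x) + H(Y_{1:N} \mid N, X = x) \\
  &= H(N \mid X = x) + \sum_{n} P(N = n \mid x)\, H(Y_{1:n} \mid N = n, X = x)
\end{align}
```

Read this way, two models with the same per-length content entropy can still differ in total entropy if one of them spreads its probability over more output lengths.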

Key facts

Fine-tuning is believed to reduce uncertainty and diversity in LLMs.
Existing analyses overlook output length as a confounder.
Canopy Entropy (CE*) views generation from a tree perspective.
CE* quantifies the effective size of the generation space.
CE* equals total Shannon entropy H(N, Y_{1:N}|X).
CE* yields interpretable metrics like length-entropy correlation.
The paper is on arXiv with ID 2605.30844.
The research focuses on information conveyance in language models.
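
The tree view can be illustrated with a toy example. The sketch below is not the paper's implementation: the three-step token distribution `STEP_PROBS`, the helper names, and the reading of "effective size" as exp(H) are all assumptions made for illustration, and a single fixed prompt stands in for X. It enumerates every complete rollout of a tiny generation tree, computes the joint entropy of length and content, and splits it into a length part and a content part.

```python
import math
from collections import defaultdict

# Hypothetical toy model: the next-token distribution depends only on position.
# "<eos>" ends a rollout; the last step forces "<eos>" so the tree is finite.
STEP_PROBS = [
    {"a": 0.5, "b": 0.25, "<eos>": 0.25},
    {"a": 0.5, "<eos>": 0.5},
    {"<eos>": 1.0},
]


def enumerate_rollouts(step_probs):
    """Return {token_tuple: probability} for every complete rollout (tree leaf)."""
    leaves = {}
    stack = [((), 1.0)]
    while stack:
        prefix, p = stack.pop()
        for tok, q in step_probs[len(prefix)].items():
            if tok == "<eos>":
                leaves[prefix] = p * q  # rollout ends here with length len(prefix)
            else:
                stack.append((prefix + (tok,), p * q))
    return leaves


def entropy(probs):
    """Shannon entropy in nats of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


leaves = enumerate_rollouts(STEP_PROBS)

# Joint entropy over (N, Y_{1:N}): each leaf fixes both the length and the tokens.
joint_entropy = entropy(leaves.values())

# Marginal distribution of the output length N.
length_probs = defaultdict(float)
for seq, p in leaves.items():
    length_probs[len(seq)] += p
length_entropy = entropy(length_probs.values())

# Expected entropy of the content given the length.
content_entropy = sum(
    p_n * entropy([p / p_n for seq, p in leaves.items() if len(seq) == n])
    for n, p_n in length_probs.items()
)

# Assumed interpretation: exp(H) as the effective number of distinct rollouts.
effective_size = math.exp(joint_entropy)

print(f"H(N, Y) = {joint_entropy:.4f} nats")
print(f"H(N) + H(Y | N) = {length_entropy + content_entropy:.4f} nats")
print(f"effective size = {effective_size:.3f} rollouts")
```

Exhaustive enumeration only works for toy trees like this one. For a real model the same joint entropy would have to be estimated, for example from the log-probabilities of sampled rollouts.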
