Minimax Rates and Spectral Distillation for Tree Ensembles
A recent theoretical study of tree ensembles, including random forests and gradient boosting machines, shows that their predictive performance is governed by the eigenvalue decay of an induced kernel operator. The analysis establishes minimax-optimal convergence rates for random forest regression under mild regularity conditions on tree growth, and it introduces compression schemes that distill tree ensembles into much smaller models without sacrificing accuracy. For random forests, the dominant predictive directions are captured by the leading eigenfunctions of the kernel operator; for gradient boosting machines, the leading singular vectors of the smoother matrix play the analogous role. The study offers a spectral viewpoint that deepens the understanding of these widely used algorithms.
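The kernel view of a random forest can be made concrete with a small sketch. The code below uses the standard leaf co-occurrence proximity kernel (entry `K[i, j]` is the fraction of trees in which samples `i` and `j` fall into the same leaf) as a stand-in for the paper's kernel operator; the paper's exact construction and distillation scheme may differ. It eigen-decomposes the kernel and projects the forest's fitted values onto the leading eigenvectors, illustrating how a few spectral directions can summarize the ensemble's fit.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression problem (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 3))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=200)

forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Leaf co-occurrence proximity kernel:
# K[i, j] = fraction of trees where samples i and j share a leaf.
leaves = forest.apply(X)  # shape (n_samples, n_trees): leaf index per tree
K = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)

# Eigen-decompose the symmetric kernel; the rate analysis in the paper
# is driven by how fast these eigenvalues decay.
evals, evecs = np.linalg.eigh(K)
order = np.argsort(evals)[::-1]
evals, evecs = evals[order], evecs[:, order]

# "Distill" by projecting the forest's fitted values onto the top-k
# eigenvectors -- the empirical analogue of keeping leading eigenfunctions.
k = 10
f_hat = forest.predict(X)
coeffs = evecs[:, :k].T @ f_hat
f_distilled = evecs[:, :k] @ coeffs

rel_err = np.linalg.norm(f_hat - f_distilled) / np.linalg.norm(f_hat)
```

The relative error `rel_err` measures how much of the forest's fit survives in the `k`-dimensional spectral summary; fast eigenvalue decay means a small `k` already suffices.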
Key facts
- Tree ensembles include random forests and gradient boosting machines.
- Minimax-optimal convergence rates are derived for random forest regression.
- Eigenvalue decay of the induced kernel operator governs statistical rates.
- Compression schemes exploit spectral representations to reduce model size.
- Leading eigenfunctions capture dominant predictive directions for random forests.
- For gradient boosting machines, the leading singular vectors of the smoother matrix play the analogous role.
- Distilled models are orders of magnitude smaller than original ensembles.
- The study is published on arXiv with ID 2605.11841.
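For gradient boosting, the smoother-matrix view can also be sketched directly. Under squared loss with learning rate nu, each boosting stage fits a tree to the current residuals, and given the tree structures the fitted values are a linear function of y: fitted = B @ y for an accumulated smoother matrix B. The sketch below builds B explicitly with a hand-rolled boosting loop over `DecisionTreeRegressor` stumps and then truncates its SVD, mirroring the idea of keeping leading singular vectors; the paper's actual distillation procedure may differ.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
n = 150
X = rng.uniform(-1, 1, size=(n, 2))
y = np.cos(2 * X[:, 0]) * X[:, 1] + 0.1 * rng.normal(size=n)

def leaf_averaging_matrix(tree, X):
    """Row-stochastic matrix S with S @ t giving the tree's leaf-mean fit
    of targets t on the training inputs X (a linear smoother)."""
    leaf = tree.apply(X)
    S = (leaf[:, None] == leaf[None, :]).astype(float)
    return S / S.sum(axis=1, keepdims=True)

nu, M = 0.1, 100
B = np.zeros((n, n))    # accumulated smoother: fitted values = B @ y
resid_op = np.eye(n)    # linear map sending y to the current residuals
residual = y.copy()
for _ in range(M):
    t = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, residual)
    S = leaf_averaging_matrix(t, X)
    B = B + nu * S @ resid_op            # stage adds nu * S applied to residuals
    resid_op = (np.eye(n) - nu * S) @ resid_op
    residual = resid_op @ y

# Truncated SVD of the smoother: keep only the top-r singular directions.
U, s, Vt = np.linalg.svd(B)
r = 15
B_r = U[:, :r] * s[:r] @ Vt[:r, :]

fit_full = B @ y
fit_distilled = B_r @ y
rel_err = np.linalg.norm(fit_full - fit_distilled) / np.linalg.norm(fit_full)
```

By construction B @ y equals y minus the final residuals, and the decay of the singular values s determines how small r can be while keeping `rel_err` low, which is the compression trade-off the key facts above describe.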