Anytime-Valid Inference for Online Decision Trees
A recent paper on arXiv (2605.31239) presents a technique for improving split selection in online decision trees, specifically the Hoeffding Trees used as base learners in Adaptive Random Forests. Existing analyses rely on fixed-sample concentration bounds. The data-driven stopping rules these trees use to decide when to split invalidate those bounds and can potentially drive the probability of a false split toward one. The new approach uses anytime-valid inference to control false splits under arbitrary data streams, including non-stationary ones. It also guarantees a finite commitment time whenever a split offers a predictive advantage.
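The toy simulation below illustrates why re-checking a fixed-sample bound after every example breaks its guarantee. It is not the paper's method. Two candidate splits are assumed to be equally good, so each step yields a hypothetical ±1 "merit difference" with mean 0, and any declared split is a false split. The sketch compares three checks. The first applies the fixed-sample Hoeffding radius once, at a horizon chosen in advance. The second applies that same radius at every step. The third uses a simple anytime-valid boundary, swapped in purely for illustration: a union bound that spends δ/(n(n+1)) of the error budget at step n, which is valid because those terms sum to δ. All function names and parameters are hypothetical.

```python
import math
import random


def fixed_eps(n: int, delta: float, value_range: float = 2.0) -> float:
    """Fixed-sample Hoeffding radius; valid only if n is fixed in advance."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def anytime_eps(n: int, delta: float, value_range: float = 2.0) -> float:
    """Union-bound radius with delta_n = delta / (n * (n + 1)).

    Since sum_{n>=1} delta / (n * (n + 1)) = delta, the probability that the
    boundary is ever crossed, at any n, is at most delta (illustrative choice).
    """
    return fixed_eps(n, delta / (n * (n + 1)), value_range)


def false_split_rates(trials: int = 1000, horizon: int = 500,
                      delta: float = 0.05, seed: int = 0) -> dict:
    """Estimate false-split rates when neither candidate split is better.

    Each step draws a hypothetical +/-1 merit difference with mean 0.
    """
    rng = random.Random(seed)
    fixed = [fixed_eps(n, delta) for n in range(1, horizon + 1)]
    anytime = [anytime_eps(n, delta) for n in range(1, horizon + 1)]
    counts = {"fixed_once": 0, "fixed_peeking": 0, "anytime_peeking": 0}
    for _ in range(trials):
        total = 0
        peek_hit = False
        anytime_hit = False
        for n in range(1, horizon + 1):
            total += 1 if rng.random() < 0.5 else -1
            mean = total / n
            # Peeking: test the running mean against the radius at every step.
            peek_hit = peek_hit or mean >= fixed[n - 1]
            anytime_hit = anytime_hit or mean >= anytime[n - 1]
        counts["fixed_peeking"] += int(peek_hit)
        counts["anytime_peeking"] += int(anytime_hit)
        # Fixed-sample use: a single test at the pre-chosen horizon.
        counts["fixed_once"] += int(total / horizon >= fixed[-1])
    return {name: count / trials for name, count in counts.items()}


rates = false_split_rates()
print(rates)
```

With peeking, the fixed-sample radius is crossed far more often than when it is checked once. The union-bound radius stays within its δ budget at the cost of a wider threshold. Constructions used in the literature are typically much tighter than this crude union bound.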
Key facts
- Bagging-based ensembles like Adaptive Random Forests use Hoeffding Trees as base learners.
- Hoeffding Trees grow incrementally by testing candidate splits using concentration inequalities.
- Existing variants lack valid statistical guarantees due to data-dependent stopping rules.
- Current analyses rely on fixed-sample concentration bounds, which are invalidated by adaptive stopping.
- The new method provides anytime-valid control of false splits under arbitrary data streams.
- The method works in non-stationary settings.
- It ensures finite commitment time under a predictive advantage.
- The paper is available on arXiv with ID 2605.31239.
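To illustrate the "finite commitment time" idea, the sketch below uses the classic Hoeffding Tree split rule. That rule splits once the gap between the best and second-best split criterion exceeds ε = sqrt(R² ln(1/δ) / (2n)). The sketch compares the rule against the same hypothetical union-bound anytime radius, δ/(n(n+1)) per step, used here for illustration only. It makes a strong simplifying assumption: the observed gap is held constant at its true value, so the code shows only how many examples each threshold needs before committing. It does not show the paper's actual construction or guarantee. All names are hypothetical.

```python
import math


def hoeffding_eps(n: int, delta: float, value_range: float) -> float:
    """Classic fixed-sample radius used by the Hoeffding Tree split test."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def anytime_eps(n: int, delta: float, value_range: float) -> float:
    """Illustrative anytime radius: spends delta / (n * (n + 1)) at step n."""
    return hoeffding_eps(n, delta / (n * (n + 1)), value_range)


def should_split(gain_best: float, gain_second: float, n: int,
                 delta: float, value_range: float, anytime: bool = True) -> bool:
    """Split once the observed gain gap exceeds the chosen confidence radius."""
    radius_fn = anytime_eps if anytime else hoeffding_eps
    return gain_best - gain_second > radius_fn(n, delta, value_range)


def commitment_time(gap: float, delta: float, value_range: float,
                    anytime: bool = True, max_n: int = 10**6) -> int:
    """Smallest n at which a constant observed gap triggers a split, else -1."""
    for n in range(1, max_n + 1):
        if should_split(gap, 0.0, n, delta, value_range, anytime):
            return n
    return -1


# Two-class information gain lies in [0, 1] bits, so R = 1 here.
t_fixed = commitment_time(0.1, 1e-7, 1.0, anytime=False)
t_anytime = commitment_time(0.1, 1e-7, 1.0, anytime=True)
print(t_fixed, t_anytime)
```

The anytime radius still shrinks toward zero as n grows, because ln(n²)/n → 0. Any positive gap is therefore eventually committed, only later than under the fixed-sample rule, which pays for validity under continuous monitoring with extra examples.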
Entities
Institutions
- arXiv