Best Arm Identification in Generalized Linear Bandits via Hybrid Feedback
A recent preprint posted to arXiv (2605.05745) studies fixed-confidence best arm identification in generalized linear bandits under a hybrid feedback model. In each round, the learner chooses between absolute feedback, a reward observed from a single arm, and relative (dueling) feedback, a comparison outcome from a pair of arms; both types of observation are generated by generalized linear models. The authors construct a likelihood-ratio-based confidence sequence that pools these heterogeneous observations and, under a self-concordance assumption, yields an explicit ellipsoidal confidence set. They also propose a hybrid Track-and-Stop algorithm that allocates queries adaptively by tracking a minimax-optimal design over the joint action space of single arms and arm pairs. The algorithm is shown to be δ-correct, and the authors give high-probability upper bounds on its stopping time. Finally, the framework is extended to a cost-aware setting in which the two feedback types carry different acquisition costs, and experiments support the approach.
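To make the hybrid observation model concrete, here is a minimal sketch under stated assumptions, not the paper's construction. It assumes a logistic link for both feedback types and models a duel between arms i and j through the difference vector x_i - x_j (a common Bradley-Terry-style choice; the paper's exact dueling model may differ). Under that assumption both kinds of query reduce to a Bernoulli observation with a feature vector, so one pooled log-likelihood can be maximized by Newton's method, and an ellipsoid of the form (θ - θ̂)ᵀ H(θ̂) (θ - θ̂) ≤ β² can be built from the Fisher information H at the estimate. The helper names (`feature`, `mle`, `hessian`, `in_ellipsoid`) and the radius `beta` are hypothetical; the paper derives its radius from its likelihood-ratio confidence sequence, which this sketch does not reproduce.

```python
import math
import random

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def feature(arms, query):
    """Map a query to a GLM feature vector (assumed model, not the paper's).

    ("abs", i)     -> x_i        absolute reward from arm i
    ("duel", i, j) -> x_i - x_j  outcome of "arm i beats arm j"
    """
    if query[0] == "abs":
        return list(arms[query[1]])
    _, i, j = query
    return [a - b for a, b in zip(arms[i], arms[j])]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[r]] for r, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def hessian(zs, theta):
    """Fisher information sum_t mu'(z_t . theta) z_t z_t^T for the logistic link."""
    d = len(theta)
    H = [[0.0] * d for _ in range(d)]
    for z in zs:
        p = sigmoid(dot(z, theta))
        for r in range(d):
            for c in range(d):
                H[r][c] += p * (1.0 - p) * z[r] * z[c]
    return H

def mle(zs, ys, iters=25):
    """Newton's method on the pooled log-likelihood of both feedback types."""
    theta = [0.0] * len(zs[0])
    for _ in range(iters):
        grad = [0.0] * len(theta)
        for z, y in zip(zs, ys):
            p = sigmoid(dot(z, theta))
            for k in range(len(theta)):
                grad[k] += (y - p) * z[k]
        step = solve(hessian(zs, theta), grad)  # ascent step H^{-1} grad
        theta = [t + s for t, s in zip(theta, step)]
    return theta

def in_ellipsoid(theta, center, H, beta):
    """True if (theta - center)^T H (theta - center) <= beta^2."""
    e = [a - b for a, b in zip(theta, center)]
    quad = sum(e[r] * H[r][c] * e[c] for r in range(len(e)) for c in range(len(e)))
    return quad <= beta ** 2

# Usage: three arms in R^2, a hypothetical true parameter, mixed queries.
arms = [(1.0, 0.0), (0.0, 1.0), (0.7, 0.7)]
theta_star = [0.5, -0.3]
queries = [("abs", 0), ("abs", 1), ("duel", 2, 0), ("duel", 1, 2)] * 50
rng = random.Random(0)
zs = [feature(arms, q) for q in queries]
ys = [1.0 if rng.random() < sigmoid(dot(z, theta_star)) else 0.0 for z in zs]
theta_hat = mle(zs, ys)
H = hessian(zs, theta_hat)
print(theta_hat, in_ellipsoid(theta_star, theta_hat, H, beta=3.0))
```

Pooling both feedback types in one likelihood is what lets either kind of query shrink the same ellipsoid: in this sketch an absolute query adds information along x_i, while a duel adds information along x_i - x_j.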
Key facts
- Paper on arXiv: 2605.05745
- Studies fixed-confidence best arm identification in generalized linear bandits
- Hybrid feedback model: absolute reward from single arm or dueling feedback from arm pair
- Likelihood-ratio-based confidence sequence unifies heterogeneous observations
- Ellipsoidal confidence set under self-concordance assumption
- Hybrid Track-and-Stop algorithm adaptively allocates queries
- Algorithm is δ-correct with high-probability stopping time bounds
- Extended to cost-aware setting with heterogeneous acquisition costs
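The tracking idea behind a hybrid Track-and-Stop loop can be sketched as follows. This is a minimal illustration that assumes the target allocation is already known: the hypothetical `weights` list stands in for the minimax-optimal design the paper computes, which is not reproduced here. The sketch builds the joint action space of single arms and arm pairs, then applies the generic D-tracking rule from standard Track-and-Stop: query any action sampled fewer than √t - K/2 times, otherwise query the action whose count lags furthest behind its target t·w_a. The stopping rule and the acquisition costs of the cost-aware extension are not modeled, and the function names are illustrative only.

```python
import math
from itertools import combinations

def combined_actions(n_arms):
    """Joint query space: every single arm plus every unordered pair (assumed)."""
    singles = [("abs", i) for i in range(n_arms)]
    pairs = [("duel", i, j) for i, j in combinations(range(n_arms), 2)]
    return singles + pairs

def track(counts, weights, t):
    """One D-tracking step with forced exploration.

    counts[a]:  how often action a has been queried before round t
    weights[a]: target sampling proportion for action a (placeholder design)
    t:          current round, 1-indexed
    """
    K = len(counts)
    # Forced exploration: query any action sampled fewer than sqrt(t) - K/2 times.
    lagging = [a for a in range(K) if counts[a] < math.sqrt(t) - K / 2]
    if lagging:
        return min(lagging, key=lambda a: counts[a])
    # Otherwise query the action whose count falls furthest below t * w_a.
    return max(range(K), key=lambda a: t * weights[a] - counts[a])

# Usage: 3 arms -> 3 absolute queries + 3 duels, with hypothetical target weights.
actions = combined_actions(3)
weights = [0.1, 0.1, 0.1, 0.4, 0.2, 0.1]
counts = [0] * len(actions)
T = 1000
for t in range(1, T + 1):
    a = track(counts, weights, t)
    counts[a] += 1  # a real learner would issue actions[a] and update its estimate
for act, n in zip(actions, counts):
    print(act, n)
```

With fixed weights the counts stay within roughly K queries of their targets t·w_a; in a full Track-and-Stop loop the weights would instead be recomputed from the current parameter estimate, and sampling would halt once a stopping statistic crossed its threshold.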
Entities
Institutions
- arXiv