Best Arm Identification in Generalized Linear Bandits via Hybrid Feedback

other · 2026-05-09

A recent study published on arXiv (2605.05745) examines fixed-confidence best arm identification in generalized linear bandits under a hybrid feedback model. In each round, the learner chooses between absolute feedback, a reward observed from a single arm, and relative (dueling) feedback, a preference observed between a pair of arms; both feedback types are generated by generalized linear models. The authors construct a likelihood-ratio-based confidence sequence that combines these heterogeneous generalized linear observations and, under a self-concordance assumption, yields an explicit ellipsoidal confidence set. They also propose a hybrid Track-and-Stop algorithm that allocates queries adaptively by tracking a minimax-optimal design over the combined action space of arms and pairs. The algorithm is shown to be δ-correct, and its stopping time admits high-probability upper bounds. Finally, the framework is extended to a cost-aware setting in which the feedback types have different acquisition costs, and experiments support the approach.
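To make the hybrid feedback model concrete, here is a minimal sketch that assumes a logistic link (the paper treats general generalized linear models). An absolute query on arm i is encoded by the feature x_i and a duel between arms i and j by x_i − x_j, so both observation types enter one joint log-likelihood. The helpers `feature`, `sample`, `fit_mle`, `design_matrix`, and `in_ellipsoid`, the ridge term `lam`, and the radius `beta` are hypothetical; the ellipsoid uses one common Fisher-information-weighted form and is not the paper's exact confidence sequence or radius.

```python
import math
import random

def mu(t):
    """Logistic link (assumed here; the paper allows general GLM links)."""
    return 1.0 / (1.0 + math.exp(-t))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def feature(query, arms):
    """Map a query to its GLM feature vector z.

    ("abs", i)     -> x_i        absolute reward from arm i
    ("duel", i, j) -> x_i - x_j  preference for arm i over arm j
    """
    if query[0] == "abs":
        return list(arms[query[1]])
    _, i, j = query
    return [a - b for a, b in zip(arms[i], arms[j])]

def sample(query, arms, theta, rng):
    """Draw a Bernoulli outcome with mean mu(theta^T z)."""
    return 1 if rng.random() < mu(dot(theta, feature(query, arms))) else 0

def fit_mle(data, dim, lam=1e-3, step=1.0, iters=300):
    """Ridge-regularized joint MLE by gradient ascent on the average log-likelihood.

    Absolute and dueling observations share one likelihood because each is
    a Bernoulli GLM observation with its own feature vector z.
    """
    theta = [0.0] * dim
    n = len(data)
    for _ in range(iters):
        grad = [-lam * t for t in theta]
        for z, r in data:
            resid = (r - mu(dot(theta, z))) / n
            for k in range(dim):
                grad[k] += resid * z[k]
        theta = [t + step * g for t, g in zip(theta, grad)]
    return theta

def design_matrix(data, theta_hat, lam=1e-3):
    """V = lam*I + sum_z mu'(theta_hat^T z) z z^T, with mu' = mu(1 - mu)."""
    dim = len(theta_hat)
    V = [[lam if a == b else 0.0 for b in range(dim)] for a in range(dim)]
    for z, _ in data:
        m = mu(dot(theta_hat, z))
        for a in range(dim):
            for b in range(dim):
                V[a][b] += m * (1.0 - m) * z[a] * z[b]
    return V

def in_ellipsoid(theta, theta_hat, V, beta):
    """Check (theta - theta_hat)^T V (theta - theta_hat) <= beta^2."""
    d = [t - h for t, h in zip(theta, theta_hat)]
    quad = sum(d[a] * V[a][b] * d[b] for a in range(len(d)) for b in range(len(d)))
    return quad <= beta ** 2

# Usage on synthetic data: 3 arms in R^2, cycling through all query types.
rng = random.Random(0)
arms = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]
theta_star = [1.0, -0.5]  # hypothetical true parameter
queries = [("abs", 0), ("abs", 1), ("abs", 2),
           ("duel", 0, 1), ("duel", 1, 2), ("duel", 0, 2)]
data = []
for t in range(1200):
    q = queries[t % len(queries)]
    data.append((feature(q, arms), sample(q, arms, theta_star, rng)))

theta_hat = fit_mle(data, dim=2)
V = design_matrix(data, theta_hat)
best = max(range(len(arms)), key=lambda i: dot(theta_hat, arms[i]))
```

Encoding a duel as the difference x_i − x_j is what lets a single estimator pool both feedback types: each duel sharpens the estimate along the direction that separates two arms, which is exactly the direction that matters for deciding which arm is best.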

Key facts

Paper on arXiv: 2605.05745
Studies fixed-confidence best arm identification in generalized linear bandits
Hybrid feedback model: absolute reward from single arm or dueling feedback from arm pair
Likelihood-ratio-based confidence sequence unifies heterogeneous observations
Ellipsoidal confidence set under self-concordance assumption
Hybrid Track-and-Stop algorithm adaptively allocates queries
Algorithm is δ-correct with high-probability stopping time bounds
Extended to cost-aware setting with heterogeneous acquisition costs
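The adaptive allocation step can be illustrated with the D-tracking rule from the original Track-and-Stop algorithm of Garivier and Kaufmann, applied here to a combined action space of single-arm and pairwise queries. This is a sketch under simplifying assumptions: the target weights are fixed, hypothetical numbers, whereas the hybrid algorithm recomputes a minimax-optimal design from its current estimate; acquisition costs and the stopping rule are omitted. `combined_actions` and `d_tracking` are illustrative names, not the paper's.

```python
import math
from itertools import combinations

def combined_actions(num_arms):
    """Absolute query per arm plus one dueling query per unordered pair."""
    actions = [("abs", i) for i in range(num_arms)]
    actions += [("duel", i, j) for i, j in combinations(range(num_arms), 2)]
    return actions

def d_tracking(counts, weights, t):
    """Return the index of the next action under D-tracking.

    counts[a]  : how often action a has been queried so far
    weights[a] : target proportion of action a (assumed given, sums to 1)
    t          : total number of queries made so far
    """
    n_actions = len(counts)
    # Forced exploration: actions sampled fewer than sqrt(t) - K/2 times.
    starved = [a for a in range(n_actions)
               if counts[a] < math.sqrt(t) - n_actions / 2]
    if starved:
        return min(starved, key=lambda a: counts[a])
    # Otherwise pick the action furthest behind its target count t * w_a.
    return max(range(n_actions), key=lambda a: t * weights[a] - counts[a])

# Usage: 3 arms give 3 absolute + 3 dueling actions.
actions = combined_actions(3)
weights = [0.1, 0.0, 0.2, 0.4, 0.0, 0.3]  # hypothetical fixed design
counts = [0] * len(actions)
T = 1000
for t in range(T):
    counts[d_tracking(counts, weights, t)] += 1

for action, n in zip(actions, counts):
    print(action, n / T)
```

Zero-weight queries still receive roughly √t samples through the forced-exploration branch; in Track-and-Stop this keeps every estimate consistent, so a design recomputed from those estimates can converge to the optimal one.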
