Context Attribution via Multi-Armed Bandit Optimization
A new framework formulates context attribution in retrieval-augmented generation as a combinatorial multi-armed bandit problem, using Linear Thompson Sampling to identify influential context segments while minimizing the number of model queries. The reward function uses token log-probabilities to measure how strongly a segment supports the response, making the approach applicable to both open-source and black-box API models. Unlike SHAP and other perturbation-based methods, it adaptively prioritizes informative segment subsets based on posterior estimates, reducing computational cost. Experiments on multiple QA benchmarks show significant improvements in both attribution accuracy and efficiency.
Key facts
- arXiv:2506.19977v2
- Announce Type: replace
- Context attribution formulated as combinatorial multi-armed bandit problem
- Uses Linear Thompson Sampling
- Reward function leverages token log-probabilities
- Applicable to open-source and black-box API-based models
- Adaptively prioritizes informative subsets based on posterior estimates
- Experiments on multiple QA benchmarks
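The combination described above can be sketched in a few lines. The following is a hypothetical minimal illustration, not the paper's implementation: each context segment is a bandit dimension, an "arm" is a binary subset-indicator vector, and `reward_fn` stands in for the paper's log-probability-based support score. The posterior update and top-k subset selection show the Linear Thompson Sampling loop in miniature.

```python
import numpy as np

def linear_thompson_sampling(reward_fn, n_segments, n_rounds=200,
                             subset_size=3, sigma=0.5, seed=0):
    """Sketch: estimate per-segment influence weights by Linear Thompson
    Sampling over binary segment-subset feature vectors (illustrative only)."""
    rng = np.random.default_rng(seed)
    d = n_segments
    A = np.eye(d)           # posterior precision (ridge / Gaussian prior)
    b = np.zeros(d)         # accumulated feature-weighted rewards
    for _ in range(n_rounds):
        cov = np.linalg.inv(A)
        mu = cov @ b
        # sample a plausible weight vector from the posterior
        theta = rng.multivariate_normal(mu, sigma**2 * cov)
        # choose the subset maximizing the sampled linear reward: top-k segments
        subset = np.argsort(theta)[-subset_size:]
        x = np.zeros(d)
        x[subset] = 1.0                 # binary arm feature vector
        r = reward_fn(subset)           # e.g. response log-prob gain for subset
        A += np.outer(x, x)             # Bayesian linear-regression update
        b += r * x
    return np.linalg.inv(A) @ b         # posterior mean = attribution scores

# Toy usage: segments 0 and 2 truly support the response.
truth = np.array([1.0, 0.0, 0.8, 0.0, 0.0])
scores = linear_thompson_sampling(lambda s: truth[list(s)].sum(), n_segments=5)
```

After enough rounds the posterior mean concentrates mass on the genuinely influential segments, so ranking `scores` recovers them with far fewer queries than exhaustively perturbing all subsets.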