Multi-Objective Prompt Optimization via Bandit Algorithms
A recent arXiv paper (2605.14553) takes a bandit-based approach to selecting prompts for large language models under multiple objectives. It addresses the limits of single-metric evaluation by studying two settings: recovering the Pareto set of prompts and identifying the best feasible prompt. The authors adapt efficient algorithms from the multi-objective bandit literature and propose a new design for best feasible arm identification in structured bandits, with theoretical guarantees for the linear case. Experiments across multiple LLMs show significant improvements over baselines.
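To make the two settings concrete, here is a minimal Python sketch of what each target looks like once per-prompt scores are known. It is an illustration only, not the paper's algorithm: the prompt names, the two objectives (accuracy and safety), the score values, and the threshold are all hypothetical, and "best feasible" is assumed here to mean the prompt that maximizes one objective subject to minimum thresholds on the others.

```python
from typing import Dict, List, Optional, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_set(scores: Dict[str, Sequence[float]]) -> List[str]:
    """Prompts that no other prompt dominates (higher is better on every objective)."""
    return [
        p for p, s in scores.items()
        if not any(dominates(t, s) for q, t in scores.items() if q != p)
    ]


def best_feasible(
    scores: Dict[str, Sequence[float]],
    objective: int,
    thresholds: Dict[int, float],
) -> Optional[str]:
    """Best prompt on `objective` among those meeting every threshold (assumed definition)."""
    feasible = [
        p for p, s in scores.items()
        if all(s[j] >= tau for j, tau in thresholds.items())
    ]
    if not feasible:
        return None
    return max(feasible, key=lambda p: scores[p][objective])


# Hypothetical mean scores per prompt: (accuracy, safety).
scores = {
    "A": (0.90, 0.60),
    "B": (0.80, 0.85),
    "C": (0.70, 0.80),
    "D": (0.60, 0.95),
}

print(pareto_set(scores))                  # ['A', 'B', 'D']  (C is dominated by B)
print(best_feasible(scores, 0, {1: 0.8}))  # B  (most accurate with safety >= 0.8)
```

In practice the mean scores are unknown and must be estimated from noisy LLM evaluations, which is where the bandit formulation comes in.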
Key facts
- arXiv paper 2605.14553
- Studies multi-objective prompt selection
- Two settings: Pareto prompt set recovery and best feasible prompt identification
- Uses a pure-exploration bandit framework
- Novel design for best feasible arm identification in structured bandits
- Theoretical guarantees for linear case
- Experiments across multiple LLMs
- Bandit-based approaches yield significant improvements over baselines
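As a rough illustration of the pure-exploration framing listed above, the sketch below spends a fixed evaluation budget uniformly across prompts and only then reports the empirical Pareto set. The goal is correct identification at the end, not reward collected along the way. This uniform allocation is a naive reference point and not the method from the paper; adaptive algorithms aim to reach the same answer with fewer evaluations. The prompt names, true means, noise level, and the `make_evaluator` helper are hypothetical stand-ins for real LLM evaluations.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical true mean scores per prompt: (accuracy, safety). Unknown to the learner.
TRUE_MEANS: Dict[str, Tuple[float, float]] = {
    "A": (0.90, 0.60),
    "B": (0.80, 0.85),
    "C": (0.70, 0.80),
    "D": (0.60, 0.95),
}


def make_evaluator(seed: int, noise: float = 0.05) -> Callable[[str], Tuple[float, ...]]:
    """Return a stand-in for one noisy evaluation of a prompt (assumed Gaussian noise)."""
    rng = random.Random(seed)

    def evaluate(prompt: str) -> Tuple[float, ...]:
        return tuple(m + rng.gauss(0.0, noise) for m in TRUE_MEANS[prompt])

    return evaluate


def uniform_exploration(
    prompts: Sequence[str],
    evaluate: Callable[[str], Tuple[float, ...]],
    pulls_per_prompt: int,
) -> Dict[str, Tuple[float, ...]]:
    """Evaluate every prompt the same number of times and return empirical mean vectors."""
    means: Dict[str, Tuple[float, ...]] = {}
    for p in prompts:
        samples = [evaluate(p) for _ in range(pulls_per_prompt)]
        n_objectives = len(samples[0])
        means[p] = tuple(
            sum(s[j] for s in samples) / pulls_per_prompt for j in range(n_objectives)
        )
    return means


def empirical_pareto(means: Dict[str, Sequence[float]]) -> List[str]:
    """Prompts whose empirical mean vector is not dominated by any other prompt's."""
    def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

    return sorted(
        p for p, s in means.items()
        if not any(dominates(t, s) for q, t in means.items() if q != p)
    )


estimates = uniform_exploration(list(TRUE_MEANS), make_evaluator(seed=0), pulls_per_prompt=200)
print(empirical_pareto(estimates))
```

With well-separated means and a generous budget, the empirical Pareto set matches the true one; the interesting regime is when evaluations are expensive and the budget has to be concentrated on prompts that are hard to tell apart.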
Entities
Institutions
- arXiv