Multi-Objective Prompt Optimization via Bandit Algorithms

ai-technology · 2026-05-16

A recent arXiv paper (2605.14553) presents a bandit-based approach to multi-objective prompt selection for large language models. Motivated by the limitations of evaluating prompts on a single metric, the study considers two settings: recovering the Pareto set of prompts and identifying the best feasible prompt. The authors adapt efficient algorithms from the multi-objective bandit literature and introduce a novel design for best feasible arm identification in structured bandits, with theoretical guarantees for the linear case. Experiments across several LLMs show significant improvements over baselines.
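To make the two targets concrete, the sketch below computes both of them when each prompt's mean scores are already known. It is an illustration of the problem definitions only, not the paper's method. The prompt names and scores are hypothetical. The code assumes two objectives that are both maximized, and it assumes that "feasible" means the second objective meets a threshold. The bandit problem studied in the paper is harder, because these means must be estimated from noisy LLM evaluations.

```python
from typing import Dict, List, Optional, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b on every objective and strictly better on one (all maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_set(means: Dict[str, Sequence[float]]) -> List[str]:
    """Prompts whose mean score vectors are not dominated by any other prompt."""
    return [p for p, v in means.items()
            if not any(dominates(w, v) for q, w in means.items() if q != p)]


def best_feasible(means: Dict[str, Sequence[float]], threshold: float) -> Optional[str]:
    """Assumed definition: maximize objective 0 among prompts with objective 1 >= threshold."""
    feasible = {p: v for p, v in means.items() if v[1] >= threshold}
    if not feasible:
        return None  # no prompt satisfies the constraint
    return max(feasible, key=lambda p: feasible[p][0])


# Hypothetical mean scores per prompt: (task accuracy, safety score), both maximized.
means = {
    "prompt_a": (0.82, 0.60),
    "prompt_b": (0.75, 0.90),
    "prompt_c": (0.70, 0.85),  # dominated by prompt_b on both objectives
    "prompt_d": (0.88, 0.40),
}
print(pareto_set(means))
print(best_feasible(means, threshold=0.8))
```

The Pareto set keeps every trade-off that no other prompt beats outright. The feasible-best target collapses those trade-offs into a single answer once a constraint is fixed.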

Key facts

arXiv paper 2605.14553
Studies multi-objective prompt selection
Two settings: Pareto prompt set recovery and best feasible prompt identification
Uses a pure-exploration bandit framework
Novel design for best feasible arm identification in structured bandits
Theoretical guarantees for linear case
Experiments across multiple LLMs
Bandit-based approaches yield significant improvements over baselines
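For readers new to pure-exploration bandits, the sketch below identifies a best feasible arm from noisy samples. It uses a plain successive-elimination baseline with Hoeffding confidence bounds, which is a swapped-in textbook technique and not the paper's structured design. In particular, it ignores the linear structure for which the paper gives guarantees. The following are all assumptions:

- The arm means are hypothetical.
- Scores are Bernoulli in [0, 1].
- An arm counts as feasible when its constraint mean is at least `tau`.
- The names `best_feasible_arm` and `radius` are illustrative.

```python
import math
import random
from typing import Optional, Sequence, Tuple


def radius(n: int, k: int, delta: float) -> float:
    # Hoeffding radius; the union bound over k arms, 2 objectives and all rounds n >= 1
    # keeps the total failure probability at most delta.
    return math.sqrt(math.log(2 * math.pi ** 2 * k * n ** 2 / (3 * delta)) / (2 * n))


def best_feasible_arm(
    means: Sequence[Tuple[float, float]],  # hypothetical (reward, constraint) Bernoulli means
    tau: float,                             # assumed rule: feasible iff constraint mean >= tau
    delta: float = 0.05,
    max_rounds: int = 20_000,
    seed: int = 0,
) -> Optional[int]:
    rng = random.Random(seed)
    k = len(means)
    sums = [[0.0, 0.0] for _ in range(k)]
    active = set(range(k))
    for n in range(1, max_rounds + 1):
        # Sample every active arm once per round, so all active arms share the count n.
        for i in active:
            for j in (0, 1):
                sums[i][j] += 1.0 if rng.random() < means[i][j] else 0.0
        r = radius(n, k, delta)
        mean = {i: (sums[i][0] / n, sums[i][1] / n) for i in active}
        # Drop arms whose constraint upper bound is below tau (confidently infeasible).
        active = {i for i in active if mean[i][1] + r >= tau}
        certified = [i for i in active if mean[i][1] - r >= tau]
        if certified:
            # Drop arms whose reward upper bound is below the best certified-feasible lower bound.
            best_lcb = max(mean[i][0] - r for i in certified)
            active = {i for i in active if mean[i][0] + r >= best_lcb}
        if not active:
            return None  # every arm is confidently infeasible
        if len(active) == 1 and active <= set(certified):
            return next(iter(active))  # one arm left, and it is certified feasible
    return None  # budget exhausted (e.g. tied rewards or a constraint mean exactly at tau)


# Hypothetical arms: arm 0 has the highest reward but is infeasible; arm 1 is the best feasible arm.
arms = [(0.9, 0.3), (0.7, 0.9), (0.5, 0.95), (0.6, 0.2)]
print(best_feasible_arm(arms, tau=0.6))
```

Uniform sampling of the active set keeps the sketch short. Adaptive allocation, and sharing information across arms through a linear model, is where a structured method such as the one the paper describes would be expected to save samples.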
