ARTFEED — Contemporary Art Intelligence

HARBOR: Automated Harness Optimization for Language Model Agents

ai-technology · 2026-04-25

A new study on arXiv (2604.20938) argues that the main difficulties faced by long-horizon language-model agents stem not from the models themselves but from the supporting framework around them, which the authors call the "harness": context compaction, tool caching, semantic memory, trajectory reuse, speculative tool prediction, and the glue that binds the model to a controlled execution environment. The paper frames harness design as a first-class machine-learning problem and formalizes it as constrained, noisy Bayesian optimization over a configuration space that is mixed-variable and cost-heterogeneous, with cold-start-corrected rewards and safety enforced through posterior chance constraints. The authors report that automated configuration search outperforms manual flag stacking, and that the gap widens as the flag space grows. Their reference solver, HARBOR (Harness Axis-aligned Regularized Bayesian Optimization Routine), pairs a block-additive SAAS surrogate with multi-fidelity evaluation.
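To make the setup concrete, here is a minimal sketch of what a mixed-variable harness configuration space and an automated search over it might look like. All flag names, domains, and the toy reward function below are illustrative assumptions, not taken from the paper; random search stands in for the Bayesian-optimization loop HARBOR actually uses.

```python
import random

# Hypothetical harness flag space (names are illustrative, not from the paper).
# "Mixed-variable" means booleans, categoricals, ordinals, and continuous
# ranges live in the same search space.
FLAG_SPACE = {
    "context_compaction": [False, True],        # boolean
    "tool_cache": ["off", "lru", "semantic"],   # categorical
    "memory_top_k": [0, 4, 8, 16],              # ordinal
    "compaction_ratio": (0.1, 0.9),             # continuous range
}

def sample_config(rng):
    """Draw one configuration uniformly from the mixed-variable space."""
    cfg = {}
    for name, domain in FLAG_SPACE.items():
        if isinstance(domain, tuple):           # continuous range
            cfg[name] = rng.uniform(*domain)
        else:                                   # discrete choices
            cfg[name] = rng.choice(domain)
    return cfg

def noisy_reward(cfg, rng):
    """Stand-in for running the agent under `cfg`: a noisy scalar score.
    Real evaluations are cost-heterogeneous (some flags make runs far
    more expensive); this toy version only models the noise."""
    score = 0.0
    score += 0.3 if cfg["context_compaction"] else 0.0
    score += {"off": 0.0, "lru": 0.1, "semantic": 0.25}[cfg["tool_cache"]]
    score += 0.02 * cfg["memory_top_k"]
    score -= abs(cfg["compaction_ratio"] - 0.5)  # best near 0.5
    return score + rng.gauss(0.0, 0.05)          # evaluation noise

def random_search(budget, seed=0):
    """Baseline automated search: best observed config under a trial budget."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(budget):
        cfg = sample_config(rng)
        score = noisy_reward(cfg, rng)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```

Even this naive loop illustrates the paper's framing: once flags interact and evaluations are noisy, searching the joint space automatically tends to beat stacking individually tuned flags by hand.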

Key facts

  • Paper arXiv:2604.20938
  • Title: HARBOR: Automated Harness Optimization for Language Model Agents
  • Focus on long-horizon language-model agents
  • Harness includes context compaction, tool caching, semantic memory, trajectory reuse, speculative tool prediction
  • Harness design framed as first-class ML problem
  • Automated configuration search beats manual stacking for large flag spaces
  • Formalization as constrained noisy Bayesian optimization
  • Configuration space is mixed-variable and cost-heterogeneous
  • Rewards are cold-start-corrected
  • Safety check via posterior chance constraints
  • Reference solver named HARBOR
  • Uses block-additive SAAS surrogate and multi-fidelity evaluation
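The "safety check via posterior chance constraints" in the list above can be sketched in a few lines: a candidate configuration is accepted only if, under the surrogate's posterior for some safety metric, the probability of exceeding a limit stays below a tolerance. The Gaussian-posterior form and the function below are an illustrative assumption; the paper's exact constraint may differ.

```python
import math

def chance_constraint_ok(post_mean, post_std, limit, delta=0.05):
    """Posterior chance constraint under a Gaussian posterior:
    accept iff P(metric > limit) <= delta, i.e. the posterior upper-tail
    mass beyond `limit` is small enough.
    (Illustrative sketch; not the paper's exact formulation.)"""
    if post_std <= 0.0:
        # Degenerate posterior: the metric is known exactly.
        return post_mean <= limit
    # For X ~ N(mean, std): P(X > limit) = 1 - Phi(z) = erfc(z / sqrt(2)) / 2
    z = (limit - post_mean) / post_std
    tail = 0.5 * math.erfc(z / math.sqrt(2.0))
    return tail <= delta
```

In a search loop, this acts as a filter: configurations whose posterior risk of violating the safety limit exceeds delta are never proposed for (expensive) evaluation, which is what makes the constraint usable under noisy, cost-heterogeneous rewards.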

Entities

Institutions

  • arXiv

Sources