LLM Harness Design Boosts Algorithm Discovery in Vesper Framework

ai-technology · 2026-05-18

A new arXiv preprint investigates how the design of the execution infrastructure, or "harness," affects automated algorithm discovery with large language models (LLMs) and evolutionary search. Building on AlphaEvolve and FunSearch, the study poses three questions: under a fixed token budget, whether to generate many algorithms with brief reasoning or fewer with deeper reasoning; how to handle evaluation hacks, in which programs exploit the scoring function; and how to safely parallelize agents that require full filesystem access. The authors present Vesper, a framework whose harness improvements address these issues. On Circle Packing, with the token budget held identical across runs, generating fewer algorithms with deeper reasoning yields better results. The work emphasizes that discovery success depends not only on model capability but also, to a significant degree, on harness design.
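
The token-allocation question comes down to simple arithmetic: with a fixed budget, the tokens spent per candidate determine how many candidates can be generated. The sketch below illustrates that trade-off only. The function name and all numbers are hypothetical and are not taken from the preprint.

```python
def candidates_within_budget(total_tokens: int, tokens_per_candidate: int) -> int:
    """Return how many candidates fit in a fixed token budget.

    Both arguments are illustrative; the preprint's actual budgets are not
    given in this summary.
    """
    if total_tokens < 0:
        raise ValueError("total_tokens must be non-negative")
    if tokens_per_candidate <= 0:
        raise ValueError("tokens_per_candidate must be positive")
    # Integer division: a partially funded candidate does not count.
    return total_tokens // tokens_per_candidate


# Hypothetical budget, used only to show the trade-off.
BUDGET = 1_000_000

# Many candidates with brief reasoning versus fewer with deeper reasoning.
shallow_count = candidates_within_budget(BUDGET, 2_000)
deep_count = candidates_within_budget(BUDGET, 20_000)

print(f"shallow: {shallow_count} candidates, deep: {deep_count} candidates")
```

In this made-up setting, a tenfold increase in reasoning per candidate cuts the number of candidates tenfold. According to the summary above, the study reports that the deeper, smaller population performed better on Circle Packing.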

Key facts

AlphaEvolve and FunSearch combine LLMs with evolutionary search for algorithm discovery.
Harness design significantly influences discovery success beyond model capability.
Three questions addressed: token allocation, evaluation hacks, and safe parallel execution.
Vesper is a new framework with harness improvements.
Vesper was evaluated on Circle Packing under identical token budgets.
Generating fewer algorithms with deeper thought outperforms many shallow attempts.
The paper is available on arXiv under ID 2605.15221.
The study focuses on automated algorithm discovery using coding agents.
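
One common way to blunt evaluation hacks is for the harness to recompute the score from the candidate's raw output instead of trusting a value the candidate reports. The sketch below shows that idea for a circle-packing task. It assumes the variant in which circles must lie inside the unit square without overlapping and the score is the sum of radii; this summary does not say which variant the preprint uses, and `verified_score` is a hypothetical helper, not Vesper's API.

```python
import math
from typing import Sequence, Tuple

# (x, y, r): circle center and radius. Assumed output format of a candidate.
Circle = Tuple[float, float, float]


def verified_score(circles: Sequence[Circle], tol: float = 1e-9) -> float:
    """Recompute the score independently; return -inf for invalid packings.

    Assumption: circles must lie in the unit square [0, 1] x [0, 1], must
    not overlap, and the score is the sum of radii.
    """
    invalid = float("-inf")

    for x, y, r in circles:
        # Reject NaN/inf and non-positive radii, which can fool naive scorers.
        if not all(math.isfinite(v) for v in (x, y, r)) or r <= 0:
            return invalid
        # Each circle must be fully contained in the unit square.
        if x - r < -tol or x + r > 1 + tol or y - r < -tol or y + r > 1 + tol:
            return invalid

    # Pairwise check: centers must be at least the sum of radii apart.
    for i in range(len(circles)):
        xi, yi, ri = circles[i]
        for j in range(i + 1, len(circles)):
            xj, yj, rj = circles[j]
            if math.hypot(xi - xj, yi - yj) < ri + rj - tol:
                return invalid

    return sum(r for _, _, r in circles)


# A valid two-circle packing and an overlapping one.
valid = [(0.25, 0.25, 0.25), (0.75, 0.75, 0.25)]
overlapping = [(0.5, 0.5, 0.5), (0.5, 0.5, 0.1)]

print(verified_score(valid))
print(verified_score(overlapping))
```

The pairwise check is quadratic in the number of circles, which is acceptable for small instances. A harness built this way never reads a score computed by the candidate program itself, so a candidate cannot gain by misreporting its own result.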
