ARTFEED — Contemporary Art Intelligence

ValueAlpha: Stress-Testing LLM Judges for Investment Rationales

ai-technology · 2026-04-30

A new paper introduces ValueAlpha, a preregistered, agreement-gated stress-test protocol for evaluating LLM-judged investment rationales before returns are observable. The protocol targets the pre-realization evaluation problem in long-horizon investment decisions, where realized returns arrive too late and are too noisy to serve as a timely check. An agreement gate determines whether LLM-judged claims are publishable, qualified, or invalid. In a controlled prototype of 1,100 trajectories (1,000 honest decision cycles plus 100 adversarial controls) and 5,500 judge calls, the aggregate agreement gate cleared at κ̄_w = 0.7168, yet the protocol still prevented several overclaims. Lower-rank systems collapsed. The paper is available on arXiv.
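
As a rough illustration of how an agreement gate of this kind could work, the Python sketch below averages pairwise quadratic-weighted Cohen's kappa across judges and maps the result to one of the three outcomes named above. It is not the paper's implementation. Reading κ̄_w as a mean weighted kappa is an assumption, and the function names, the 0-to-4 scoring scale, the choice of quadratic weights, and both thresholds (0.6 and 0.4) are hypothetical.

```python
from itertools import combinations


def weighted_kappa(a, b, n_levels):
    """Quadratic-weighted Cohen's kappa for two raters scoring on 0..n_levels-1."""
    n = len(a)
    # Observed joint distribution of the two raters' scores.
    observed = [[0.0] * n_levels for _ in range(n_levels)]
    for x, y in zip(a, b):
        observed[x][y] += 1.0 / n
    # Marginal distribution of each rater.
    row = [sum(observed[i]) for i in range(n_levels)]
    col = [sum(observed[i][j] for i in range(n_levels)) for j in range(n_levels)]
    num = den = 0.0
    for i in range(n_levels):
        for j in range(n_levels):
            w = (i - j) ** 2 / (n_levels - 1) ** 2  # disagreement weight
            num += w * observed[i][j]               # observed disagreement
            den += w * row[i] * col[j]              # disagreement expected by chance
    # Kappa is undefined when no disagreement is expected (both raters constant
    # and equal); returning 1.0 in that case is a convention chosen for this sketch.
    return 1.0 if den == 0 else 1.0 - num / den


def mean_pairwise_kappa(ratings, n_levels):
    """Average weighted kappa over every pair of judges (assumed meaning of the mean)."""
    pairs = list(combinations(ratings, 2))
    return sum(weighted_kappa(a, b, n_levels) for a, b in pairs) / len(pairs)


def gate(kappa_bar, publish_at=0.6, qualify_at=0.4):
    """Map mean agreement to an outcome. Both thresholds are hypothetical."""
    if kappa_bar >= publish_at:
        return "publishable"
    if kappa_bar >= qualify_at:
        return "qualified"
    return "invalid"


# Three hypothetical judges scoring five rationales on a 0-4 scale.
judges = [
    [0, 1, 2, 3, 4],
    [0, 1, 2, 3, 4],
    [0, 2, 2, 3, 4],
]
verdict = gate(mean_pairwise_kappa(judges, n_levels=5))
```

Under these assumed thresholds, a claim would be reported without caveats only when judges agree strongly, reported with qualification at moderate agreement, and treated as invalid otherwise.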

Key facts

  • ValueAlpha is a preregistered agreement-gated stress-test protocol.
  • It evaluates LLM-judged investment rationales before returns are observable.
  • The protocol addresses the pre-realization evaluation problem.
  • The controlled prototype used 1,000 honest decision cycles and 100 adversarial controls (1,100 trajectories, 5,500 judge calls).
  • The aggregate agreement gate cleared at κ̄_w = 0.7168.
  • Several overclaims were prevented.
  • Lower-rank systems collapsed.
  • The paper is available on arXiv.

Entities

Institutions

  • arXiv