ValueAlpha: Stress-Testing LLM Judges for Investment Rationales
A new paper introduces ValueAlpha, a preregistered, agreement-gated stress-test protocol for evaluating LLM-judged investment rationales before returns can be observed. The protocol targets the pre-realization evaluation problem in long-horizon investment decisions: realized returns arrive too late, and are too noisy, to serve as a timely evaluation signal. ValueAlpha uses an agreement gate to decide whether each LLM-judged claim is publishable, qualified, or invalid. In a controlled prototype with 1,000 honest decision cycles and 100 adversarial controls (1,100 trajectories and 5,500 judge calls in total), the aggregate agreement gate cleared at κ̄_w = 0.7168, yet the protocol still prevented several overclaims. Lower-rank systems collapsed. The paper is available on arXiv.
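The summary does not say exactly how κ̄_w is computed or where the gate's cut-offs sit. As a minimal sketch, assuming κ̄_w is the mean of pairwise weighted Cohen's kappas between judges scoring the same rationales on an ordinal rubric, a three-way agreement gate could look like the code below. The function names and the thresholds `publish_at` and `qualify_at` are hypothetical placeholders, not values or APIs from the paper.

```python
from itertools import combinations


def weighted_kappa(a, b, n_categories, weights="quadratic"):
    """Cohen's weighted kappa for two judges scoring the same items on 0..n_categories-1."""
    if len(a) != len(b) or not a:
        raise ValueError("rating lists must be non-empty and of equal length")
    if n_categories < 2:
        raise ValueError("need at least two rating categories")
    if any(not 0 <= r < n_categories for r in list(a) + list(b)):
        raise ValueError("rating outside 0..n_categories-1")
    k, n = n_categories, len(a)

    # Observed joint distribution of (judge A score, judge B score).
    observed = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[x][y] += 1.0 / n

    # Marginal score distributions for each judge.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    # Disagreement weight: 0 on the diagonal, growing with ordinal distance.
    def w(i, j):
        d = abs(i - j) / (k - 1)
        return d * d if weights == "quadratic" else d

    num = sum(w(i, j) * observed[i][j] for i in range(k) for j in range(k))
    den = sum(w(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    # Degenerate case (both judges always give the same single score): treat as full agreement.
    return 1.0 - num / den if den > 0 else 1.0


def mean_pairwise_kappa(ratings_by_judge, n_categories):
    """Average weighted kappa over every pair of judges (assumed reading of kappa-bar_w)."""
    pairs = list(combinations(ratings_by_judge, 2))
    if not pairs:
        raise ValueError("need at least two judges")
    return sum(weighted_kappa(a, b, n_categories) for a, b in pairs) / len(pairs)


def agreement_gate(kappa_bar, publish_at=0.7, qualify_at=0.4):
    """Map mean agreement to a claim status. Thresholds are hypothetical placeholders."""
    if kappa_bar >= publish_at:
        return "publishable"
    if kappa_bar >= qualify_at:
        return "qualified"
    return "invalid"


if __name__ == "__main__":
    # Hypothetical scores from three judges on five rationales (0-4 rubric).
    judges = [[4, 3, 2, 4, 1], [4, 3, 3, 4, 1], [3, 3, 2, 4, 0]]
    kb = mean_pairwise_kappa(judges, n_categories=5)
    print(f"mean weighted kappa = {kb:.4f} -> {agreement_gate(kb)}")
```

Quadratic weights penalize a two-point disagreement four times as heavily as a one-point one, which suits ordinal rubric scores; whether ValueAlpha uses quadratic or linear weights, and how it aggregates across judges, are assumptions in this sketch.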
Key facts
- ValueAlpha is a preregistered, agreement-gated stress-test protocol.
- It evaluates LLM-judged investment rationales before returns are observable.
- The protocol addresses the pre-realization evaluation problem in long-horizon investment decisions.
- The controlled prototype used 1,000 honest decision cycles and 100 adversarial controls (1,100 trajectories, 5,500 judge calls).
- The aggregate agreement gate cleared at κ̄_w = 0.7168.
- Even though the aggregate gate cleared, several overclaims were prevented.
- Lower-rank systems collapsed.
- The paper is available on arXiv.
Entities
Institutions
- arXiv