LATTICE: New Benchmark for Crypto Agent Decision Support
Researchers have developed LATTICE, a benchmark that assesses how effectively crypto agents support decision-making in realistic user scenarios. Unlike earlier benchmarks, which emphasize reasoning ability or task outcomes, LATTICE measures how well an agent helps users make decisions. It defines six evaluation dimensions that capture core decision-support properties and introduces 16 task types spanning the full crypto copilot workflow. Agent outputs are scored automatically by LLM judges, enabling scalable evaluation without expert annotators or external data. The judge rubrics can also be revised over time with new dimensions, tasks, criteria, and user feedback, keeping the evaluation process robust and adaptable.
Key facts
- LATTICE is a benchmark for evaluating crypto agent decision support utility.
- It defines six evaluation dimensions that capture core decision-support properties.
- It proposes 16 task types covering the crypto copilot workflow.
- It uses LLM judges for automatic scoring of agent outputs.
- Evaluation does not rely on expert annotators or external data.
- LLM judge rubrics can be updated with new dimensions and human feedback.
- Prior benchmarks focused on reasoning or outcome-based evaluation.
- LATTICE addresses the gap in assessing user decision-making assistance.
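The rubric-driven LLM-judge scoring described above can be sketched in miniature. This is a hypothetical illustration, not LATTICE's actual schema: the dimension names, the 1-5 scale, the `RubricDimension` class, and the offline fallback heuristic are all assumptions, and `llm_score` stands in for a real LLM API call.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class RubricDimension:
    """One rubric dimension handed to the LLM judge (hypothetical schema)."""
    name: str
    criteria: str  # the criterion text that would go into the judge prompt

# Six illustrative decision-support dimensions; placeholders, not the paper's.
DIMENSIONS = [
    RubricDimension("accuracy", "Are the stated facts correct?"),
    RubricDimension("relevance", "Does the answer address the user's decision?"),
    RubricDimension("completeness", "Are key risks and alternatives covered?"),
    RubricDimension("actionability", "Can the user act on the advice?"),
    RubricDimension("timeliness", "Is the information current enough?"),
    RubricDimension("transparency", "Are sources and uncertainty disclosed?"),
]

def judge_output(agent_output: str, dim: RubricDimension, llm_score=None) -> int:
    """Score one dimension on a 1-5 scale.

    `llm_score(output, criteria)` stands in for a real LLM judge call; a
    trivial length heuristic is used as a fallback so the sketch runs offline.
    """
    if llm_score is not None:
        return llm_score(agent_output, dim.criteria)
    return 5 if len(agent_output) > 40 else 2  # placeholder heuristic only

def evaluate(agent_output: str, llm_score=None) -> dict:
    """Score an output on every dimension and attach the mean as 'overall'."""
    scores = {d.name: judge_output(agent_output, d, llm_score) for d in DIMENSIONS}
    scores["overall"] = round(mean(scores.values()), 2)
    return scores
```

Because the rubric is plain data, adding a dimension, task-specific criterion, or feedback-driven revision is just an edit to `DIMENSIONS`, mirroring the updatability the benchmark claims for its judge rubrics.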