LEGIT Dataset Evaluates LLM Legal Reasoning Traces
A team of researchers has introduced LEGIT (LEGal Issue Trees), a dataset of 24,000 instances of expert-level legal reasoning designed to evaluate the reasoning traces produced by LLMs. The dataset converts court judgments into hierarchical trees of the opposing parties' arguments and the court's conclusions; these trees serve as rubrics for scoring a reasoning trace on issue coverage and correctness. Human expert annotations confirmed the reliability of the rubrics. The findings show that LLMs' legal reasoning is notably affected by both issue coverage and correctness, and that retrieval-augmented generation (RAG) and reinforcement learning (RL) with rubrics bring complementary benefits.
Key facts
- LEGIT dataset contains 24,000 instances
- Dataset focuses on expert-level legal reasoning
- Court judgments are converted into hierarchical trees
- Trees include opposing parties' arguments and court conclusions
- Rubrics evaluate issue coverage and correctness
- Human expert annotations confirmed rubric reliability
- LLM legal reasoning is affected by both issue coverage and correctness
- RAG and RL with rubrics offer complementary benefits
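To make the idea of an issue tree used as a rubric more concrete, here is a minimal Python sketch. It is an illustration only and not the dataset's actual schema or scoring procedure: the `IssueNode` fields, the `score_trace` function, the exact-match comparison, and the toy contract dispute are all assumptions introduced for this example. It treats coverage as the share of issues in the tree that a reasoning trace addresses, and correctness as the share of addressed issues on which the trace agrees with the court.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple


@dataclass
class IssueNode:
    """One legal issue in a hypothetical issue tree (field names are assumptions)."""
    issue_id: str
    plaintiff_argument: str
    defendant_argument: str
    court_conclusion: str  # which side the court agreed with on this issue
    children: List["IssueNode"] = field(default_factory=list)


def iter_issues(node: IssueNode) -> Iterator[IssueNode]:
    """Yield the node and all of its descendants, depth first."""
    yield node
    for child in node.children:
        yield from iter_issues(child)


def score_trace(root: IssueNode, trace: Dict[str, str]) -> Tuple[float, float]:
    """Return (coverage, correctness) for one reasoning trace.

    `trace` maps issue_id -> the conclusion the trace reached; issues the
    trace never addressed are simply absent from the mapping.
    """
    issues = list(iter_issues(root))
    addressed = [n for n in issues if n.issue_id in trace]
    coverage = len(addressed) / len(issues)
    correct = sum(trace[n.issue_id] == n.court_conclusion for n in addressed)
    correctness = correct / len(addressed) if addressed else 0.0
    return coverage, correctness


# Toy tree: one top-level issue with two sub-issues (invented for illustration).
root = IssueNode(
    "breach", "Defendant failed to deliver.", "Delivery was excused.", "plaintiff",
    children=[
        IssueNode("valid_contract", "A contract was formed.", "No acceptance.", "plaintiff"),
        IssueNode("damages", "Lost profits are owed.", "Losses are speculative.", "defendant"),
    ],
)

# A trace that skips one issue and gets one of the remaining two wrong.
trace = {"breach": "plaintiff", "damages": "plaintiff"}
coverage, correctness = score_trace(root, trace)
print(f"coverage={coverage:.2f}, correctness={correctness:.2f}")
```

Keeping the two scores separate reflects the distinction the summary draws: a trace can address many issues yet resolve them wrongly, or be right on the few issues it addresses. A real evaluation would need a more robust way to match free-text reasoning to issues than the exact string comparison assumed here.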