LEGIT Dataset Evaluates LLM Legal Reasoning Traces

ai-technology · 2026-05-04

Researchers have released LEGIT (LEGal Issue Trees), a dataset of 24,000 instances of expert-level legal reasoning designed to evaluate the reasoning traces produced by LLMs. The dataset converts court judgments into hierarchical trees of the opposing parties' arguments and the court's conclusions. These trees serve as rubrics for scoring a reasoning trace on two criteria: issue coverage and correctness. The reliability of the rubrics was confirmed through annotations by human experts. The reported findings are that LLM legal reasoning is affected by both issue coverage and correctness, and that retrieval-augmented generation (RAG) and reinforcement learning (RL) with rubrics offer complementary benefits.
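To make the tree-as-rubric idea concrete, here is a minimal Python sketch of how an issue tree might be represented and used to score a reasoning trace. Everything in it is an assumption for illustration: the `IssueNode` fields, the `score_trace` function, the sample issues, and the two metric definitions are hypothetical and are not taken from the LEGIT dataset or its paper. Exact string matching stands in for whatever judgment an actual evaluation would use.

```python
from dataclasses import dataclass, field


@dataclass
class IssueNode:
    """One legal issue in a hypothetical issue tree (field names are assumptions)."""
    issue: str
    party_arguments: dict[str, str] = field(default_factory=dict)
    court_conclusion: str = ""
    children: list["IssueNode"] = field(default_factory=list)


def flatten(node):
    """Return the node and all of its descendants in depth-first order."""
    nodes = [node]
    for child in node.children:
        nodes.extend(flatten(child))
    return nodes


def score_trace(root, trace_conclusions):
    """Score a trace given as {issue: conclusion} against the tree.

    Assumed definitions (illustrative only):
      coverage    = share of tree issues the trace addresses
      correctness = share of addressed issues whose conclusion
                    matches the court's conclusion exactly
    """
    nodes = flatten(root)
    addressed = [n for n in nodes if n.issue in trace_conclusions]
    correct = [n for n in addressed
               if trace_conclusions[n.issue] == n.court_conclusion]
    coverage = len(addressed) / len(nodes)
    correctness = len(correct) / len(addressed) if addressed else 0.0
    return coverage, correctness


# Made-up example tree: one root issue with two sub-issues.
root = IssueNode(
    issue="breach of contract",
    party_arguments={"plaintiff": "delivery was late",
                     "defendant": "delay was excused"},
    court_conclusion="breach found",
    children=[
        IssueNode(issue="valid contract", court_conclusion="contract valid"),
        IssueNode(issue="damages", court_conclusion="damages awarded"),
    ],
)

# Made-up trace: addresses two of the three issues and gets one right.
trace = {"breach of contract": "breach found",
         "valid contract": "contract void"}
coverage, correctness = score_trace(root, trace)
```

Keeping the two scores separate reflects the distinction the summary draws: a trace can address many issues yet resolve them wrongly, or resolve a few issues correctly while missing the rest.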

Key facts

LEGIT dataset contains 24,000 instances
Dataset focuses on expert-level legal reasoning
Court judgments are converted into hierarchical trees
Trees include opposing parties' arguments and court conclusions
Rubrics evaluate issue coverage and correctness
Human expert annotations confirmed rubric reliability
LLM legal reasoning affected by both issue coverage and correctness
RAG and RL with rubrics offer complementary benefits

Entities

—

Sources

arXiv cs.AI — 2026-05-04