LP-Eval: A Rubric and Dataset for Legal Proposition Generation from EU Court Decisions
A recent study presents LP-Eval, a rubric and dataset for assessing the quality of legal propositions that large language models (LLMs) generate from decisions of the Court of Justice of the European Union. The rubric was co-designed with legal experts and decomposes proposition quality into formal validity and substantive dimensions. The dataset contains annotations by two experts for 100 LLM-generated propositions. The findings show that LLMs mostly produce well-formed, high-quality propositions, and that propositions derived from well-established cases are of higher quality than those derived from recent ones. When LLMs act as judges, their assessments align more closely with expert judgments if they are guided by the rubric than if they evaluate the propositions directly.
Key facts
- LP-Eval is a three-step evaluation rubric for legal proposition generation.
- The rubric was co-designed with legal experts.
- It decomposes quality into formal validity and substantive dimensions.
- A dataset of two experts' annotations for 100 LLM-generated propositions is released.
- LLMs generate predominantly well-formed and high-quality propositions.
- Propositions derived from well-established cases are of higher quality than those derived from recent ones.
- Rubric-guided LLM judgments align more closely with expert assessments than direct, rubric-free LLM evaluations.
- The research focuses on decisions of the Court of Justice of the European Union.
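Agreement between an LLM judge's labels and an expert's labels on the same propositions is commonly measured with a chance-corrected statistic such as Cohen's kappa. The summary does not say which agreement measure the study uses, so the following is only a minimal sketch under that assumption. The function name `cohens_kappa` is hypothetical, and the labels are assumed to be categorical quality ratings assigned per proposition.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa: chance-corrected agreement between two raters on the same items.

    Hypothetical helper for illustration; not taken from the LP-Eval study.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same, non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items where both raters chose the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if the raters labeled independently,
    # estimated from each rater's own label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label for every item; kappa is undefined,
        # so treat this degenerate case as perfect agreement.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)
```

Computing this statistic separately for rubric-guided labels and for direct labels, each against the same expert's labels, would put the two judging modes on one scale. scikit-learn's `sklearn.metrics.cohen_kappa_score` computes the same statistic.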
Entities
Institutions
- Court of Justice of the European Union
Locations
- European Union