ARTFEED — Contemporary Art Intelligence

RIFT Taxonomy Introduces Framework for Diagnosing Rubric Failures in LLM Evaluation

ai-technology · 2026-04-22

A new framework, RIFT (RubrIc Failure mode Taxonomy), offers a systematic way to diagnose failures in rubric-based evaluation of large language models. RIFT organizes eight distinct failure types under three high-level categories: Reliability Failures, Content Validity Failures, and Consequential Validity Failures. It addresses a notable gap in LLM evaluation methodology: prior approaches offered little means of diagnosing rubric problems beyond aggregated scoring signals. The taxonomy was developed using grounded theory, through iterative annotation of rubrics drawn from five diverse data sources spanning domains such as instruction following, code generation, creative writing, and deep research. Further details are available in the arXiv preprint 2604.01375v2.
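The category/mode structure described above can be sketched as a minimal data model. Note that the specific failure-mode names and the `annotate` helper below are hypothetical illustrations for this sketch; only the three high-level categories come from the taxonomy itself.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    """RIFT's three high-level categories (per the preprint)."""
    RELIABILITY = "Reliability Failure"
    CONTENT_VALIDITY = "Content Validity Failure"
    CONSEQUENTIAL_VALIDITY = "Consequential Validity Failure"


@dataclass(frozen=True)
class FailureMode:
    name: str            # illustrative name, not one of the paper's eight modes
    category: Category
    description: str


def annotate(criterion: str, mode: FailureMode) -> dict:
    """Hypothetical annotation pass: tag a rubric criterion with a failure mode."""
    return {
        "criterion": criterion,
        "failure_mode": mode.name,
        "category": mode.category.value,
    }


# Hypothetical example mode under the Reliability category.
ambiguous = FailureMode(
    name="ambiguous-wording",
    category=Category.RELIABILITY,
    description="Criterion wording admits conflicting grader readings.",
)

record = annotate("Response is well-organized.", ambiguous)
print(record["category"])  # Reliability Failure
```

Keeping each annotated criterion tied to both a specific mode and its parent category is what lets failures be diagnosed individually rather than inferred from aggregated scores.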

Key facts

  • RIFT taxonomy categorizes eight rubric failure modes
  • Failure modes organized into three high-level categories: Reliability, Content Validity, Consequential Validity
  • Developed using grounded theory through iterative annotation
  • Based on rubrics from five diverse data sources
  • Addresses gap in diagnosing rubric failures from aggregated signals
  • Applies to LLM benchmarks and training pipelines for open-ended tasks
  • Covers domains: instruction following, code generation, creative writing, deep research
  • arXiv preprint identifier: 2604.01375v2

Entities

Institutions

  • arXiv

Sources