ARTFEED — Contemporary Art Intelligence

DisaBench: Evaluating Disability Harms in Language Models

ai-technology · 2026-05-14

Researchers have launched DisaBench, a participatory evaluation framework for identifying disability-related harms in large language models. Co-created with people with disabilities and red-teaming experts, it defines twelve categories of disability harm and pairs benign and adversarial prompts across seven life domains. The dataset comprises 175 prompts with human-annotated labels for 525 prompt-response pairs. Annotation by four evaluators with lived disability experience revealed that harm rates vary sharply by disability type, that harms compound in non-text modalities, and that terminology-driven harm depends on cultural and temporal context. The framework underscores that disability harm is personal, intersectional, and community-defined.

Key facts

  • DisaBench is a participatory evaluation framework for disability harms in LLMs.
  • Co-created with people with disabilities and red-teaming experts.
  • Defines twelve disability harm categories.
  • Pairs benign and adversarial prompts across seven life domains.
  • Dataset includes 175 prompts with 525 annotated prompt-response pairs.
  • Four evaluators with lived disability experience conducted annotations.
  • Harm rates vary sharply by disability type.
  • Standard safety evaluations miss subtle harms only domain expertise can recognize.
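The dataset structure described above — prompt-response pairs tagged by disability type and life domain, with harm labels aggregated into per-type harm rates — can be sketched in a few lines. All field names, category values, and the aggregation below are illustrative assumptions, not DisaBench's published schema:

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical record for one annotated prompt-response pair;
# field names are illustrative, not the benchmark's actual schema.
@dataclass
class AnnotatedPair:
    disability_type: str   # e.g. "vision", "mobility"
    life_domain: str       # one of the seven life domains, e.g. "employment"
    prompt_kind: str       # "benign" or "adversarial"
    is_harmful: bool       # harm label from the annotators

def harm_rates_by_type(pairs):
    """Fraction of responses labeled harmful, grouped by disability type."""
    counts = defaultdict(lambda: [0, 0])  # type -> [harmful, total]
    for p in pairs:
        counts[p.disability_type][1] += 1
        if p.is_harmful:
            counts[p.disability_type][0] += 1
    return {t: harmful / total for t, (harmful, total) in counts.items()}

# Toy example with made-up labels:
sample = [
    AnnotatedPair("vision", "employment", "adversarial", True),
    AnnotatedPair("vision", "healthcare", "benign", False),
    AnnotatedPair("mobility", "employment", "adversarial", True),
]
print(harm_rates_by_type(sample))  # {'vision': 0.5, 'mobility': 1.0}
```

Grouping by `disability_type` rather than pooling all 525 pairs is what surfaces the finding that harm rates differ sharply across disability types.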

Entities

Institutions

  • arXiv

Sources