NDBench: Measuring How Frontier LLMs Adapt to Neurodivergence Context

ai-technology · 2026-05-04

Researchers have introduced NDBench, a 576-output benchmark for assessing how frontier chat LLMs adapt their responses when the system prompt supplies neurodivergence (ND) context. The benchmark covers two frontier models, three system prompt types (baseline, ND-profile assertion, and ND-profile assertion with explicit instructions), four canonical ND profiles, and 24 prompts in four categories, one of which is adversarial masking; the product of these factors, 2 × 3 × 4 × 24, matches the 576 outputs. The models adapt significantly under ND context, and the fully instructed condition produces longer, more structured outputs: higher token counts, more headings, and more detailed steps (p<10^-8, Holm-corrected). The adaptation is largely structural: list density changes little, while other structural elements increase significantly. The findings distinguish surface-level from structural changes in how LLMs respond to ND context.
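The reported p-value is Holm-corrected. As a reminder of what that correction does, here is a minimal sketch of the standard Holm-Bonferroni step-down adjustment. The function name `holm_adjust` and the example p-values are illustrative assumptions; they are not taken from NDBench, whose analysis code this summary does not describe.

```python
def holm_adjust(p_values):
    """Return Holm-Bonferroni adjusted p-values in the original order.

    Illustrative helper, not NDBench code.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (0-based rank) is scaled by (m - rank).
        scaled = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted value never drops below an earlier one.
        running_max = max(running_max, scaled)
        adjusted[idx] = running_max
    return adjusted


# Made-up raw p-values, for illustration only.
raw = [0.01, 0.04, 0.03]
adjusted = holm_adjust(raw)
print(adjusted)
```

With these made-up inputs the adjusted values come out at approximately 0.03, 0.06, and 0.06. A result that stays below 10^-8 after this kind of adjustment survives correction across the whole family of tests.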

Key facts

NDBench is a 576-output benchmark for LLM adaptation to neurodivergence context
Two frontier models tested
Three system prompt types: baseline, ND-profile assertion, and ND-profile assertion with explicit instructions
Four canonical ND profiles used
24 prompts across four categories including adversarial masking
Significant adaptation under ND context (p<10^-8, Holm-corrected)
Fully instructed conditions produce longer, more structured outputs
Adaptation is largely structural, with minimal change in list density
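The counts listed above multiply out to the benchmark's size. The sketch below enumerates the grid, assuming the 576 outputs come from fully crossing the four factors; the summary gives only the counts and the prompt-type names, so every model, profile, and prompt label here is a hypothetical placeholder.

```python
from itertools import product

# Placeholder labels: only the counts and the prompt-type names come from the summary.
MODELS = ["model_a", "model_b"]
CONDITIONS = [
    "baseline",
    "nd_profile_assertion",
    "nd_profile_assertion_with_instructions",
]
PROFILES = [f"profile_{i}" for i in range(1, 5)]    # four ND profiles (names assumed)
PROMPTS = [f"prompt_{i:02d}" for i in range(1, 25)]  # 24 prompts (names assumed)


def build_grid():
    """Return every (model, condition, profile, prompt) cell of the assumed full crossing."""
    return list(product(MODELS, CONDITIONS, PROFILES, PROMPTS))


grid = build_grid()
print(len(grid))  # 2 * 3 * 4 * 24 = 576
```

Each tuple corresponds to one model output to be generated and scored, which is why the benchmark is described by its output count rather than its prompt count.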
