MESD: New Metric Detects Unfair Explanations Across Intersectional Groups

ai-technology · 2026-05-18

A new metric, Multi-category Explanation Stability Disparity (MESD), measures unfairness in machine learning explanations across intersectional subgroups. Conventional outcome-focused metrics such as demographic parity evaluate only whether predictions are statistically consistent across protected groups; MESD instead targets procedural fairness by measuring differences in explanation quality. It addresses fairness gerrymandering, in which a model appears equitable with respect to individual attributes (e.g., race) yet shows substantial disparities for intersectional subgroups (e.g., race × gender). MESD combines three components to measure explanation stability across subgroups formed by the Cartesian product of multiple protected attributes. The metric is described in a paper available on arXiv (2603.13452).
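
To make fairness gerrymandering concrete, the following minimal sketch builds a toy dataset in which demographic parity holds for race and for gender taken separately but fails for their intersection. The data, the attribute values, and the helper names (positive_rates, parity_gap) are illustrative assumptions and do not come from the paper; the sketch covers only the outcome-based check, not MESD itself.

```python
from collections import defaultdict

def positive_rates(records, keys):
    """Positive-prediction rate for each group defined by the attribute keys."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for rec in records:
        group = tuple(rec[k] for k in keys)
        totals[group] += 1
        positives[group] += rec["pred"]
    return {g: positives[g] / totals[g] for g in totals}

def parity_gap(rates):
    """Largest difference in positive rate between any two groups."""
    return max(rates.values()) - min(rates.values())

# Hypothetical toy data: 10 people per (race, gender) cell,
# with the number of positive predictions chosen by hand.
positives_per_cell = {("A", "F"): 8, ("A", "M"): 2, ("B", "F"): 2, ("B", "M"): 8}
records = []
for (race, gender), n_pos in positives_per_cell.items():
    for i in range(10):
        records.append({"race": race, "gender": gender, "pred": 1 if i < n_pos else 0})

race_gap = parity_gap(positive_rates(records, ["race"]))
gender_gap = parity_gap(positive_rates(records, ["gender"]))
inter_gap = parity_gap(positive_rates(records, ["race", "gender"]))

print(round(race_gap, 2), round(gender_gap, 2), round(inter_gap, 2))
```

In this toy setup the gap is zero for each single attribute, while the race × gender subgroups differ by about 0.6 in positive rate, which is the pattern a single-attribute audit would miss.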

Key facts

MESD stands for Multi-category Explanation Stability Disparity.
It is a procedural fairness metric for machine learning.
It detects disparities in explanation quality across intersectional subgroups.
Intersectional subgroups are formed by the Cartesian product of protected attributes.
It addresses fairness gerrymandering, where models appear fair on single attributes but not on their intersections.
Traditional metrics like demographic parity only check outcome consistency.
The paper is available on arXiv with ID 2603.13452.
MESD integrates three components to measure explanation stability.
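
The facts above can be illustrated with a rough sketch that enumerates subgroups as a Cartesian product and compares a per-subgroup explanation stability score. The article does not specify MESD's three components, so this is not the MESD formula: as a stand-in assumption, stability is taken to be the mean cosine similarity between a sample's feature attributions and the attributions for a slightly perturbed copy of it, and the disparity is the max-min gap across subgroups. The attribution vectors and all names here are hypothetical.

```python
import math
from itertools import product
from statistics import mean

def cosine(u, v):
    """Cosine similarity between two attribution vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def stability_by_subgroup(samples, attributes):
    """Mean attribution similarity for each non-empty intersectional subgroup."""
    # Observed values of each protected attribute.
    levels = [sorted({s[a] for s in samples}) for a in attributes]
    scores = {}
    # Subgroups are the Cartesian product of the attribute values.
    for combo in product(*levels):
        members = [s for s in samples if tuple(s[a] for a in attributes) == combo]
        if members:
            scores[combo] = mean([cosine(s["attr"], s["attr_perturbed"]) for s in members])
    return scores

def stability_gap(scores):
    """Stand-in disparity: best-served minus worst-served subgroup."""
    return max(scores.values()) - min(scores.values())

# Hypothetical attributions: only the (B, F) subgroup gets unstable explanations.
samples = [
    {"race": "A", "gender": "F", "attr": [1.0, 0.0], "attr_perturbed": [1.0, 0.0]},
    {"race": "A", "gender": "M", "attr": [1.0, 0.0], "attr_perturbed": [1.0, 0.0]},
    {"race": "B", "gender": "F", "attr": [1.0, 0.0], "attr_perturbed": [0.0, 1.0]},
    {"race": "B", "gender": "M", "attr": [1.0, 0.0], "attr_perturbed": [1.0, 0.0]},
]

scores = stability_by_subgroup(samples, ["race", "gender"])
gap = stability_gap(scores)
print(scores, gap)
```

A max-min gap is only one possible way to summarize disparity; the paper's actual aggregation may differ.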
