Fairness Metric for Speech Emotion Recognition Targets Demographic Bias
A novel approach to fairness modeling in Speech Emotion Recognition (SER) systems identifies allocative bias by analyzing the joint dependence between demographic factors and model errors. This technique addresses a shortcoming of conventional fairness metrics such as Equalised Odds and Demographic Parity, which evaluate each protected attribute in isolation and therefore fail to account for joint dependencies. Validated on synthetic data, the metric was then used to assess HuBERT and WavLM models fine-tuned on the CREMA-D dataset. Findings indicate that the new metric captures more of the mutual information between protected attributes and model errors than conventional metrics do, and allows the contribution of each individual attribute to be quantified. The analysis also suggests the presence of gender bias in both the HuBERT and WavLM models.
Key facts
- arXiv:2604.19763v1
- Speech Emotion Recognition (SER) systems have applications in mental health and education
- Traditional fairness metrics include Equalised Odds and Demographic Parity
- Proposed fairness metric captures the joint relationship between demographic attributes and model error
- Validated on synthetic data
- Applied to HuBERT and WavLM models
- Fine-tuned on CREMA-D dataset
- Indications of gender bias found in both HuBERT and WavLM
Entities
Institutions
- arXiv
Datasets
- CREMA-D
Models
- HuBERT
- WavLM