Fairness Metric for Speech Emotion Recognition Targets Demographic Bias
A novel approach to fairness modeling in Speech Emotion Recognition (SER) systems identifies allocative bias by analyzing the joint dependence between demographic factors and model errors. This technique addresses a shortcoming of conventional fairness metrics such as Equalised Odds and Demographic Parity, which evaluate each protected attribute in isolation and therefore fail to account for joint dependencies. Validated on synthetic data, the metric was then used to assess HuBERT and WavLM models fine-tuned on the CREMA-D dataset. Findings indicate that the new metric captures more of the mutual information between protected attributes and model errors than conventional metrics do, and allows the contribution of each individual attribute to be quantified. The analysis also suggests the presence of gender bias in both the HuBERT and WavLM models.
Key facts
- arXiv:2604.19763v1
- Speech Emotion Recognition (SER) systems have applications in mental health and education
- Traditional fairness metrics include Equalised Odds and Demographic Parity
- Proposed fairness metric captures the joint relationship between demographic attributes and model error
- Validated on synthetic data
- Applied to HuBERT and WavLM models
- Fine-tuned on CREMA-D dataset
- Indications of gender bias found in both HuBERT and WavLM
Entities
Institutions
- arXiv
Datasets
- CREMA-D
Models
- HuBERT
- WavLM