LLM Bias Study Reveals Gender, Racial, and Age Disparities in 2024 Models
An extensive evaluation of bias in four prominent large language models released in 2024 (Gemini 1.5 Pro, Llama 3 70B, Claude 3 Opus, and GPT-4o) finds persistent disparities related to gender, race, and age in occupational and crime scenarios. In occupational scenarios, the models depict female characters more often than male ones, a 37% deviation from US Bureau of Labor Statistics (BLS) data. In crime scenarios, the deviations from US FBI data are 54% for gender and 28% for race. The research also indicates that attempts to reduce bias often create new fairness trade-offs. The study, available on arXiv (2409.14583v4), highlights significant challenges for the usability, reliability, and fairness of LLMs as they increasingly influence critical decision-making.
Key facts
- Evaluated bias in Gemini 1.5 Pro, Llama 3 70B, Claude 3 Opus, and GPT-4o
- Gender bias assessed in occupational scenarios
- Gender, age, and racial bias assessed in crime scenarios
- 37% deviation from US BLS data in occupational gender depictions
- 54% deviation from US FBI data for gender in crime scenarios
- 28% deviation from US FBI data for race in crime scenarios
- Debiasing efforts create new fairness trade-offs
- Paper published on arXiv (2409.14583v4)
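This summary does not say how the deviation figures above were computed. As a rough illustration only, one plausible way to compare model output against a reference statistic is to average the absolute gap between the share of a group in generated text and the share in the reference data. The function name, the occupations, and every number below are hypothetical and are not taken from the paper, the BLS, or the FBI.

```python
def mean_absolute_deviation(generated, reference):
    """Average absolute gap, in percentage points, between two sets of shares.

    Both arguments map a category (for example, an occupation) to the
    percentage of characters or workers belonging to one group.
    This is an assumed metric, not necessarily the one used in the study.
    """
    shared = generated.keys() & reference.keys()
    if not shared:
        raise ValueError("no categories in common")
    return sum(abs(generated[k] - reference[k]) for k in shared) / len(shared)


# Hypothetical female-share percentages, for illustration only.
generated_female_share = {"nurse": 95.0, "engineer": 60.0, "teacher": 80.0}
reference_female_share = {"nurse": 85.0, "engineer": 15.0, "teacher": 75.0}

deviation = mean_absolute_deviation(generated_female_share, reference_female_share)
print(f"Mean deviation: {deviation:.1f} percentage points")
```

A metric like this treats over- and under-representation symmetrically, so it can report a large deviation even when a model errs in the opposite direction from the historical bias, which is consistent with the summary's point that debiasing can introduce new trade-offs.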
Entities
Institutions
- arXiv
- US Bureau of Labor Statistics
- US FBI