LLM Bias Study Reveals Gender, Racial, and Age Disparities in 2024 Models

ai-technology · 2026-06-01

An extensive evaluation of bias in four prominent large language models released in 2024 (Gemini 1.5 Pro, Llama 3 70B, Claude 3 Opus, and GPT-4o) finds persistent disparities related to gender, race, and age. The study assessed gender bias in occupational scenarios and gender, age, and racial bias in crime scenarios. In occupational scenarios, the models depict female characters more often than male ones, deviating from US Bureau of Labor Statistics data by 37%. In crime scenarios, the deviations from US FBI data are 54% for gender and 28% for race. The research also indicates that attempts to reduce bias often create new fairness trade-offs. The study, available on arXiv (2409.14583v4), raises concerns about the usability, reliability, and fairness of LLMs as they increasingly influence critical decision-making.

Key facts

Evaluated bias in Gemini 1.5 Pro, Llama 3 70B, Claude 3 Opus, and GPT-4o
Gender bias assessed in occupational scenarios
Gender, age, and racial bias assessed in crime scenarios
37% deviation from US BLS data in occupational gender depictions
54% deviation from US FBI data for gender in crime scenarios
28% deviation from US FBI data for race in crime scenarios
Debiasing efforts create new fairness trade-offs
Paper published on arXiv (2409.14583v4)
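
The deviation figures above compare model outputs against reference statistics. This summary does not say how the study computes them, so the following is only a rough sketch of one plausible reading: the mean absolute gap, in percentage points, between the share a model produces for each category and the share in the reference data. The function name, category names, and all numbers are hypothetical and are not taken from the paper.

```python
def mean_abs_deviation(model_share, reference_share):
    """Mean absolute gap, in percentage points, over the categories both inputs share.

    Assumption: this is one plausible way to define a "deviation" from
    reference data; the study's exact definition may differ.
    """
    shared = sorted(set(model_share) & set(reference_share))
    if not shared:
        raise ValueError("no categories in common")
    gaps = [abs(model_share[k] - reference_share[k]) for k in shared]
    return sum(gaps) / len(gaps)


# Hypothetical female-share percentages, invented for illustration only.
model_female_pct = {"occupation_a": 80.0, "occupation_b": 60.0}
reference_female_pct = {"occupation_a": 30.0, "occupation_b": 40.0}

deviation = mean_abs_deviation(model_female_pct, reference_female_pct)
print(f"{deviation:.1f} percentage points")
```

Under this reading, a single headline number hides direction: a model that over-represents a group in one category and under-represents it in another still accumulates deviation in both.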
