ARTFEED — Contemporary Art Intelligence

LLM-Generated Code Harbors Severe Social Bias, Study Finds

ai-technology · 2026-05-04

A new study reveals that large language models (LLMs) produce code with significant social bias, with Code Bias Scores reaching as high as 60.58%. The research, extending prior work on Solar, introduces SocialBias-Bench, a benchmark of 343 real-world coding tasks spanning seven demographic dimensions. Evaluating four prominent LLMs, the study found severe bias in every model. Standard prompt-level interventions, such as Chain-of-Thought reasoning and fairness persona assignment, inadvertently amplified bias rather than reducing it. Structured multi-agent software process frameworks showed promise in reducing bias, but only when early roles in the pipeline correctly scoped the task. The findings highlight a critical gap in existing evaluations, which focus on functional correctness while ignoring fairness in code generation for human-centered applications.
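To make the failure mode concrete, here is a hypothetical illustration (not an actual task from SocialBias-Bench) of the kind of generated code the study flags: a scoring helper where a protected attribute is baked directly into the decision logic, contrasted with a fair variant that uses only task-relevant features. All names and thresholds below are illustrative assumptions.

```python
def score_applicant_biased(income: float, gender: str) -> float:
    """Biased generation: adjusts the score based on a protected attribute."""
    score = min(income / 100_000, 1.0)
    if gender == "male":  # demographic logic hard-coded into the output
        score *= 1.1
    return score


def score_applicant_fair(income: float) -> float:
    """Fair variant: the decision depends only on task-relevant features."""
    return min(income / 100_000, 1.0)
```

The point of benchmarks like SocialBias-Bench is that both functions may pass a functional-correctness test suite, so correctness-only evaluation never surfaces the difference.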

Key facts

  • Study uses SocialBias-Bench benchmark with 343 coding tasks
  • Seven demographic dimensions evaluated
  • Code Bias Scores up to 60.58% across four LLMs
  • Chain-of-Thought prompting and fairness persona assignment amplified bias
  • Structured multi-agent pipelines reduce bias when early roles scope correctly
  • Extends prior work on Solar
  • Focus on human-centered applications where demographic fairness is critical
  • Existing evaluations focus on functional correctness, not social bias
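The article does not give the exact formula behind the Code Bias Score, but a plausible reading is the percentage of generated code samples that a bias detector flags. A minimal sketch under that assumption, with the detector supplied by the caller (the study's actual judging procedure is not specified here):

```python
from typing import Callable, List


def code_bias_score(generations: List[str],
                    is_biased: Callable[[str], bool]) -> float:
    """Percentage of generated code samples flagged as biased.

    `is_biased` is a caller-supplied judge (rule-based or model-based);
    its implementation is outside the scope of this sketch.
    """
    if not generations:
        return 0.0
    flagged = sum(1 for code in generations if is_biased(code))
    return 100.0 * flagged / len(generations)


# Toy usage with a naive keyword detector (illustrative only):
samples = ["if gender == 'male': rate *= 1.1", "rate = base_rate"]
print(code_bias_score(samples, lambda code: "gender" in code))  # 50.0
```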
