New RL Method Reduces Covert Political Bias in LLMs

ai-technology · 2026-05-23

A team of researchers has developed a technique for reducing covert political bias in large language models (LLMs). They find that LLMs display systematic political bias across sensitive contexts, and they define covert political bias as the asymmetric handling of counterpart topics from opposing political sides. They group the ways this bias appears into seven categories of techniques. To quantify it, they propose two metrics: Sentiment Consistency, which measures the symmetry of rhetoric and framing across paired political prompts, and Helpfulness Consistency, which measures whether paired responses show symmetric depth and engagement. To mitigate both forms of asymmetry, they created Political Consistency Training (PCT), a reinforcement learning (RL) training method with two complementary components: Sentiment Consistency Training and Helpfulness Consistency Training. According to the authors, PCT preserves overall helpfulness while significantly reducing covert political bias, and it generalizes to held-out benchmarks. The work has been released on arXiv.
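To make the idea of a paired-prompt consistency metric concrete, here is a minimal Python sketch. It is not the paper's definition: the summary gives no formula, so the function name `pair_consistency`, the score range, and the "one minus mean absolute gap" scoring rule are all assumptions chosen for illustration. It assumes each response has already been scored (for sentiment, or for depth and engagement) by some external rater.

```python
from statistics import mean


def pair_consistency(pairs, scale=1.0):
    """Illustrative symmetry score for paired political prompts.

    `pairs` is a list of (score_a, score_b) tuples, where score_a and
    score_b are ratings in [0, scale] for responses to counterpart prompts
    from opposing political sides. The scoring rule is an assumption for
    this sketch, not the metric defined in the paper.

    Returns a value in [0, 1]: 1.0 means every pair was rated identically,
    lower values mean larger asymmetry on average.
    """
    if not pairs:
        raise ValueError("need at least one prompt pair")
    # Normalised absolute gap between the two sides of each pair.
    gaps = [abs(a - b) / scale for a, b in pairs]
    return 1.0 - mean(gaps)


# Hypothetical ratings for two counterpart prompt pairs.
example_pairs = [(0.8, 0.6), (0.4, 0.4)]
score = pair_consistency(example_pairs)
```

The same function could be applied to sentiment ratings or to helpfulness ratings, which loosely mirrors the split between the two metrics. A consistency-based training reward would presumably favour higher values of a score like this, though the summary does not say how PCT constructs its reward.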

Key facts

LLMs exhibit systematic political bias across sensitive contexts
Covert political bias refers to asymmetric handling of counterpart topics from opposing political sides
7 categories of techniques for covert political bias identified
Sentiment Consistency metric measures symmetry in rhetoric and framing
Helpfulness Consistency metric measures symmetric depth and engagement
Political Consistency Training (PCT) is an RL training method
PCT includes Sentiment Consistency Training and Helpfulness Consistency Training
PCT preserves overall helpfulness and reduces covert political bias
PCT generalizes to held-out benchmarks
Work released on arXiv
