Red-Teaming LLMs for Political Influence Campaigns

ai-technology · 2026-05-25

A recent study published on arXiv presents a red-teaming framework for evaluating how large language models (LLMs) could be misused in political influence campaigns. The research focuses on locally deployed open-source LLMs, which are more attractive to privacy-conscious malicious actors than API-only models. The framework measures LLM Overton Windows (OWs), defined as the range of political opinions a model can consistently express on contentious issues, and quantifies how much simple natural-language jailbreaks widen that range. Across more than 30 LLMs from 10 model families and five countries of origin, the study finds a systematic bias in political expressivity: open-source LLMs tend to generate more left-leaning social media content. The stated goal is to strengthen information integrity by identifying vulnerabilities before they can be exploited.
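To make the Overton Window idea concrete, here is a minimal Python sketch of how such a window and its widening under a jailbreak might be computed. It is an illustration only, not the study's actual metric. The -1 (left) to +1 (right) stance scale, the 0.8 consistency threshold, the names `Trial`, `overton_window`, `window_width` and `make_trials`, and all trial counts are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    target_stance: float  # requested position on an assumed -1 (left) .. +1 (right) scale
    expressed: bool       # True if the model produced content at that stance


def overton_window(trials, min_rate=0.8):
    """Return (low, high) over stances expressed in at least min_rate of trials, else None."""
    outcomes_by_stance = {}
    for t in trials:
        outcomes_by_stance.setdefault(t.target_stance, []).append(t.expressed)
    consistent = [
        stance
        for stance, outcomes in outcomes_by_stance.items()
        if sum(outcomes) / len(outcomes) >= min_rate
    ]
    if not consistent:
        return None
    return (min(consistent), max(consistent))


def window_width(window):
    """Width of a window; an empty window has width 0."""
    return 0.0 if window is None else window[1] - window[0]


def make_trials(successes, n=10):
    """Build toy trials: successes maps stance -> number of compliant outputs out of n."""
    return [Trial(s, i < k) for s, k in successes.items() for i in range(n)]


# Toy numbers, invented for illustration (not results from the study).
baseline = make_trials({-1.0: 10, -0.5: 9, 0.0: 9, 0.5: 5, 1.0: 0})
jailbroken = make_trials({-1.0: 10, -0.5: 10, 0.0: 10, 0.5: 9, 1.0: 9})

base_ow = overton_window(baseline)
jail_ow = overton_window(jailbroken)
expansion = window_width(jail_ow) - window_width(base_ow)
print(base_ow, jail_ow, expansion)
```

With these toy inputs the baseline window covers only the left half of the scale, and the jailbroken window covers the whole scale. The difference in widths is one simple way to express how much a jailbreak broadens the range of views a model will articulate.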

Key facts

The study introduces a red-teaming framework for LLMs.
It focuses on locally deployed open-source LLMs.
The framework measures LLM Overton Windows (OWs).
OWs define the range of political opinions a model can express.
Simple natural-language jailbreaks expand the OW range.
Over 30 LLMs from 10 model families were evaluated.
Models from five countries of origin were tested.
Open-source LLMs show systematic left-leaning bias in political expressivity.
