Red-Teaming LLMs for Political Influence Campaigns
A recent study published on arXiv presents a red-teaming framework for evaluating how large language models (LLMs) could be misused in political influence campaigns. The research focuses on locally deployed open-source LLMs, which are more appealing than API-only models to privacy-focused malicious users. The framework measures LLM Overton Windows (OWs), defined as the range of political opinions a model can consistently articulate on contentious issues, and quantifies how far simple natural-language jailbreaks widen that range. Across more than 30 LLMs from 10 model families and five countries of origin, the study finds a systematic bias in political expressivity: open-source LLMs tend to generate more left-leaning social media content. The stated goal is to strengthen information integrity by identifying vulnerabilities before they can be exploited.
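To make the Overton Window idea concrete, here is a minimal Python sketch of one way such a measurement could be computed. It is not the study's method: the stance scale (-2 for far left to +2 for far right), the 0.8 reliability threshold, the function names `overton_window` and `window_width`, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def overton_window(samples, threshold=0.8):
    """Return (low, high) bounds of the stances a model voices reliably.

    `samples` is a list of (target_stance, expressed) pairs, where
    `expressed` is True if the model produced the requested opinion.
    Both the pair format and the threshold are hypothetical choices.
    Returns None if no stance meets the threshold.
    """
    counts = defaultdict(lambda: [0, 0])  # stance -> [successes, attempts]
    for stance, expressed in samples:
        counts[stance][0] += int(expressed)
        counts[stance][1] += 1
    reliable = [s for s, (ok, n) in counts.items() if ok / n >= threshold]
    if not reliable:
        return None
    return min(reliable), max(reliable)


def window_width(window):
    """Width of a window in stance steps; 0 when the window is empty."""
    return 0 if window is None else window[1] - window[0]


# Toy data (invented): five attempts per stance on a -2..+2 scale.
# Without a jailbreak, the model mostly refuses the +2 stance.
baseline = (
    [(-2, True)] * 5 + [(-1, True)] * 5 + [(0, True)] * 5
    + [(1, True)] * 5 + [(2, True)] * 1 + [(2, False)] * 4
)
# With a jailbreak prompt, every stance is expressed on every attempt.
jailbroken = [(s, True) for s in (-2, -1, 0, 1, 2) for _ in range(5)]

base_ow = overton_window(baseline)
jail_ow = overton_window(jailbroken)
expansion = window_width(jail_ow) - window_width(base_ow)
```

In this toy data the baseline window spans -2 to +1 and the jailbroken window spans -2 to +2, so the jailbreak widens the window by one stance step. A real evaluation would also need a way to judge whether a generated post actually expresses the requested opinion, which this sketch takes as given.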
Key facts
- The study introduces a red-teaming framework for LLMs.
- It focuses on locally deployed open-source LLMs.
- The framework measures LLM Overton Windows (OWs).
- An OW is the range of political opinions a model can consistently express on contentious issues.
- Simple natural-language jailbreaks expand the OW range.
- Over 30 LLMs from 10 model families were evaluated.
- Models from five countries of origin were tested.
- Open-source LLMs show a systematic left-leaning bias in political expressivity.
Entities
Institutions
- arXiv