Post-training reduces human-like behavior in LLMs
A new study introduces Psych-201, a dataset for measuring how closely LLM behavior aligns with human behavior. The research finds that post-training, the process that turns base models into instruction-following assistants, consistently reduces this alignment across model families and sizes, and the gap widens in newer model generations. Persona induction, a prompting technique intended to elicit more human-like responses, does not improve predictions of individual participants' behavior. The results suggest that the methods currently used to make LLMs useful as assistants also make them less accurate models of human behavior.
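The digest does not spell out how alignment is scored, but a common approach in this line of work is to ask how much probability a model assigns to the choices human participants actually made. Below is a minimal sketch under that assumption; the model name and the trial format are illustrative stand-ins, not taken from the paper:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical sketch: quantify behavioral alignment as the negative
# log-likelihood (NLL) an LLM assigns to a human's recorded choice given
# the task context. Lower NLL = closer alignment. Model name and trial
# format below are illustrative, not from the paper.

model_name = "gpt2"  # stand-in; the study compares base vs. post-trained models
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def choice_nll(prompt: str, human_choice: str) -> float:
    """NLL of the human's choice tokens, conditioned on the task prompt."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    # Tokenize the choice separately and concatenate, so the prompt/choice
    # boundary stays unambiguous regardless of how BPE would merge tokens.
    choice_ids = tokenizer(human_choice, return_tensors="pt",
                           add_special_tokens=False).input_ids
    full_ids = torch.cat([prompt_ids, choice_ids], dim=1)
    with torch.no_grad():
        logits = model(full_ids).logits
    # Logits at position t predict the token at position t + 1, so the
    # choice tokens are scored by logits shifted one step to the left.
    n_prompt = prompt_ids.shape[1]
    targets = full_ids[0, n_prompt:]
    log_probs = torch.log_softmax(logits[0, n_prompt - 1:-1], dim=-1)
    return -log_probs[torch.arange(targets.shape[0]), targets].sum().item()

# Example trial in the style of a two-armed bandit task (format assumed):
prompt = "You chose machine A and won 5 points. Which machine next? Answer:"
print(choice_nll(prompt, " A"), choice_nll(prompt, " B"))
```

Aggregating such scores over many participants and trials gives a single alignment number per model, which is the kind of quantity that can be compared across base and post-trained variants.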
Key facts
- Psych-201 dataset introduced
- Post-training reduces behavioral alignment
- Misalignment widens in newer models
- Persona-induction does not improve individual predictions
- Study published on arXiv (2605.07632)