Post-training reduces human-like behavior in LLMs
A new study introduces Psych-201, a dataset for measuring how closely LLM behavior aligns with human behavior. The research finds that post-training, the process that turns base models into instruction-following assistants, consistently reduces this alignment across model families and sizes, and the gap widens in newer model generations. Persona induction, a prompting technique intended to elicit more human-like responses, does not improve predictions of individual participants' behavior. The results suggest that the methods currently used to make LLMs useful as assistants also make them less accurate models of human behavior.
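The digest does not spell out how alignment is scored, but a common approach in this line of work is to ask how much probability a model assigns to the choices human participants actually made. Below is a minimal sketch under that assumption; the model name and the trial format are illustrative stand-ins, not taken from the paper:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical sketch: quantify behavioral alignment as the negative
# log-likelihood (NLL) an LLM assigns to a human's recorded choice given
# the task context. Lower NLL = closer alignment. Model name and trial
# format below are illustrative, not from the paper.

model_name = "gpt2"  # stand-in; the study compares base vs. post-trained models
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def choice_nll(prompt: str, human_choice: str) -> float:
    """NLL of the human's choice tokens, conditioned on the task prompt."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    # Tokenize the choice separately and concatenate, so the prompt/choice
    # boundary stays unambiguous regardless of how BPE would merge tokens.
    choice_ids = tokenizer(human_choice, return_tensors="pt",
                           add_special_tokens=False).input_ids
    full_ids = torch.cat([prompt_ids, choice_ids], dim=1)
    with torch.no_grad():
        logits = model(full_ids).logits
    # Logits at position t predict the token at position t + 1, so the
    # choice tokens are scored by logits shifted one step to the left.
    n_prompt = prompt_ids.shape[1]
    targets = full_ids[0, n_prompt:]
    log_probs = torch.log_softmax(logits[0, n_prompt - 1:-1], dim=-1)
    return -log_probs[torch.arange(targets.shape[0]), targets].sum().item()

# Example trial in the style of a two-armed bandit task (format assumed):
prompt = "You chose machine A and won 5 points. Which machine next? Answer:"
print(choice_nll(prompt, " A"), choice_nll(prompt, " B"))
```

Aggregating such scores over many participants and trials gives a single alignment number per model, which is the kind of quantity that can be compared across base and post-trained variants.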
Key facts
- Psych-201 dataset introduced
- Post-training reduces behavioral alignment
- Misalignment widens in newer models
- Persona-induction does not improve individual predictions
- Study published on arXiv (2605.07632)