Cognitive Reverse-Engineering Framework Decodes Jealousy in LLMs
A new study introduces a framework called Cognitive Reverse-Engineering, which uses Representation Engineering (RepE) to examine how Large Language Models (LLMs) represent complex emotions, specifically social-comparison jealousy. The approach combines appraisal theory, subspace orthogonalization, regression-based weighting, and bidirectional causal steering to isolate and evaluate two psychological antecedents of jealousy: the Superiority of the Comparison Person and Domain Self-Definitional Relevance. An analysis of eight LLMs from the Llama, Qwen, and Gemma families shows that the models natively encode these two cognitive constructs and that the constructs influence the models' assessments. The work addresses a gap in the interpretability of complex emotions in models that are often treated as black boxes.
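To make the combination of techniques named above more concrete, here is a minimal sketch in Python with NumPy. It is not the study's implementation. It runs on synthetic "activations" rather than real LLM hidden states, and it assumes difference-of-means vectors as a simple stand-in for however the study extracts concept directions. All function and variable names (`difference_of_means`, `orthogonalize`, `steer`, `d_sup`, `d_rel`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def difference_of_means(acts, labels):
    """Unit direction from low-label to high-label mean activation (assumed extraction method)."""
    d = acts[labels == 1].mean(axis=0) - acts[labels == 0].mean(axis=0)
    return d / np.linalg.norm(d)

def orthogonalize(v, u):
    """Remove from v its component along the unit vector u, then renormalize."""
    w = v - (v @ u) * u
    return w / np.linalg.norm(w)

def steer(hidden, direction, alpha):
    """Shift a hidden state along a unit direction; the sign of alpha sets the direction."""
    return hidden + alpha * direction

rng = np.random.default_rng(0)
dim, n = 64, 400

# Synthetic stand-ins for hidden states: two planted concept axes plus noise.
true_sup = rng.normal(size=dim)
true_sup /= np.linalg.norm(true_sup)
true_rel = rng.normal(size=dim)
true_rel /= np.linalg.norm(true_rel)
sup = rng.integers(0, 2, size=n)  # 1 = comparison person is superior
rel = rng.integers(0, 2, size=n)  # 1 = domain is self-definitionally relevant
acts = (rng.normal(scale=0.5, size=(n, dim))
        + 3.0 * np.outer(sup, true_sup)
        + 3.0 * np.outer(rel, true_rel))
# Synthetic jealousy rating that depends on both factors (made-up coefficients).
score = 2.0 * sup + 1.0 * rel + rng.normal(scale=0.1, size=n)

# 1. Estimate one direction per factor, then orthogonalize the second against the first.
d_sup = difference_of_means(acts, sup)
d_rel = orthogonalize(difference_of_means(acts, rel), d_sup)

# 2. Regression-based weighting: fit the rating on the two projections plus an intercept.
X = np.column_stack([acts @ d_sup, acts @ d_rel, np.ones(n)])
weights, *_ = np.linalg.lstsq(X, score, rcond=None)

# 3. Bidirectional steering: push one hidden state up and down along the same direction.
h = acts[0]
up = steer(h, d_sup, 2.0)
down = steer(h, d_sup, -2.0)
```

In a real experiment, `acts` would be hidden states read from a chosen model layer, and the steered vectors would be written back into the forward pass so that the change in the model's output can be measured in both directions.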
Key facts
- Framework is based on Representation Engineering (RepE)
- Analyzes social-comparison jealousy in LLMs
- Uses appraisal theory, subspace orthogonalization, regression-based weighting, and bidirectional causal steering
- Isolates two antecedents: Superiority of the Comparison Person and Domain Self-Definitional Relevance
- Tested on eight LLMs from Llama, Qwen, and Gemma families
- Models natively encode these cognitive constructs
- Addresses gap in interpretability of complex emotions
- Published on arXiv with ID 2604.14593
Entities
Institutions
- arXiv