ARTFEED — Contemporary Art Intelligence

LLMs Show Self-Preference Bias in Assessing Idea Originality

ai-technology · 2026-04-25

A recent study posted on arXiv examines how well Large Language Models (LLMs) align with human evaluators when judging the originality of responses in a divergent thinking exercise. The authors analyzed 4,813 responses to the Alternate Uses Task (AUT) produced by higher-creative and lower-creative humans as well as by ChatGPT-4o. Two intensively trained university students served as human raters; machine ratings came from two tailored systems, OCSAI and CLAUS, and from ChatGPT-4o prompted with the same instructions given to the human raters. The results provide preliminary evidence of a self-preference bias: the automatic systems favor responses that match their own style, underscoring the need for careful calibration when LLMs are used to assess creativity.

Key facts

  • Study investigates LLM alignment with human raters on originality assessment
  • 4,813 responses to Alternate Uses Task analyzed
  • Responses from higher- and lower-creative humans and from ChatGPT-4o
  • Human raters: two university students with intensive training
  • Machine raters: OCSAI, CLAUS, and ChatGPT-4o
  • Preliminary evidence of self-preference bias in automatic systems
  • Automatic systems favor responses that match their own style
  • LLM raters could reduce the cost, fatigue, and subjectivity of human rating, but the bias must be addressed
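The self-preference bias described above can be operationalized as a simple comparison: does the machine rater favor model-written responses more than human raters do? A minimal sketch, using hypothetical scores and field names (the study's actual data and analysis are not reproduced here):

```python
# Minimal sketch of a self-preference bias check on hypothetical data.
# Each response records its origin ("human" or "model"), a human rating,
# and a machine rating (e.g., from an LLM judge), all on the same scale.

def mean(xs):
    return sum(xs) / len(xs)

def preference_gap(ratings, score_key):
    """Mean rating of model-written minus human-written responses."""
    model = [r[score_key] for r in ratings if r["origin"] == "model"]
    human = [r[score_key] for r in ratings if r["origin"] == "human"]
    return mean(model) - mean(human)

# Hypothetical ratings in which the machine judge inflates model output.
ratings = [
    {"origin": "human", "human_score": 4.0, "machine_score": 3.5},
    {"origin": "human", "human_score": 3.0, "machine_score": 3.0},
    {"origin": "model", "human_score": 3.0, "machine_score": 4.5},
    {"origin": "model", "human_score": 3.5, "machine_score": 4.0},
]

human_gap = preference_gap(ratings, "human_score")      # -0.25
machine_gap = preference_gap(ratings, "machine_score")  # 1.00

# Self-preference bias: the machine rater favors model-written
# responses more strongly than the human raters do.
bias = machine_gap - human_gap
print(f"human gap={human_gap:.2f}, machine gap={machine_gap:.2f}, bias={bias:.2f}")
```

A positive `bias` on real data would indicate the pattern the study reports; careful work would add significance testing and control for response length and content.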

Entities

Institutions

  • arXiv

Sources