ARTFEED — Contemporary Art Intelligence

VLMs Suppress Female Representations Under Ambiguous Input

ai-technology · 2026-06-01

A recent study finds that vision-language models (VLMs) tend to associate gender-ambiguous images with male identities, even for occupations stereotypically associated with women. The study introduces LALS (Latent Association Learning Score), a zero-shot metric that assesses internal concept associations by mapping visual-token activations into text-embedding space. Analyzing over 800 gender-ambiguous images across 15 occupations and four VLMs, the researchers found a consistent disconnect: models frequently encode female associations internally, yet predominantly produce male outputs. The gap suggests that current alignment techniques fall short on ambiguous inputs, which are common in real-world use.
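To make the idea of scoring associations per token and layer concrete, the following is a minimal sketch, not the study's actual LALS definition, which this summary does not give. It assumes a hypothetical linear projection from activation space into text-embedding space and scores each visual token by the difference between its cosine similarity to a "female" and a "male" concept embedding. All names (`association_scores`, `is_decoupled`, the toy vectors) are illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def project(vec, matrix):
    """Apply a linear map (list of rows) to a vector. Hypothetical projection."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def association_scores(activations, projection, female_emb, male_emb):
    """Score every (layer, token) activation.

    activations[layer][token] is a vector. A positive score means the
    projected activation lies closer to the female concept embedding,
    a negative score closer to the male one. Assumed scoring rule.
    """
    scores = []
    for layer in activations:
        layer_scores = []
        for token_vec in layer:
            z = project(token_vec, projection)
            layer_scores.append(cosine(z, female_emb) - cosine(z, male_emb))
        scores.append(layer_scores)
    return scores


def mean_score(scores):
    """Average the per-token, per-layer scores into one number."""
    flat = [s for layer in scores for s in layer]
    return sum(flat) / len(flat)


def is_decoupled(internal_mean, output_label):
    """Flag the case described above: female-leaning internals, male output."""
    return internal_mean > 0 and output_label == "male"


# Toy data: 2 layers, 2 visual tokens each, 2-dimensional vectors.
identity = [[1.0, 0.0], [0.0, 1.0]]   # stand-in for a learned projection
female = [1.0, 0.0]                   # stand-in concept embeddings
male = [0.0, 1.0]
acts = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[2.0, 1.0], [1.0, 0.0]],
]

scores = association_scores(acts, identity, female, male)
internal = mean_score(scores)
decoupled = is_decoupled(internal, "male")
```

In this toy setup the averaged internal score is positive (female-leaning) while the assumed output label is male, so `decoupled` is `True`, which mirrors the kind of internal-versus-output gap the study reports.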

Key facts

  • VLMs default to male associations for gender-ambiguous images
  • Even female-stereotyped occupations trigger male defaults
  • LALS metric measures internal concept associations per token and layer
  • Study tested 15 occupations, over 800 images, and four VLMs
  • Internal representations and outputs are systematically decoupled
  • Models often encode female associations internally but output male
  • Minimal prompting pressure exposes occupation-gender defaults
  • Ambiguous inputs are common in practice yet rarely studied

Entities

Institutions

  • arXiv

Sources