AI Models Show Uneven Cognitive Abilities Across Generations

ai-technology · 2026-05-11

A recent study presents a psychometric framework for assessing the cognitive abilities of generative AI models, benchmarking them against human norms and tracking their progress across model generations. Using tasks from the Wechsler Adult Intelligence Scale, the researchers found that top multimodal models perform near ceiling in verbal comprehension and working memory, scoring above the 98th percentile, but near floor in perceptual reasoning, falling below the 1st percentile. To track advances beyond human-level performance, the team developed the Artificial Intelligence Quotient (AIQ) Benchmark and applied it to six generations across two model families, finding significant but asymmetric performance gains.
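For a sense of scale, the percentile figures above can be translated into IQ-style standard scores. The sketch below is an illustration only: it assumes the conventional Wechsler composite scaling (mean 100, standard deviation 15) and a normal distribution of scores, and the function names are hypothetical. It does not reproduce the study's own scoring procedure, which this summary does not describe.

```python
from statistics import NormalDist

# Assumption: composite scores follow the conventional Wechsler scaling
# (mean 100, standard deviation 15) and are normally distributed.
WECHSLER_NORMS = NormalDist(mu=100, sigma=15)


def percentile_to_standard_score(percentile: float) -> float:
    """Map a percentile rank (strictly between 0 and 100) to a standard score."""
    if not 0 < percentile < 100:
        raise ValueError("percentile must be strictly between 0 and 100")
    return WECHSLER_NORMS.inv_cdf(percentile / 100)


def standard_score_to_percentile(score: float) -> float:
    """Map a standard score back to its percentile rank."""
    return 100 * WECHSLER_NORMS.cdf(score)


if __name__ == "__main__":
    # 98th and 1st percentiles are the thresholds quoted in the summary.
    for p in (98, 50, 1):
        score = percentile_to_standard_score(p)
        print(f"{p}th percentile -> standard score of about {score:.1f}")
```

Under these assumptions, the 98th percentile corresponds to a standard score of roughly 131 and the 1st percentile to roughly 65, a gap of more than four standard deviations between the models' strongest and weakest domains.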

Key facts

Study introduces psychometric framework for evaluating generative AI cognition.
Models tested using Wechsler Adult Intelligence Scale tasks.
Near-ceiling performance in verbal comprehension and working memory (>98th percentile).
Near-floor performance in perceptual reasoning (<1st percentile).
AIQ Benchmark developed to track AI cognitive evolution.
Applied to six generations across two model families.
Performance gains are significant but asymmetric.
Research aims to advance artificial general intelligence evaluation.

Sources

arXiv cs.AI — 2026-05-11