VLA Training Datasets Show Limited Linguistic Diversity

ai-technology · 2026-04-30

A systematic audit of widely used Vision-Language-Action (VLA) datasets finds that many rely on repetitive, template-like commands with limited structural variation. The study, posted on arXiv (2601.03136), quantifies instruction language along five dimensions: lexical variety, duplication, overlap, semantic similarity, and syntactic complexity. The findings indicate a narrow distribution of instruction forms, which may affect the robustness of embodied AI systems. The authors position the work as descriptive documentation intended to support more detailed dataset reporting.

Key facts

arXiv paper 2601.03136 audits VLA datasets.
Analysis covers lexical variety, duplication, overlap, semantic similarity, and syntactic complexity.
Many datasets use repetitive, template-like commands.
Instructions show limited structural variation.
Findings intended to support better dataset reporting.
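
To make two of the audited dimensions concrete, the sketch below computes a simple lexical-variety score (type-token ratio) and an exact-duplicate rate over a list of instructions. It is an illustration only: the tokenization, the metric definitions, the function names, and the sample instructions are all assumptions made here and are not taken from the paper, which may define these measures differently.

```python
from collections import Counter

PUNCT = ".,!?;:"

def tokenize(instruction):
    # Assumed normalization: lowercase, split on whitespace, trim punctuation.
    tokens = (t.strip(PUNCT) for t in instruction.lower().split())
    return [t for t in tokens if t]

def type_token_ratio(instructions):
    # Distinct tokens divided by total tokens; lower means more repetitive wording.
    tokens = [t for ins in instructions for t in tokenize(ins)]
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def duplicate_rate(instructions):
    # Share of instructions that repeat an earlier one after normalization.
    normalized = [" ".join(tokenize(ins)) for ins in instructions]
    counts = Counter(normalized)
    repeats = sum(c - 1 for c in counts.values())
    return repeats / len(normalized) if normalized else 0.0

# Hypothetical template-like instructions, invented for illustration.
sample = [
    "pick up the red block",
    "pick up the blue block",
    "Pick up the red block.",
    "place the cup on the table",
]

print(f"type-token ratio: {type_token_ratio(sample):.3f}")
print(f"duplicate rate:   {duplicate_rate(sample):.3f}")
```

On template-like data of this kind, both numbers point the same way: few distinct words relative to total words, and a noticeable share of repeated commands. Semantic similarity and syntactic complexity would need additional tooling (for example sentence embeddings or a parser) and are not covered by this sketch.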
