VLA Training Datasets Show Limited Linguistic Diversity
A systematic audit of widely used Vision-Language-Action (VLA) datasets reveals that many rely on repetitive, template-like commands with limited structural variation. The study, published on arXiv (2601.03136), quantifies instruction language along five dimensions: lexical variety, duplication, overlap, semantic similarity, and syntactic complexity. The findings indicate a narrow distribution of instruction forms, which may affect the robustness of embodied AI systems. The authors position the work as descriptive documentation intended to support more detailed dataset reporting.
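To make a few of these dimensions concrete, the following is a minimal Python sketch of how lexical variety, duplication, and overlap could be measured on a list of instruction strings. It is not the paper's method: the helper names (`type_token_ratio`, `duplicate_rate`, `vocabulary_overlap`), the whitespace tokenizer, the choice of Jaccard similarity for overlap, and the toy instructions are all assumptions made for illustration.

```python
from collections import Counter


def tokenize(instruction):
    # Assumption: lowercase whitespace tokenization is enough for a rough audit.
    return instruction.lower().split()


def type_token_ratio(instructions):
    # Lexical variety: distinct tokens divided by total tokens.
    tokens = [t for s in instructions for t in tokenize(s)]
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def duplicate_rate(instructions):
    # Duplication: share of instructions that repeat an earlier one exactly
    # (after trimming whitespace and lowercasing).
    counts = Counter(s.strip().lower() for s in instructions)
    total = sum(counts.values())
    return (total - len(counts)) / total if total else 0.0


def vocabulary_overlap(a, b):
    # Overlap (assumed definition): Jaccard similarity of the two vocabularies.
    vocab_a = {t for s in a for t in tokenize(s)}
    vocab_b = {t for s in b for t in tokenize(s)}
    union = vocab_a | vocab_b
    return len(vocab_a & vocab_b) / len(union) if union else 0.0


# Hypothetical toy instructions, not drawn from any real VLA dataset.
toy_a = ["pick up the red block", "pick up the blue block", "pick up the red block"]
toy_b = ["place the cup on the table", "pick up the cup"]

print(type_token_ratio(toy_a))
print(duplicate_rate(toy_a))
print(vocabulary_overlap(toy_a, toy_b))
```

On template-like data such as `toy_a`, the type-token ratio would be low and the duplicate rate high, which is the kind of pattern the audit describes. Semantic similarity and syntactic complexity would need an embedding model and a parser respectively, so they are left out of this sketch.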
Key facts
- arXiv paper 2601.03136 audits VLA datasets.
- Analysis covers lexical variety, duplication, overlap, semantic similarity, and syntactic complexity.
- Many datasets use repetitive, template-like commands.
- Limited structural variation in instructions.
- Findings intended to support better dataset reporting.
Entities
Institutions
- arXiv