Curator-Guided Multilingual Art Description for BLV Audiences Using Small VLMs

publication · 2026-06-01

A preliminary study investigates curator-guided multilingual art description for blind and low-vision (BLV) audiences using Qwen2.5-VL-3B-Instruct, a compact vision-language model (VLM). The work, available on arXiv, addresses the challenge of providing accessible art descriptions in multiple languages, particularly in museums, where privacy and intellectual property concerns favor models that run on-site. The project builds a parallel, BLV-oriented caption corpus using artwork images and metadata in German, Romanian, and Serbian, then compares language-specific LoRA adapters against a single multilingual adapter under a fixed training budget. Language-specific adapters yield better controllability and description quality for Romanian and Serbian, while the multilingual adapter remains competitive for German. The study highlights the promise of small VLMs for improving art accessibility in multilingual museum settings.

Key facts

Study uses Qwen2.5-VL-3B-Instruct for art description.
Languages covered: German, Romanian, Serbian.
Constructs a parallel BLV-oriented caption corpus.
Compares language-specific LoRA adapters with a single multilingual adapter under a fixed training budget.
Evaluation includes an LLM-as-Judge protocol calibrated with a Romanian BLV pilot.
Language-specific adapters perform better for Romanian and Serbian.
Multilingual adapter remains competitive for German.
Published on arXiv with ID 2605.31080.
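
To make the adapter comparison concrete, the following is a minimal sketch of the two setups, not the paper's implementation. It assumes the fixed training budget is a count of training examples that is split evenly across the per-language adapters or given in full to the multilingual adapter; the summary does not say how the budget is actually defined. The names AdapterPlan, plan_adapters, select_adapter, and the adapter labels are hypothetical.

```python
from dataclasses import dataclass

# Language codes for German, Romanian, Serbian (the three languages in the study).
LANGUAGES = ("de", "ro", "sr")


@dataclass(frozen=True)
class AdapterPlan:
    name: str               # hypothetical adapter label
    languages: tuple        # languages this adapter is trained on and serves
    train_examples: int     # share of the fixed budget (assumed unit: examples)


def plan_adapters(total_budget: int, per_language: bool) -> list:
    """Return the adapters to train under one shared budget.

    Assumption: the budget is divided evenly in the per-language setup.
    """
    if per_language:
        share = total_budget // len(LANGUAGES)
        return [AdapterPlan(f"lora-{lang}", (lang,), share) for lang in LANGUAGES]
    return [AdapterPlan("lora-multilingual", LANGUAGES, total_budget)]


def select_adapter(plans: list, lang: str) -> str:
    """Pick the adapter that serves a requested description language."""
    for plan in plans:
        if lang in plan.languages:
            return plan.name
    raise ValueError(f"no adapter covers language {lang!r}")


per_language_plans = plan_adapters(3000, per_language=True)
multilingual_plans = plan_adapters(3000, per_language=False)

# A Romanian request is routed to a dedicated adapter in one setup
# and to the shared adapter in the other.
print(select_adapter(per_language_plans, "ro"))
print(select_adapter(multilingual_plans, "ro"))
```

In this sketch both setups consume the same total budget, so any quality difference would come from how the budget is distributed across languages rather than from extra training.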
