ARTFEED — Contemporary Art Intelligence

PlantInquiryVQA: Benchmark for Multimodal AI Reasoning in Botanical Diagnosis

ai-technology · 2026-04-25

Researchers have developed PlantInquiryVQA, a benchmark for assessing multimodal language models on multi-step, intent-driven visual reasoning in plant pathology. The study, available on arXiv (2604.20983), fills a significant gap in the evaluation of vision-language models, which typically focuses on single-turn question answering. The benchmark introduces a Chain of Inquiry framework that organizes the diagnostic process into sequential question-answer pairs conditioned on visual cues and explicit epistemic intent, backed by a dataset of 24,950 expert-curated plant images and 138,068 question-answer pairs. The approach mirrors how botanists work: they analyze leaf images, recognize visual indicators, infer diagnostic intent, and adapt their questions to species, symptoms, and severity, steps that are essential for accurate disease diagnosis and treatment planning.

Key facts

  • PlantInquiryVQA is a new benchmark for multimodal AI reasoning in botanical diagnosis.
  • It addresses the gap in current vision-language model evaluations that use single-turn QA.
  • The benchmark uses a Chain of Inquiry framework for multi-step visual reasoning.
  • The dataset includes 24,950 expert-curated plant images.
  • The dataset includes 138,068 question-answer pairs.
  • The approach mimics botanists' adaptive questioning process.
  • The work is published on arXiv with ID 2604.20983.
  • The framework models diagnostic trajectories conditioned on visual cues and epistemic intent.
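The Chain of Inquiry framework described above, a diagnostic trajectory of sequential question-answer pairs each tagged with an epistemic intent, can be sketched as a minimal data structure. This is an illustrative sketch only; the class and field names below are assumptions, not the paper's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class InquiryStep:
    """One question-answer turn in a diagnostic trajectory (hypothetical schema)."""
    question: str
    intent: str    # epistemic intent behind the question, e.g. "identify_symptom"
    answer: str

@dataclass
class ChainOfInquiry:
    """A multi-step diagnostic trajectory grounded in one plant image."""
    image_id: str
    steps: list[InquiryStep] = field(default_factory=list)

    def ask(self, question: str, intent: str, answer: str) -> "ChainOfInquiry":
        # Append the next turn; chaining mirrors the sequential inquiry process.
        self.steps.append(InquiryStep(question, intent, answer))
        return self

# Example trajectory (content invented for illustration): species first,
# then symptoms, then a diagnosis conditioned on the earlier answers.
chain = (
    ChainOfInquiry(image_id="leaf_00123")
    .ask("What species is shown?", "identify_species", "tomato")
    .ask("Are lesions visible on the leaf?", "identify_symptom",
         "yes, concentric brown rings")
    .ask("What disease do the symptoms indicate?", "diagnose", "early blight")
)
print(len(chain.steps))  # prints 3
```

The point of the sketch is the conditioning: each later question is asked in the context of earlier answers, which is what distinguishes this setup from single-turn VQA evaluation.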

Entities

Institutions

  • arXiv
