AI Framework Uses Street View to Assess Building Conditions Nationwide
A team of researchers has developed a framework that combines multimodal large language models (LLMs) with Google Street View (GSV) imagery to automatically assess building conditions across the United States. By fine-tuning Gemma 3 27B on a small human-labeled dataset, they achieved strong correlation with human mean opinion scores (MOS), surpassing individual raters on the Spearman rank correlation coefficient (SRCC) and the Pearson linear correlation coefficient (PLCC). To improve efficiency, knowledge distillation was used to transfer this capability to the smaller Gemma 3 4B, which performed comparably while running about three times faster. Further distillation into a CNN-based EfficientNetV2-M and a transformer-based SwinV2-B preserved comparable performance with a roughly 30x speedup. The work also examines how well LLM judgments align with human assessments of housing and built-environment attributes, and the team built a visualization tool for the findings.
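As an illustration of how agreement with the human mean opinion scores might be quantified, the sketch below computes SRCC and PLCC with SciPy. The score arrays are hypothetical placeholders, not data from the study.

```python
# Minimal sketch: comparing model-predicted condition scores against human
# mean opinion scores (MOS) using SRCC and PLCC. The values below are
# illustrative placeholders, not results from the paper.
import numpy as np
from scipy.stats import spearmanr, pearsonr

# Hypothetical MOS for a handful of street-view images (e.g., 1 = poor, 5 = excellent)
mos = np.array([4.2, 2.8, 3.5, 1.9, 4.7, 3.1])

# Hypothetical model predictions for the same images
pred = np.array([4.0, 3.0, 3.3, 2.2, 4.5, 3.4])

srcc, _ = spearmanr(pred, mos)   # rank-order agreement
plcc, _ = pearsonr(pred, mos)    # linear agreement

print(f"SRCC: {srcc:.3f}, PLCC: {plcc:.3f}")
```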
Key facts
- Framework uses multimodal LLMs and Google Street View imagery
- Fine-tuned Gemma 3 27B on human-labeled dataset
- Outperforms individual human raters on SRCC and PLCC against the MOS benchmark
- Knowledge distillation to Gemma 3 4B achieves 3x speedup
- Further distillation to EfficientNetV2-M and SwinV2-B achieves 30x speed gain (see the distillation sketch after this list)
- Human-AI alignment study assesses built environment and housing attributes
- Visualization tool developed for results
- Published on arXiv under ID 2604.21102
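The summary does not detail the distillation procedure; a common formulation is to train the smaller model to regress the teacher's scores directly. The sketch below shows that pattern with an EfficientNetV2-M student from timm; the data loader, model name, and hyperparameters are assumptions, not the authors' setup.

```python
# Minimal sketch of score distillation into a lightweight CNN: a student
# regressor (EfficientNetV2-M via timm) learns to reproduce condition scores
# produced by a larger teacher model. Dataset, teacher scores, and
# hyperparameters are hypothetical.
import torch
import torch.nn as nn
import timm
from torch.utils.data import DataLoader

# Student: EfficientNetV2-M with a single regression output (condition score)
student = timm.create_model("tf_efficientnetv2_m", pretrained=True, num_classes=1)
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

def distill_epoch(loader: DataLoader) -> float:
    """One epoch of regression distillation; `loader` is assumed to yield
    (image_batch, teacher_score_batch) pairs precomputed with the teacher LLM."""
    student.train()
    total = 0.0
    for images, teacher_scores in loader:
        optimizer.zero_grad()
        preds = student(images).squeeze(-1)            # (B,) predicted scores
        loss = loss_fn(preds, teacher_scores.float())  # match the teacher's scores
        loss.backward()
        optimizer.step()
        total += loss.item()
    return total / max(len(loader), 1)
```

The same loop would apply to a SwinV2-B student by swapping the model name; only the backbone changes, not the distillation objective.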
Entities
Institutions
- arXiv
Locations
- United States