ARTFEED — Contemporary Art Intelligence

AI Framework Uses Street View to Assess Building Conditions Nationwide

ai-technology · 2026-04-25

Researchers have built a framework that uses multimodal large language models (LLMs) and Google Street View (GSV) imagery to assess building conditions across the United States automatically. Fine-tuning Gemma 3 27B on a small human-labeled dataset produced scores that correlate strongly with human mean opinion scores (MOS), exceeding individual raters on both the Spearman rank correlation coefficient (SRCC) and the Pearson linear correlation coefficient (PLCC). To improve efficiency, the team used knowledge distillation to transfer this ability to a smaller Gemma 3 4B model, which performed similarly while running three times faster. Further distillation into the CNN-based EfficientNetV2-M and the transformer-based SwinV2-B gave comparable performance with a 30x speedup. The study also examines how well LLMs evaluate housing and built-environment attributes, and the authors built a visualization tool for the findings.
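For readers unfamiliar with the two metrics, here is a minimal sketch of how SRCC and PLCC against MOS can be computed. It uses only the Python standard library. The ratings and model scores are invented for illustration and do not come from the study, and the helper names (`pearson`, `average_ranks`, `spearman`) are hypothetical.

```python
from statistics import mean

def pearson(x, y):
    """PLCC: linear correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """SRCC: Pearson correlation computed on the ranks of the scores."""
    return pearson(average_ranks(x), average_ranks(y))

# Invented example: three human raters scoring five buildings (1 = poor, 5 = good).
ratings = [
    [1, 2, 2, 4, 5],  # rater A
    [2, 2, 3, 4, 4],  # rater B
    [1, 3, 2, 5, 5],  # rater C
]
# MOS is the per-building mean across raters.
mos = [mean(building) for building in zip(*ratings)]

# Invented model predictions for the same five buildings.
model_scores = [1.2, 2.4, 2.3, 4.5, 4.8]

srcc = spearman(model_scores, mos)
plcc = pearson(model_scores, mos)
print(f"SRCC={srcc:.3f}  PLCC={plcc:.3f}")
```

SRCC rewards getting the ordering of buildings right, while PLCC rewards a linear match in score values, which is why the two are usually reported together.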

Key facts

  • Framework uses multimodal LLMs and Google Street View imagery
  • Fine-tuned Gemma 3 27B on human-labeled dataset
  • Outperforms individual raters on SRCC and PLCC relative to MOS benchmark
  • Knowledge distillation to Gemma 3 4B achieves 3x speedup
  • Further distillation to EfficientNetV2-M and SwinV2-B achieves 30x speed gain
  • Human-AI alignment study assesses built environment and housing attributes
  • Visualization tool developed for results
  • Published on arXiv under ID 2604.21102
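
The distillation steps listed above can be illustrated with a toy sketch. The summary does not say which distillation loss or training setup the authors used, so this example assumes the simplest variant: the teacher scores unlabeled inputs, and the student is trained by regression on those scores. Everything here is hypothetical, including `teacher_score`, `train_student`, and the three-number "image features"; a real pipeline would use the fine-tuned LLM as teacher and a vision network as student.

```python
import random

def teacher_score(features):
    """Stand-in for the large fine-tuned model (assumption: a fixed linear scorer)."""
    w = [0.5, -1.0, 2.0]
    return sum(wi * xi for wi, xi in zip(w, features)) + 3.0

def train_student(unlabeled, teacher, epochs=500, lr=0.05):
    """Fit a small linear student to the teacher's scores with gradient descent."""
    n_features = len(unlabeled[0])
    weights = [0.0] * n_features
    bias = 0.0
    # The teacher labels the unlabeled pool once; no human labels are needed here.
    targets = [teacher(x) for x in unlabeled]
    n = len(unlabeled)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, t in zip(unlabeled, targets):
            pred = sum(w * xi for w, xi in zip(weights, x)) + bias
            err = pred - t
            # Gradient of mean squared error between student and teacher scores.
            for i, xi in enumerate(x):
                grad_w[i] += 2 * err * xi / n
            grad_b += 2 * err / n
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias

random.seed(0)
# Invented feature vectors standing in for unlabeled street-level images.
images = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(200)]
weights, bias = train_student(images, teacher_score)

def student_score(features):
    return sum(w * xi for w, xi in zip(weights, features)) + bias

sample = images[0]
print(f"teacher={teacher_score(sample):.3f}  student={student_score(sample):.3f}")
```

The point of the pattern is that the expensive model runs once to label a large pool, after which the cheaper student handles inference, which is where speedups such as those reported come from.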

Entities

Institutions

  • arXiv
  • Google

Locations

  • United States

Sources