ARTFEED — Contemporary Art Intelligence

GOLD-BEV Framework Uses Aerial Imagery to Train Dense Bird's-Eye-View Semantic Mapping for Dynamic Scenes

ai-technology · 2026-04-22

A new research framework, GOLD-BEV, generates dense semantic maps of road environments from an ego-centric viewpoint. It operates on bird's-eye-view (BEV) representations of dynamic scenes while maintaining geometric consistency, a property vital for planning and mapping applications.

Training is supervised by time-synchronized aerial imagery, removing the need for extensive manual labeling. Strict aerial-ground synchronization makes it possible to supervise moving traffic participants and avoids the temporal inconsistencies that arise from non-synchronized overhead sources. BEV-aligned aerial crops serve as an intuitive target space, enabling dense semantic annotation with minimal effort and sidestepping the ambiguities of ego-only BEV labeling.

To scale supervision, the method generates dense BEV pseudo-labels with domain-adapted aerial teachers. The framework then jointly trains BEV segmentation and, optionally, pseudo-aerial BEV reconstruction for added interpretability. The work, which focuses on learning detailed environment maps that include dynamic agents from ego-centric sensors alone, is presented in the preprint arXiv:2604.19411v1, released as a cross-listed submission.
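The preprint's abstract does not spell out the implementation, but the pseudo-labeling and joint-training idea can be illustrated with a rough NumPy sketch. Everything here is an assumption for illustration: the function names, the confidence threshold for keeping teacher predictions, and the L1 choice for the reconstruction term are not details from the paper.

```python
import numpy as np

def softmax(logits, axis=-1):
    e = np.exp(logits - logits.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def make_pseudo_labels(teacher_logits, conf_thresh=0.8):
    """Hard pseudo-labels from a (domain-adapted) aerial teacher.

    teacher_logits: (H, W, C) class scores predicted on a BEV-aligned
    aerial crop. Low-confidence pixels are marked ignore (-1).
    conf_thresh is a hypothetical hyperparameter, not from the paper.
    """
    probs = softmax(teacher_logits)
    labels = probs.argmax(axis=-1)
    conf = probs.max(axis=-1)
    return np.where(conf >= conf_thresh, labels, -1)

def joint_loss(student_logits, pseudo_labels, recon, aerial_crop, w_recon=0.1):
    """Masked cross-entropy on confident pseudo-labels, plus an
    optional pseudo-aerial reconstruction term (here: plain L1)."""
    H, W, C = student_logits.shape
    probs = softmax(student_logits)
    mask = (pseudo_labels >= 0).ravel()
    # per-pixel probability of the pseudo-label class (ignored pixels
    # are clipped to class 0 and then masked out of the average)
    picked = probs.reshape(-1, C)[np.arange(H * W), pseudo_labels.clip(0).ravel()]
    ce = -np.log(picked + 1e-8)
    seg = (ce * mask).sum() / max(mask.sum(), 1)
    rec = np.abs(recon - aerial_crop).mean()
    return seg + w_recon * rec
```

In this sketch, pixels where the aerial teacher is unsure simply contribute no segmentation gradient, which is one common way to keep noisy pseudo-labels from dominating training.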

Key facts

  • GOLD-BEV is a framework for dense bird's-eye-view semantic mapping of dynamic scenes
  • It uses time-synchronized aerial imagery as supervision during training
  • BEV-aligned aerial crops provide an intuitive target space for dense semantic annotation
  • Strict aerial-ground synchronization enables supervision of moving traffic participants and mitigates temporal inconsistencies
  • The method generates BEV pseudo-labels using domain-adapted aerial teachers
  • It jointly trains BEV segmentation with optional pseudo-aerial BEV reconstruction for interpretability
  • The research is presented in the preprint arXiv:2604.19411v1
  • The framework learns from ego-centric sensors to create environment maps including dynamic agents
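The "BEV-aligned aerial crop" notion above has a simple geometric reading: cut a patch from the aerial image centered on the ego vehicle and rotated to its heading, so the patch's rows and columns line up with the ego BEV grid. The sketch below shows that alignment under stated assumptions (nearest-neighbour sampling, zero padding outside the image); it is an illustration, not the paper's actual procedure.

```python
import numpy as np

def bev_aligned_crop(aerial, cx, cy, yaw, size):
    """Sample a size-by-size crop from an aerial image, centered at the
    ego position (cx, cy) in aerial pixel coordinates and rotated by
    the ego heading `yaw`, so the crop aligns with the ego BEV grid.
    Nearest-neighbour sampling; out-of-bounds pixels are filled with 0.
    """
    H, W = aerial.shape[:2]
    half = size / 2.0
    # BEV grid sample points, relative to the crop centre
    ys, xs = np.mgrid[0:size, 0:size]
    u = xs - half + 0.5
    v = ys - half + 0.5
    # rotate the grid by the ego heading into aerial-image coordinates
    ax = cx + np.cos(yaw) * u - np.sin(yaw) * v
    ay = cy + np.sin(yaw) * u + np.cos(yaw) * v
    xi = np.floor(ax).astype(int)
    yi = np.floor(ay).astype(int)
    valid = (xi >= 0) & (xi < W) & (yi >= 0) & (yi < H)
    out = np.zeros((size, size), dtype=aerial.dtype)
    out[valid] = aerial[yi[valid], xi[valid]]
    return out
```

With `yaw = 0` this reduces to an ordinary axis-aligned crop; a nonzero heading rotates the sampling grid, which is what makes the aerial patch a drop-in dense target for the ego BEV prediction.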
