VLA Model for Adaptive Ultrasound-Guided Needle Insertion and Tracking
A Vision-Language-Action (VLA) model has been introduced for automated, adaptive ultrasound-guided needle insertion and tracking within a robotic ultrasound (RUS) system. The framework unifies needle tracking with insertion control, enabling real-time adjustments that adapt dynamically to the needle's position and the surrounding tissue. A Cross-Depth Fusion (CDF) tracking head merges shallow positional features with deep semantic features from a large vision backbone to achieve end-to-end tracking. This design addresses the performance degradation of traditional modular controllers in challenging scenarios. The research is available on arXiv under ID 2604.20347.
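The paper does not spell out the CDF head's internals, but the description (fusing shallow positional features with deep semantic features from a vision backbone) suggests a structure like the minimal PyTorch sketch below. The class name, layer sizes, and the choice of concatenation followed by convolution are all illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class CrossDepthFusionHead(nn.Module):
    """Hypothetical sketch of a Cross-Depth Fusion (CDF) tracking head.

    Fuses shallow, spatially precise features with deep semantic features
    from a vision backbone, then regresses a 2D needle-tip location in the
    ultrasound image. All dimensions and layer choices are illustrative.
    """

    def __init__(self, shallow_dim=64, deep_dim=768, fused_dim=256):
        super().__init__()
        # Project both feature streams to a common channel width.
        self.shallow_proj = nn.Conv2d(shallow_dim, fused_dim, kernel_size=1)
        self.deep_proj = nn.Conv2d(deep_dim, fused_dim, kernel_size=1)
        # Fuse the concatenated streams, then pool to a single vector.
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * fused_dim, fused_dim, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.tip_regressor = nn.Linear(fused_dim, 2)

    def forward(self, shallow_feats, deep_feats):
        # Upsample deep features to the shallow resolution so the two
        # streams can be concatenated channel-wise.
        deep = nn.functional.interpolate(
            self.deep_proj(deep_feats),
            size=shallow_feats.shape[-2:],
            mode="bilinear",
            align_corners=False,
        )
        fused = self.fuse(torch.cat([self.shallow_proj(shallow_feats), deep], dim=1))
        return self.tip_regressor(fused.flatten(1))  # (B, 2) needle-tip estimate
```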
Key facts
- VLA model proposed for adaptive US-guided needle insertion and tracking
- Framework unifies needle tracking and insertion control
- Enables real-time adjustments that adapt to needle position and tissue environment (see the control-loop sketch after this list)
- Cross-Depth Fusion (CDF) tracking head integrates shallow and deep features
- Addresses performance degradation of modular controllers
- Published on arXiv with ID 2604.20347
- Robotic ultrasound (RUS) system used
- End-to-end tracking achieved
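Unifying tracking and insertion control amounts to closing a loop between the tracker's per-frame tip estimate and the insertion command. The sketch below shows one minimal form of that loop; `track_tip`, `robot`, `capture_ultrasound_frame`, and `command_insertion` are hypothetical placeholders, and the simple proportional correction stands in for whatever policy the VLA model actually learns.

```python
import numpy as np

def insertion_control_loop(track_tip, robot, target_xy, steps=200, gain=0.5):
    """Hypothetical closed-loop sketch of unified tracking + insertion control.

    `track_tip` maps an ultrasound frame to an (x, y) needle-tip estimate
    (e.g. vision backbone + CDF head); `robot` is a stand-in for the RUS
    hardware interface. Neither interface comes from the paper.
    """
    target = np.asarray(target_xy, dtype=float)
    for _ in range(steps):
        frame = robot.capture_ultrasound_frame()   # current B-mode image
        tip = np.asarray(track_tip(frame), dtype=float)
        error = target - tip
        if np.linalg.norm(error) < 1.0:            # within 1 px of the target
            break
        # Proportional correction: steer the insertion toward the target.
        # A learned VLA policy would replace this simple P-controller.
        robot.command_insertion(velocity=gain * error)
```

Because the tip estimate is recomputed every frame, the command adapts as the needle deflects or the tissue shifts, which is the adaptivity the framework claims over open-loop modular pipelines.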