ARTFEED — Contemporary Art Intelligence

Computer Vision System Achieves Sub-Millimeter Arrow Localization Using Frozen Vision Transformers

ai-technology · 2026-04-22

A new computer vision system detects, localizes, and scores arrow punctures on 40 cm indoor archery targets. It needs only 48 annotated photographs, containing 5,084 punctures, for training. The system uses a frozen self-supervised vision transformer, DINOv3 ViT-L/16, combined with AnyUp guided feature upsampling to achieve sub-millimeter spatial precision from a coarse 32×32 grid of patch tokens.

Before feature extraction, a color-based canonical rectification stage maps perspective-distorted photographs into a standard coordinate system in which pixel distances correspond to known physical measurements. Lightweight CenterNet-style detection heads then predict arrow-center heatmaps. Only 3.8 million of the model's 308 million parameters (about 1.2%) are trainable.

In three-fold cross-validation, the system achieved a mean F1 score of 0.893 ± 0.011 and a mean localization error of 1.41 ± 0.06 mm, matching or surpassing prior fully supervised approaches. The research shows that large, pre-trained, frozen vision transformers can be effective for dense prediction tasks on extremely small datasets. The technical paper describing the system is available on arXiv under the identifier 2604.16758v1.
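Once a photograph is rectified, decoding a heatmap and scoring each hit reduces to simple geometry in the canonical frame. The sketch below is an illustration under stated assumptions, not the paper's code. It assumes the rectified image is square and its width spans exactly the 40 cm face. It picks peaks with a plain 3×3 local-maximum test (standard CenterNet peak picking, with no sub-pixel offset refinement). It scores hits using World Archery 40 cm face geometry: ten zones, each 20 mm wide. The line-cutter rule is approximated by subtracting an optional shaft radius. The names `decode_peaks`, `pixel_to_mm` and `ring_score` are hypothetical.

```python
import math

# Assumed geometry (World Archery 40 cm face): 200 mm outer radius,
# ten scoring zones, each 20 mm wide.
FACE_RADIUS_MM = 200.0
RING_WIDTH_MM = 20.0


def decode_peaks(heatmap, threshold=0.5):
    """Return (x, y, score) for cells that exceed `threshold` and are the
    maximum of their 3x3 neighbourhood (CenterNet-style peak picking)."""
    h, w = len(heatmap), len(heatmap[0])
    peaks = []
    for y in range(h):
        for x in range(w):
            v = heatmap[y][x]
            if v < threshold:
                continue
            window = [
                heatmap[yy][xx]
                for yy in range(max(0, y - 1), min(h, y + 2))
                for xx in range(max(0, x - 1), min(w, x + 2))
            ]
            if v >= max(window):
                peaks.append((x, y, v))
    return peaks


def pixel_to_mm(x, y, size_px):
    """Map a pixel in the square canonical image to millimetres relative to
    the target centre, assuming the image width spans the full face."""
    mm_per_px = 2 * FACE_RADIUS_MM / size_px
    # Use pixel centres (x + 0.5) so the image centre maps to (0, 0).
    return ((x + 0.5) * mm_per_px - FACE_RADIUS_MM,
            (y + 0.5) * mm_per_px - FACE_RADIUS_MM)


def ring_score(dx_mm, dy_mm, shaft_radius_mm=0.0):
    """10 for the innermost zone down to 1 for the outermost, 0 for a miss.
    Subtracting the shaft radius approximates the line-cutter rule: an arrow
    whose shaft touches a ring line scores the higher zone."""
    r = max(0.0, math.hypot(dx_mm, dy_mm) - shaft_radius_mm)
    zone = max(1, math.ceil(r / RING_WIDTH_MM))
    return 11 - zone if zone <= 10 else 0


if __name__ == "__main__":
    size = 400  # 1 mm per pixel under the assumed scale
    heatmap = [[0.0] * size for _ in range(size)]
    heatmap[199][199] = 0.9  # near the centre
    heatmap[50][200] = 0.8   # about 150 mm above the centre
    peaks = decode_peaks(heatmap)
    scores = [ring_score(*pixel_to_mm(x, y, size)) for x, y, _ in peaks]
    print(scores)  # [3, 10]
```

Working in a metric canonical frame is what makes the scoring step this short. Ring membership becomes a single radius comparison instead of a per-photo perspective calculation.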

Key facts

  • System automates detection, localization, and scoring of arrow punctures on 40 cm archery targets.
  • Trained on only 48 annotated photographs containing 5,084 punctures.
  • Uses a frozen DINOv3 ViT-L/16 vision transformer with AnyUp guided feature upsampling.
  • Achieves sub-millimeter spatial precision from a 32×32 grid of patch tokens.
  • Includes a color-based canonical rectification stage for standardizing images.
  • Employs lightweight CenterNet-style detection heads for heatmap prediction.
  • Only 3.8 million of 308 million total model parameters are trainable.
  • Achieved mean F1 score of 0.893 ± 0.011 and mean localization error of 1.41 ± 0.06 mm.
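The F1 score and localization error depend on how predicted centers are paired with annotated punctures, a protocol this article does not describe. The following is a minimal sketch. It assumes greedy, closest-first, one-to-one matching of points in millimetres within a hypothetical 5 mm radius. The function name `match_and_score` and the threshold are illustrative, and the paper's actual protocol may differ. An optimal assignment, such as SciPy's `linear_sum_assignment`, is a common alternative to the greedy pass used here.

```python
import math


def match_and_score(pred, gt, max_dist_mm=5.0):
    """Greedily match predicted and ground-truth arrow centres (x, y in mm),
    closest pairs first, accepting only pairs within max_dist_mm.
    Returns (precision, recall, f1, mean localization error in mm)."""
    # Collect every candidate pair inside the matching radius.
    pairs = []
    for i, p in enumerate(pred):
        for j, g in enumerate(gt):
            d = math.dist(p, g)
            if d <= max_dist_mm:
                pairs.append((d, i, j))
    pairs.sort()

    # Accept pairs in order of distance, each point used at most once.
    used_pred, used_gt, errors = set(), set(), []
    for d, i, j in pairs:
        if i in used_pred or j in used_gt:
            continue
        used_pred.add(i)
        used_gt.add(j)
        errors.append(d)

    tp = len(errors)  # true positives = matched pairs
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gt) if gt else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    mean_err = sum(errors) / tp if tp else float("nan")
    return precision, recall, f1, mean_err
```

For example, suppose there are three predictions and two annotated punctures, and two of the predictions lie 1 mm and 2 mm from their matches. The function then returns a precision of 2/3, a recall of 1, an F1 of 0.8 and a mean error of 1.5 mm. The third, unmatched prediction counts only as a false positive.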

Entities

Institutions

  • arXiv

Sources