MR2-ByteTrack: Efficient Video Object Detection for MCUs
Multi-Resolution Rescored ByteTrack (MR2-ByteTrack) is a video object detection method for ultra-low-power microcontrollers (MCUs). It alternates full- and low-resolution inference to reduce compute, links detections across frames with ByteTrack, and corrects misclassifications with a Rescore algorithm. Running detection on the device avoids the bandwidth, latency, and privacy constraints that often make cloud computing impractical for smart vision sensors, while staying within the memory and compute limits of MCUs. The method is applied to both CNN- and Transformer-based detectors.
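As a rough illustration of alternating full- and low-resolution inference, the sketch below generates a per-frame resolution schedule and estimates the average cost relative to always running at full resolution. The function names, the `period` parameter, the example resolutions, and the assumption that inference cost scales with pixel count are all hypothetical and not taken from the paper.

```python
from typing import Iterator, List, Tuple

Resolution = Tuple[int, int]  # (width, height)


def resolution_schedule(num_frames: int, period: int,
                        full_res: Resolution,
                        low_res: Resolution) -> Iterator[Resolution]:
    """Yield the input resolution for each frame.

    Hypothetical policy: frame 0 and every `period`-th frame after it run at
    full resolution; all other frames run at low resolution.
    """
    if period < 1:
        raise ValueError("period must be >= 1")
    for i in range(num_frames):
        yield full_res if i % period == 0 else low_res


def relative_cost(schedule: List[Resolution], full_res: Resolution) -> float:
    """Average per-frame cost relative to full-resolution-only inference.

    Assumption (not from the paper): cost is proportional to pixel count.
    """
    if not schedule:
        raise ValueError("schedule must not be empty")
    full_pixels = full_res[0] * full_res[1]
    return sum(w * h for w, h in schedule) / (len(schedule) * full_pixels)


# Illustrative values only: one full-resolution frame out of every three.
FULL, LOW = (320, 320), (160, 160)
sched = list(resolution_schedule(6, 3, FULL, LOW))
cost = relative_cost(sched, FULL)
```

With these illustrative values, two of six frames run at full resolution and four at a quarter of the pixels, so the estimated cost is half that of full-resolution-only inference. Real savings depend on the detector and hardware.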
Key facts
- MR2-ByteTrack is a Video Object Detection method for MCU-based embedded vision nodes.
- It reduces computational cost by alternating between full- and low-resolution inference.
- Detections are linked across frames using ByteTrack.
- Misclassifications are corrected via the Rescore algorithm using probability union rules.
- The method is applied to both CNN- and Transformer-based detectors.
- Cloud computing is often impractical for smart vision sensors due to bandwidth, latency, and privacy constraints.
- MCUs have limited memory and compute, making conventional video object detection methods infeasible.
- The paper is published on arXiv with ID 2605.15423.
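The Rescore step listed above is described only as using probability union rules. One possible reading, sketched below, combines the confidence scores that a track received for each class using the union of independent events, P(A or B) = P(A) + P(B) - P(A)P(B), and then assigns the track the class with the highest combined score. The function names, the track representation, and the independence assumption are hypothetical; the paper's actual rule may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def union(p: float, q: float) -> float:
    """Probability of 'A or B' for independent events A and B."""
    return p + q - p * q


def rescore_track(
    detections: Iterable[Tuple[str, float]],
) -> Tuple[str, Dict[str, float]]:
    """Pick one class label for a track from its per-frame detections.

    `detections` holds (class_label, confidence) pairs for the detections
    that a tracker such as ByteTrack linked into one track. Scores of the
    same class are merged with the union rule (an assumption, see lead-in).
    """
    combined: Dict[str, float] = defaultdict(float)
    for label, score in detections:
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score out of range: {score}")
        combined[label] = union(combined[label], score)
    if not combined:
        raise ValueError("track has no detections")
    best = max(combined, key=combined.get)
    return best, dict(combined)


# Illustrative track: the middle frame was classified differently.
track = [("cat", 0.6), ("dog", 0.7), ("cat", 0.5)]
label, scores = rescore_track(track)
```

In this example the two "cat" detections combine to 0.6 + 0.5 - 0.3 = 0.8, which exceeds the single "dog" score of 0.7, so the whole track is relabeled "cat" even though the highest individual score belonged to "dog".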
Entities
Institutions
- arXiv