Vision Mamba Discretization Study Compares Six Schemes
A recent study compared six methods for discretizing the Vision Mamba state space model (SSM): zero-order hold, first-order hold, the bilinear/Tustin transform, polynomial interpolation, higher-order hold, and the fourth-order Runge-Kutta method, evaluated on standard visual benchmarks. Polynomial interpolation and higher-order hold delivered the largest accuracy gains on image classification, semantic segmentation, and object detection, at the cost of longer training time. The bilinear/Tustin transform also yielded consistent improvements, while the default zero-order hold was found to degrade temporal fidelity in dynamic visual settings.
Key facts
- Six discretization schemes compared: ZOH, FOH, BIL, POL, HOH, RK4
- POL and HOH yield largest accuracy gains
- BIL provides consistent improvements
- ZOH degrades temporal fidelity in dynamic environments
- Evaluated on image classification, semantic segmentation, object detection
- Higher accuracy from POL and HOH comes with higher training-time computation
- Study is systematic and controlled within Vision Mamba framework
- Published on arXiv with ID 2604.20606
Entities
Institutions
- arXiv