Vision Mamba Discretization Study Compares Six Schemes
A recent study compared six methods for discretizing the Vision Mamba state space model (SSM): zero-order hold, first-order hold, the bilinear/Tustin transform, polynomial interpolation, higher-order hold, and the fourth-order Runge-Kutta method, evaluated on standard visual benchmarks. Polynomial interpolation and higher-order hold delivered the largest accuracy gains on image classification, semantic segmentation, and object detection, at the cost of longer training time. The bilinear/Tustin transform also yielded consistent improvements, while the default zero-order hold was found to degrade temporal fidelity in dynamic visual settings.
Key facts
- Six discretization schemes compared: ZOH, FOH, BIL, POL, HOH, RK4
- POL and HOH yield largest accuracy gains
- BIL provides consistent improvements
- ZOH degrades temporal fidelity in dynamic environments
- Evaluated on image classification, semantic segmentation, object detection
- Higher accuracy from POL and HOH comes with higher training-time computation
- Study is systematic and controlled within Vision Mamba framework
- Published on arXiv with ID 2604.20606
Entities
Institutions
- arXiv