ARTFEED — Contemporary Art Intelligence

Vision Mamba Discretization Study Compares Six Schemes

publication · 2026-04-24

A recent study systematically compares six schemes for discretizing the continuous-time state space model (SSM) at the core of Vision Mamba: zero-order hold (ZOH), first-order hold (FOH), the bilinear/Tustin transform (BIL), polynomial interpolation (POL), higher-order hold (HOH), and the fourth-order Runge-Kutta method (RK4). Evaluated on standard visual benchmarks for image classification, semantic segmentation, and object detection, polynomial interpolation and higher-order hold deliver the largest accuracy gains, at the cost of additional training-time computation, while the bilinear/Tustin transform yields consistent improvements. Notably, the default zero-order hold was found to degrade temporal fidelity in dynamic visual settings.
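The study's own implementation is not reproduced here, but the two most familiar schemes in the comparison can be sketched for a diagonal SSM (the form Mamba uses, where the state matrix A is a vector of per-channel decay rates). This is an illustrative sketch under that assumption; the function names, step size `dt`, and toy parameters below are hypothetical, not taken from the paper.

```python
import numpy as np

def zoh(A_diag, B, dt):
    # Zero-order hold: treats the input as constant over each step.
    # For diagonal A: A_bar = exp(dt*A), B_bar = (exp(dt*A) - 1)/A * B.
    Ad = np.exp(dt * A_diag)
    Bd = (Ad - 1.0) / A_diag * B
    return Ad, Bd

def bilinear(A_diag, B, dt):
    # Bilinear/Tustin transform: trapezoidal-rule approximation that maps
    # the stable (left) half-plane into the unit disc, preserving stability.
    denom = 1.0 - (dt / 2.0) * A_diag
    Ad = (1.0 + (dt / 2.0) * A_diag) / denom
    Bd = dt * B / denom
    return Ad, Bd

# Toy example: a stable one-dimensional system dx/dt = -x + u.
A = np.array([-1.0])
B = np.array([1.0])
Ad_zoh, _ = zoh(A, B, 0.1)
Ad_bil, _ = bilinear(A, B, 0.1)
# Both approximate exp(-0.1); they agree to about three decimal places,
# and the gap between schemes grows as the step size dt increases.
```

The higher-order schemes the study favors (POL, HOH, RK4) follow the same pattern but fit the input and state trajectory with higher-order approximations over each step, which is where the extra training-time cost comes from.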

Key facts

  • Six discretization schemes compared: ZOH, FOH, BIL, POL, HOH, RK4
  • POL and HOH yield largest accuracy gains
  • BIL provides consistent improvements
  • ZOH degrades temporal fidelity in dynamic environments
  • Evaluated on image classification, semantic segmentation, object detection
  • Higher accuracy from POL and HOH comes with higher training-time computation
  • Study is systematic and controlled within Vision Mamba framework
  • Published on arXiv with ID 2604.20606

Entities

Institutions

  • arXiv

Sources