ARTFEED — Contemporary Art Intelligence

SSMProbe: Probing Visual Models via Token Order Dynamics

publication · 2026-05-06

A new research paper on arXiv (2605.00915) introduces SSMProbe, a probing framework that uses State Space Models (SSMs) to exploit token order in frozen visual representations. Standard readouts such as Global Average Pooling (GAP) or the CLS token treat patch representations as permutation-invariant, discarding sequence structure. The authors challenge this assumption, showing that token order is a critical dimension in models such as MAE, BEiT, DINOv2, and ViT.

SSMProbe is built on SSMs that operate as discrete Linear Time-Invariant (LTI) dynamical systems: because the state's memory of earlier tokens decays over the sequence, the order in which tokens are fed in determines the final state. The framework therefore casts token ordering as an information scheduling problem and compares fixed scan heuristics against a differentiable soft permutation learned via Sinkhorn-based supervision. The authors report improved probing performance in evaluations on standard and fine-grained classification benchmarks.
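To make the order dependence concrete, here is a minimal sketch in plain Python. It is not SSMProbe's implementation: it assumes a toy diagonal recurrence h_t = a·h_{t-1} + b·u_t with a single fixed decay a and input gain b (the hypothetical `decay` and `input_gain` parameters), plus two hypothetical scan heuristics, `raster_scan` and `column_scan`, over a grid of patch tokens. Because token t reaches the final state scaled by a^(T-t), reordering the same tokens changes the final state, while GAP returns the same vector for any order.

```python
def ssm_final_state(tokens, decay=0.9, input_gain=1.0):
    """Final state of a toy diagonal discrete LTI recurrence (illustrative only).

    Per channel: h_t = decay * h_{t-1} + input_gain * u_t, starting from h_0 = 0.
    With |decay| < 1, token t reaches the final state scaled by decay**(T - t),
    so earlier tokens are "forgotten" more and the result depends on order.
    """
    h = [0.0] * len(tokens[0])
    for u in tokens:
        h = [decay * h_i + input_gain * u_i for h_i, u_i in zip(h, u)]
    return h


def global_average_pool(tokens):
    """Permutation-invariant baseline: the per-channel mean over all tokens."""
    n = len(tokens)
    return [sum(channel) / n for channel in zip(*tokens)]


def raster_scan(grid):
    """Hypothetical fixed scan: row by row, left to right."""
    return [tok for row in grid for tok in row]


def column_scan(grid):
    """Hypothetical fixed scan: column by column, top to bottom."""
    return [grid[r][c] for c in range(len(grid[0])) for r in range(len(grid))]


if __name__ == "__main__":
    # A 2x2 grid of 1-channel patch tokens holding the values 1, 2 / 3, 4.
    grid = [[[1.0], [2.0]], [[3.0], [4.0]]]
    for scan in (raster_scan, column_scan):
        seq = scan(grid)
        # prints "raster_scan [6.125] [2.5]", then "column_scan [5.875] [2.5]"
        print(scan.__name__, ssm_final_state(seq, decay=0.5), global_average_pool(seq))
```

With decay=0.5 the two scans of identical tokens end in different states (6.125 vs 5.875) while GAP gives 2.5 for both. In an actual SSM probe the decay and input terms would be learned matrices rather than fixed scalars.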

Key facts

  • Paper arXiv:2605.00915 introduces SSMProbe probing framework.
  • SSMProbe uses State Space Models (SSMs) as LTI dynamical systems.
  • Token order is exploited in frozen visual representations (MAE, BEiT, DINOv2, ViT).
  • Standard methods (GAP, CLS) are permutation-invariant.
  • Token ordering treated as information scheduling problem.
  • Fixed scan heuristics compared with differentiable soft permutation (Sinkhorn).
  • Evaluated on standard and fine-grained classification benchmarks.
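The soft-permutation idea can be illustrated with Sinkhorn normalization, which alternately rescales the rows and columns of a positive matrix until it is approximately doubly stochastic; at low temperature the result approaches a hard permutation matrix. This is a generic sketch under assumptions: how SSMProbe produces its score matrix and what its Sinkhorn-based supervision targets are is not described here, so `scores`, `temperature`, `n_iters`, and the helper names `sinkhorn` and `soft_permute` are hypothetical. Every step is differentiable, so a trainable version would normally be written in an autodiff framework rather than plain Python.

```python
import math


def sinkhorn(scores, n_iters=50, temperature=1.0):
    """Map an n x n score matrix to an approximately doubly stochastic matrix.

    Exponentiates scores / temperature, then alternately normalizes rows and
    columns (Sinkhorn-Knopp iterations). Lower temperatures push the result
    toward a hard 0/1 permutation matrix.
    """
    n = len(scores)
    # Subtract the global max before exponentiating for numerical stability.
    m = max(max(row) for row in scores)
    p = [[math.exp((s - m) / temperature) for s in row] for row in scores]
    for _ in range(n_iters):
        # Row normalization: each row sums to 1.
        row_sums = [sum(row) for row in p]
        p = [[v / rs for v in row] for row, rs in zip(p, row_sums)]
        # Column normalization: each column sums to 1.
        col_sums = [sum(p[i][j] for i in range(n)) for j in range(n)]
        p = [[p[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return p


def soft_permute(perm, tokens):
    """Reorder tokens softly: output position i = sum_j perm[i][j] * tokens[j]."""
    dim = len(tokens[0])
    return [
        [sum(perm[i][j] * tokens[j][d] for j in range(len(tokens))) for d in range(dim)]
        for i in range(len(perm))
    ]


if __name__ == "__main__":
    # Hypothetical scores that favor the order: token 0, token 2, token 1.
    scores = [[5.0, 0.0, 0.0], [0.0, 0.0, 5.0], [0.0, 5.0, 0.0]]
    perm = sinkhorn(scores, temperature=0.1)
    tokens = [[1.0], [2.0], [3.0]]
    print([round(t[0], 6) for t in soft_permute(perm, tokens)])  # [1.0, 3.0, 2.0]
```

The soft-permuted sequence could then be fed to an order-sensitive scan such as an SSM, so that the ordering itself is learned by gradient descent instead of being fixed by a heuristic.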

Entities

Institutions

  • arXiv

Sources