Flash PD-SSM: Memory-Optimized Structured Sparse State-Space Models
Flash PD-SSM is a proposed state-space model (SSM) that addresses the trade-off between efficiency and expressivity. Unstructured transition matrices offer maximal expressivity but are expensive in compute and memory, while structured matrices are efficient but limited in expressivity. Flash PD-SSM uses a trainable set of structured sparse matrices and discretely selects one of them at each time step. This gives it finite-state automaton (FSA) expressiveness comparable to that of unstructured matrices while remaining efficient. The model builds on recent work on structured sparse SSMs and offers throughput comparable to widely used structured SSMs, with better expressivity guarantees.
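The per-step selection idea can be illustrated with a small sketch. This is not the Flash PD-SSM implementation. It assumes, purely for illustration, that each structured sparse matrix has exactly one nonzero per column and is stored as a pair of lists (`rows`, `values`), and that the selection index for each time step is already given rather than learned from the input. The names `apply_sparse`, `run_recurrence`, and `SparseMatrix` are hypothetical. The toy dictionary encodes a counter modulo 3, a simple finite-state automaton in which each input symbol picks one transition matrix.

```python
from typing import List, Sequence, Tuple

# Assumed storage layout (not taken from the source): a matrix with exactly
# one nonzero per column, kept as (rows, values), where column j has its
# single nonzero `values[j]` in row `rows[j]`.
SparseMatrix = Tuple[List[int], List[float]]


def apply_sparse(matrix: SparseMatrix, state: Sequence[float]) -> List[float]:
    """Compute matrix @ state in O(N) time instead of O(N^2) for a dense matrix."""
    rows, values = matrix
    out = [0.0] * len(state)
    for col, (row, val) in enumerate(zip(rows, values)):
        out[row] += val * state[col]
    return out


def run_recurrence(
    matrices: Sequence[SparseMatrix],
    selections: Sequence[int],
    state: Sequence[float],
) -> List[float]:
    """Apply one discretely selected matrix per time step: x_t = A[k_t] @ x_{t-1}."""
    current = list(state)
    for k in selections:
        current = apply_sparse(matrices[k], current)
    return current


# Toy dictionary of two matrices acting on a one-hot state of size 3.
# Matrix 0 is the identity; matrix 1 moves state i to state (i + 1) % 3.
matrices = [
    ([0, 1, 2], [1.0, 1.0, 1.0]),
    ([1, 2, 0], [1.0, 1.0, 1.0]),
]

start = [1.0, 0.0, 0.0]        # automaton starts in state 0
symbols = [1, 0, 1, 1, 0, 1]   # four increments, two no-ops
final_state = run_recurrence(matrices, symbols, start)
print(final_state)             # one-hot vector for state 4 % 3 == 1
```

Because each input symbol of a deterministic automaton maps every state to exactly one next state, its transition matrix has a single 1 per column, which is why a dictionary of such matrices is enough to track the automaton's state in this toy setting. In a trained model the selection index would be produced from the input, and the nonzero values would be learned rather than fixed to 1.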
Key facts
- Flash PD-SSM is a novel state-space model.
- It uses structured sparse matrices selected discretely per time step.
- It achieves finite-state automaton (FSA) expressiveness at the level of unstructured matrices.
- It maintains efficiency comparable to widely used structured SSMs.
- It addresses the trade-off between efficiency and expressivity in SSMs.
- It builds on recent work on structured sparse SSMs.
- Unstructured matrices have high compute and memory cost.
- Structured matrices are efficient but limited in expressivity.