PATCH: A Hybrid Sparsity Framework for Efficient LLMs
PATCH introduces a learnable tile-level hybrid sparsity framework for large language models (LLMs), enabling a continuous sparsity ratio between 0% and 50%. It partitions weight matrices into tiles and assigns each tile as either dense or 2:4 sparse through a learnable mask selection mechanism, giving fine-grained control over the accuracy-acceleration tradeoff and allowing non-uniform sparsity across layers. In this way PATCH bridges the gap between unstructured sparsity (accurate but hardware-unfriendly) and semi-structured 2:4 sparsity (hardware-friendly but rigid), combining the accuracy benefits of the former with the acceleration of the latter. The paper is available on arXiv under ID 2509.23410.
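The sketch below illustrates the core idea under stated assumptions: a weight matrix is split into tiles, a per-tile score decides whether that tile stays dense or is pruned to a 2:4 pattern, and the resulting overall sparsity falls anywhere between 0% and 50%. All names (`hybrid_sparsify`, `tile_logits`, `tile_size`) are illustrative, not the paper's API, and the simple threshold used here stands in for PATCH's learned mask selection.

```python
# Minimal PyTorch sketch of tile-level hybrid (dense vs. 2:4) sparsity.
# NOTE: function and parameter names are assumptions for illustration only;
# PATCH learns the per-tile dense/2:4 assignment end to end, whereas this
# sketch just thresholds a given per-tile logit.
import torch

def two_to_four_mask(tile: torch.Tensor) -> torch.Tensor:
    """Keep the 2 largest-magnitude weights in every group of 4 along the last dim."""
    rows, cols = tile.shape
    groups = tile.abs().reshape(rows, cols // 4, 4)
    top2 = groups.topk(2, dim=-1).indices          # indices of the 2 largest per group
    mask = torch.zeros_like(groups, dtype=torch.bool)
    mask.scatter_(-1, top2, True)
    return mask.reshape(rows, cols)

def hybrid_sparsify(weight: torch.Tensor,
                    tile_logits: torch.Tensor,
                    tile_size: int = 128) -> torch.Tensor:
    """Apply 2:4 pruning only to tiles whose logit selects the sparse mask."""
    out = weight.clone()
    for i in range(weight.shape[0] // tile_size):
        for j in range(weight.shape[1] // tile_size):
            if tile_logits[i, j] <= 0:             # logit <= 0 -> tile stays dense
                continue
            r, c = i * tile_size, j * tile_size
            tile = out[r:r + tile_size, c:c + tile_size]
            tile *= two_to_four_mask(tile)         # zero the pruned weights in place
    return out

# Example: a 512x512 weight with a 4x4 grid of per-tile logits
# (in PATCH these would be learned jointly with the model).
w = torch.randn(512, 512)
logits = torch.randn(4, 4)
w_sparse = hybrid_sparsify(w, logits)
# Overall sparsity lands between 0% (all tiles dense) and 50% (all tiles 2:4).
print(f"sparsity = {(w_sparse == 0).float().mean():.2%}")
```

Because sparse tiles keep the hardware-supported 2:4 pattern while dense tiles are untouched, the fraction of sparse tiles directly sets the effective sparsity ratio, which is what allows a continuous 0%-50% range and per-layer non-uniformity.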
Key facts
- PATCH enables continuous sparsity ratio between 0% and 50%
- Partitions weight matrices into tiles with learnable mask selection
- Supports non-uniform sparsity across layers
- Bridges unstructured and semi-structured 2:4 sparsity
- arXiv paper ID: 2509.23410