SST Activation Improves GRU Performance in Low-Data Settings
Researchers introduce squared sigmoid-tanh (SST), a parameter-free activation function designed to enhance gate separation in gated recurrent units (GRUs). SST squares the gate nonlinearity, increasing the contrast between near-zero and high activations for sharper information filtering. Evaluated across low-data tasks including sign language recognition, human activity recognition, and time-series forecasting/classification, SST-GRU consistently outperforms the standard sigmoid/tanh GRU, with the largest gains in the smallest-data domains. The method adds negligible computational cost.
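The summary describes the mechanism but not the exact SST formula, so the following is a minimal sketch of a GRU cell with squared gate activations, assuming the reset/update gates use sigmoid(x)² and the candidate state uses a sign-preserving squared tanh; the names (`SSTGRUCell`, `sst_sigmoid`, `sst_tanh`) are illustrative, not from the paper.

```python
# Sketch of a GRU cell with squared gate activations (assumed SST variant).
import torch
import torch.nn as nn

def sst_sigmoid(x: torch.Tensor) -> torch.Tensor:
    """Squared sigmoid: pushes small gate values toward 0, sharpening filtering."""
    s = torch.sigmoid(x)
    return s * s

def sst_tanh(x: torch.Tensor) -> torch.Tensor:
    """Sign-preserving squared tanh: tanh(x) * |tanh(x)|, still in (-1, 1)."""
    t = torch.tanh(x)
    return t * t.abs()

class SSTGRUCell(nn.Module):
    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        # One linear map each for input and hidden state, covering all three gates.
        self.x2h = nn.Linear(input_size, 3 * hidden_size)
        self.h2h = nn.Linear(hidden_size, 3 * hidden_size)

    def forward(self, x: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        gx = self.x2h(x).chunk(3, dim=-1)  # input contributions: reset, update, candidate
        gh = self.h2h(h).chunk(3, dim=-1)  # hidden-state contributions
        r = sst_sigmoid(gx[0] + gh[0])     # reset gate with squared sigmoid
        z = sst_sigmoid(gx[1] + gh[1])     # update gate with squared sigmoid
        n = sst_tanh(gx[2] + r * gh[2])    # candidate state
        return (1 - z) * n + z * h         # standard GRU state interpolation

# Usage: drop-in replacement for a plain GRU cell.
# cell = SSTGRUCell(8, 16)
# h = cell(torch.randn(4, 8), torch.zeros(4, 16))
```

Apart from the squared activations, the cell follows the standard GRU update, which is consistent with the claim that SST is parameter-free and adds negligible cost: squaring is one extra elementwise multiply per gate.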
Key facts
- SST is a parameter-free activation function for GRUs.
- SST squares the gate nonlinearity to increase contrast.
- SST-GRU evaluated on sign language recognition, human activity recognition, time-series forecasting and classification.
- SST-GRU consistently outperforms standard sigmoid/tanh GRU.
- Largest improvements observed in smallest-data domains.
- SST adds negligible computational cost.
- Standard sigmoid and tanh can produce weak gate separation and unstable learning with limited data (see the numeric sketch after this list).
- The paper is arXiv:2402.09034.
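The summary reports the contrast effect only qualitatively, so here is plain arithmetic on the logistic function (not paper results) showing how squaring widens the gap between open and closed gates:

```python
# Illustration of how squaring sharpens gate separation.
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

for x in (-2.0, -1.0, 1.0, 2.0):
    s = sigmoid(x)
    print(f"x={x:+.1f}  sigmoid={s:.3f}  squared={s*s:.3f}")

# x=-2.0  sigmoid=0.119  squared=0.014
# x=-1.0  sigmoid=0.269  squared=0.072
# x=+1.0  sigmoid=0.731  squared=0.535
# x=+2.0  sigmoid=0.881  squared=0.776
```

At |x| = 2 the open-to-closed ratio grows from roughly 7.4 (0.881/0.119) to roughly 55 (0.776/0.014): low activations are suppressed far more than high ones, which is the sharper information filtering attributed to SST.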