SST Activation Improves GRU Performance in Low-Data Settings
Researchers introduce squared sigmoid-tanh (SST), a parameter-free activation function designed to enhance gate separation in gated recurrent units (GRUs). SST squares the gate nonlinearity, increasing the contrast between near-zero and high activations for sharper information filtering. Evaluated across low-data tasks including sign language recognition, human activity recognition, and time-series forecasting/classification, SST-GRU consistently outperforms the standard sigmoid/tanh GRU, with the largest gains in the smallest-data domains. The method adds negligible computational cost.
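The summary describes the mechanism but not the exact SST formula, so the following is a minimal sketch of a GRU cell with squared gate activations, assuming the reset/update gates use sigmoid(x)² and the candidate state uses a sign-preserving squared tanh; the names (`SSTGRUCell`, `sst_sigmoid`, `sst_tanh`) are illustrative, not from the paper.

```python
# Sketch of a GRU cell with squared gate activations (assumed SST variant).
import torch
import torch.nn as nn

def sst_sigmoid(x: torch.Tensor) -> torch.Tensor:
    """Squared sigmoid: pushes small gate values toward 0, sharpening filtering."""
    s = torch.sigmoid(x)
    return s * s

def sst_tanh(x: torch.Tensor) -> torch.Tensor:
    """Sign-preserving squared tanh: tanh(x) * |tanh(x)|, still in (-1, 1)."""
    t = torch.tanh(x)
    return t * t.abs()

class SSTGRUCell(nn.Module):
    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        # One linear map each for input and hidden state, covering all three gates.
        self.x2h = nn.Linear(input_size, 3 * hidden_size)
        self.h2h = nn.Linear(hidden_size, 3 * hidden_size)

    def forward(self, x: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        gx = self.x2h(x).chunk(3, dim=-1)  # input contributions: reset, update, candidate
        gh = self.h2h(h).chunk(3, dim=-1)  # hidden-state contributions
        r = sst_sigmoid(gx[0] + gh[0])     # reset gate with squared sigmoid
        z = sst_sigmoid(gx[1] + gh[1])     # update gate with squared sigmoid
        n = sst_tanh(gx[2] + r * gh[2])    # candidate state
        return (1 - z) * n + z * h         # standard GRU state interpolation

# Usage: drop-in replacement for a plain GRU cell.
# cell = SSTGRUCell(8, 16)
# h = cell(torch.randn(4, 8), torch.zeros(4, 16))
```

Apart from the squared activations, the cell follows the standard GRU update, which is consistent with the claim that SST is parameter-free and adds negligible cost: squaring is one extra elementwise multiply per gate.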
Key facts
- SST is a parameter-free activation function for GRUs.
- SST squares the gate nonlinearity to increase contrast.
- SST-GRU evaluated on sign language recognition, human activity recognition, time-series forecasting and classification.
- SST-GRU consistently outperforms standard sigmoid/tanh GRU.
- Largest improvements observed in smallest-data domains.
- SST adds negligible computational cost.
- Standard sigmoid and tanh can produce weak gate separation and unstable learning with limited data (see the numeric sketch after this list).
- The paper is arXiv:2402.09034.
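The summary reports the contrast effect only qualitatively, so here is plain arithmetic on the logistic function (not paper results) showing how squaring widens the gap between open and closed gates:

```python
# Illustration of how squaring sharpens gate separation.
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

for x in (-2.0, -1.0, 1.0, 2.0):
    s = sigmoid(x)
    print(f"x={x:+.1f}  sigmoid={s:.3f}  squared={s*s:.3f}")

# x=-2.0  sigmoid=0.119  squared=0.014
# x=-1.0  sigmoid=0.269  squared=0.072
# x=+1.0  sigmoid=0.731  squared=0.535
# x=+2.0  sigmoid=0.881  squared=0.776
```

At |x| = 2 the open-to-closed ratio grows from roughly 7.4 (0.881/0.119) to roughly 55 (0.776/0.014): low activations are suppressed far more than high ones, which is the sharper information filtering attributed to SST.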