Scaling Laws Show Equivariance Matters More at Larger Scales
A study of neural network force fields finds that equivariant architectures, which build symmetry into the model, scale better than non-equivariant models. The research finds power-law scaling behavior with architecture-dependent exponents, and higher-order representations yield better scaling exponents. For compute-optimal training, data and model sizes should be scaled together regardless of architecture. The findings challenge the view that inductive biases such as symmetry are best left for models to discover on their own.
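As a minimal sketch of how an architecture-dependent scaling exponent is typically quantified (this is the standard log-log fitting recipe, not necessarily the paper's exact procedure, and the data points below are illustrative placeholders), the loss can be modeled as L(C) = a * C^(-alpha) and alpha recovered by a linear fit in log-log space:

```python
# Sketch: estimating a power-law scaling exponent alpha in L(C) = a * C**(-alpha)
# from (compute, loss) measurements. Data below is synthetic, not from the paper.
import numpy as np

compute = np.array([1e15, 1e16, 1e17, 1e18, 1e19])   # training FLOPs (illustrative)
loss    = np.array([0.52, 0.31, 0.19, 0.11, 0.068])  # validation loss (illustrative)

# A power law is a straight line in log-log space: log L = log a - alpha * log C,
# so an ordinary least-squares line fit recovers the exponent.
slope, intercept = np.polyfit(np.log(compute), np.log(loss), deg=1)
alpha, a = -slope, np.exp(intercept)
print(f"fitted exponent alpha ~ {alpha:.3f}, prefactor a ~ {a:.3g}")
```

Comparing the fitted alpha across architectures is what makes the claim "equivariant models scale better" quantitative: a larger exponent means loss falls faster as compute grows.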
Key facts
- Equivariance matters more at larger scales
- Power-law scaling with architecture-dependent exponents
- Equivariant architectures scale better than non-equivariant
- Higher-order representations improve scaling exponents
- Data and model sizes should scale in tandem for compute-optimal training (see the sketch after this list)
- Contrary to common belief, symmetry should not be left to the model to discover
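A hedged illustration of what "scale in tandem" means in a Chinchilla-style compute-optimal analysis: with the common approximation C ~ 6 * N * D (FLOPs ~ 6 x parameters x training samples), the optimal N and D both grow as power laws in the compute budget. The exponent and prefactor below are placeholders, not the paper's fitted values.

```python
# Sketch of compute-optimal allocation under a fixed budget, assuming C ~ 6*N*D.
# k_n and a are illustrative constants; with a = 0.5 both N and D grow as sqrt(C).
def compute_optimal_split(C, k_n=1.0, a=0.5):
    """Return (N_opt, D_opt) for a compute budget C in FLOPs."""
    n_opt = k_n * C ** a          # model size grows as a power law in compute
    d_opt = C / (6.0 * n_opt)     # data size uses up the remaining budget
    return n_opt, d_opt

for budget in (1e18, 1e20, 1e22):
    n, d = compute_optimal_split(budget)
    print(f"C={budget:.0e}: N_opt~{n:.2e} params, D_opt~{d:.2e} samples")
```

The takeaway mirrored in the study's finding is that neither model size nor dataset size should be scaled alone: doubling compute is best spent by growing both together, and this holds for equivariant and non-equivariant architectures alike.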