Power-law data distribution boosts AI compositional reasoning
A recent study posted to arXiv reports that, for compositional reasoning tasks such as state tracking and multi-step arithmetic, training AI models on data whose skill frequencies follow a power-law distribution (as natural language does, with a few skills appearing very often and most appearing only rarely) outperforms training on a uniformly distributed mix. Using a minimalist skill-composition task, the authors show that power-law sampling reaches comparable performance with significantly less training data. Their theoretical analysis attributes this to a beneficial asymmetry in the loss landscape: models first learn the high-frequency skill compositions at low sample complexity, and that early progress in turn makes the rarer compositions easier to learn. The paper, titled "The Power of Power Law: Asymmetry Enables Compositional Reasoning," was submitted to arXiv on April 26, 2025.
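To make the contrast concrete, here is a minimal sketch of power-law versus uniform sampling over a fixed pool of skill compositions, assuming a Zipf-style sampler. The pool size, exponent, and batch size (`n_compositions`, `alpha`, and the 50,000 draws) are illustrative choices, not values from the paper.

```python
# Sketch (not the paper's code): contrast power-law and uniform sampling
# over a fixed pool of skill compositions.
import numpy as np

rng = np.random.default_rng(0)

n_compositions = 1000  # size of the skill-composition pool (illustrative)
alpha = 1.0            # power-law exponent; alpha = 1 gives Zipf's law

# Power-law weights: the composition at rank k is sampled with probability
# proportional to 1/k^alpha, so a few head items dominate and the long tail
# of compositions is rare.
ranks = np.arange(1, n_compositions + 1, dtype=float)
powerlaw_p = ranks ** (-alpha)
powerlaw_p /= powerlaw_p.sum()

uniform_p = np.full(n_compositions, 1.0 / n_compositions)

# Draw a training set of composition indices under each distribution.
powerlaw_batch = rng.choice(n_compositions, size=50_000, p=powerlaw_p)
uniform_batch = rng.choice(n_compositions, size=50_000, p=uniform_p)

# Under the power law, the 10 most frequent compositions account for a large
# share of training examples; this skew is the asymmetry the paper credits
# with bootstrapping compositional learning.
print("top-10 share, power-law:", np.mean(powerlaw_batch < 10))
print("top-10 share, uniform: ", np.mean(uniform_batch < 10))
```

With these settings the power-law head claims roughly 40% of all draws versus 1% under the uniform distribution, which is why the rare tail can ride on skills already learned from the head.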
Key facts
- arXiv paper ID: 2604.22951
- Submitted to arXiv: April 26, 2025
- Power-law distribution outperforms uniform distribution for compositional reasoning
- Tasks tested: state tracking, multi-step arithmetic (see the toy generator after this list)
- Power-law sampling requires significantly less training data
- Beneficial asymmetry in loss landscape is key mechanism
- Minimalist skill-composition task used to demonstrate the data-efficiency gain
- Natural language data follows power-law distribution
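The multi-step arithmetic task referenced above can be pictured as chaining a few atomic operations and asking for the final value, which also doubles as a simple state-tracking exercise. The toy generator below is a hypothetical illustration; the operation set, prompt format, and `make_example` helper are assumptions, not the paper's benchmark.

```python
# Hypothetical toy generator for a multi-step arithmetic task: each example
# composes `depth` atomic skills into one problem. Not the paper's benchmark.
import random

ATOMIC_OPS = {
    "add3": lambda x: x + 3,
    "double": lambda x: x * 2,
    "sub1": lambda x: x - 1,
}

def make_example(depth: int, rng: random.Random) -> tuple[str, int]:
    """Compose `depth` atomic skills and return (prompt, answer)."""
    x = rng.randint(0, 9)
    names = [rng.choice(list(ATOMIC_OPS)) for _ in range(depth)]
    # Thread the running value through each operation in order; answering
    # correctly requires tracking this intermediate state step by step.
    y = x
    for name in names:
        y = ATOMIC_OPS[name](y)
    prompt = f"start={x}; apply {' then '.join(names)}; result=?"
    return prompt, y

rng = random.Random(0)
print(make_example(3, rng))
```

In the paper's setup, which compositions such a generator emits (and how often) would itself be governed by the power-law or uniform distribution being compared.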
Entities
Institutions
- arXiv