Scaling Humanoid Motion Tracking with 42M Parameters and 700 Hours of MoCap Data
Researchers present SONIC, a foundation model for humanoid whole-body motion tracking that scales along three axes: network size, dataset volume, and compute. The networks range from 1.2M to 42M parameters and are trained on more than 100 million frames drawn from 700 hours of motion capture data, using 21,000 GPU hours. The result is a generalist controller that produces natural, robust movements without manual reward engineering. The work indicates that humanoid control benefits from scaling model capacity, data, and compute, and that the resulting controller supports real-time applications.
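As a rough sanity check on the figures above, the short Python sketch below derives a few ratios from them. It assumes exactly 100 million frames and exactly 700 hours, so the frame rate it prints is only a lower-bound estimate (the source says "over 100 million"); the function name and dictionary keys are illustrative and do not come from the paper.

```python
# Back-of-the-envelope ratios from the reported SONIC figures.
# Assumption: exactly 100M frames and 700 hours; the source gives "100M+",
# so the implied frame rate is a lower bound, not a reported value.

MOCAP_HOURS = 700
FRAMES = 100_000_000
GPU_HOURS = 21_000
PARAMS_SMALL = 1_200_000
PARAMS_LARGE = 42_000_000


def dataset_ratios():
    """Return ratios implied by the headline numbers (hypothetical helper)."""
    seconds = MOCAP_HOURS * 3600
    return {
        # Average frames per second of motion capture implied by the totals.
        "frames_per_second": FRAMES / seconds,
        # GPU hours spent per hour of motion capture data.
        "gpu_hours_per_mocap_hour": GPU_HOURS / MOCAP_HOURS,
        # Size of the largest model relative to the smallest.
        "param_scale_factor": PARAMS_LARGE / PARAMS_SMALL,
    }


if __name__ == "__main__":
    for name, value in dataset_ratios().items():
        print(f"{name}: {value:.1f}")
```

Under these assumptions the totals imply roughly 39.7 frames per second of capture, 30 GPU hours per hour of motion data, and a 35x spread between the smallest and largest models.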
Key facts
- SONIC scales model capacity from 1.2M to 42M parameters
- Trained on 100M+ frames from 700 hours of motion capture
- Uses 21,000 GPU hours of compute
- Achieves natural whole-body movements without manual reward engineering
- Demonstrates scaling benefits for humanoid control
- Enables real-time motion tracking
- Published on arXiv under ID 2511.07820
Entities
Institutions
- arXiv