ARTFEED — Contemporary Art Intelligence

New AI Research Improves Training Time Prediction for Mixed Precision Distributed Deep Learning

ai-technology · 2026-04-20

A new research paper addresses a critical gap in predicting training time for distributed deep learning systems. The study finds that floating-point precision settings significantly affect training duration: the slowest precision configuration takes roughly 2.4 times as long as the fastest. Existing prediction methods fail here because they rely on static model computation graphs that do not account for precision variations, including mixed precision approaches. In the paper's experiments, these traditional methods produce substantial errors, with mean absolute percentage error (MAPE) reaching up to 147.85%. To close this gap, the researchers developed a precision-aware predictor that delivers robust accuracy across diverse precision settings, achieving a much lower error of 9.8% MAPE. Accurate training time prediction matters because it underpins resource allocation, cost estimation, and job scheduling in distributed deep learning environments. The research was published on arXiv, a platform for scientific papers in fields including computer science and machine learning.
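
As background on what a "precision setting" means here, below is a minimal PyTorch sketch of mixed-precision training with torch.amp (PyTorch 2.3 or later). It is a generic illustration, not the paper's experimental setup; the model, data, and hyperparameters are placeholders. Inside the autocast region, some operations run in fp16 or bf16 while others stay in fp32, which is exactly the per-operation variation a static fp32 computation graph cannot express.

```python
# Minimal mixed-precision training loop in PyTorch (torch.amp).
# Illustrative placeholder model/data, not the paper's setup.
import torch
import torch.nn as nn

use_cuda = torch.cuda.is_available()
device = "cuda" if use_cuda else "cpu"
amp_dtype = torch.float16 if use_cuda else torch.bfloat16

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# Loss scaling guards against fp16 gradient underflow; it is a no-op
# when disabled (e.g. on CPU, where bf16 does not need it).
scaler = torch.amp.GradScaler(device, enabled=use_cuda)

for step in range(10):
    x = torch.randn(64, 512, device=device)
    y = torch.randint(0, 10, (64,), device=device)

    optimizer.zero_grad(set_to_none=True)
    # Eligible ops inside autocast run in fp16/bf16; the rest stay fp32.
    # Runtime therefore depends on this per-op dtype mix, not on the
    # static fp32 graph alone.
    with torch.autocast(device_type=device, dtype=amp_dtype):
        loss = loss_fn(model(x), y)

    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```

Each choice among fp32, fp16, bf16, or a mix like the one above changes kernel selection and memory traffic, which is why the paper measures swings of roughly 2.4x in end-to-end training time.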

Key facts

  • Training time in distributed deep learning varies by ~2.4x based on floating-point precision settings
  • Existing prediction methods don't account for precision variations, including mixed precision
  • Traditional methods produce prediction errors of up to 147.85% MAPE (see the worked example after this list)
  • Researchers developed a precision-aware distributed training time predictor
  • New predictor achieves 9.8% MAPE across diverse precision settings
  • Accurate prediction is crucial for resource allocation and cost estimation
  • Research addresses distributed deep learning training optimization
  • Paper published on arXiv platform
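
For readers unfamiliar with the metric, MAPE (mean absolute percentage error) averages |predicted − actual| / actual across jobs and reports it as a percentage. The sketch below uses made-up training times, not the paper's data, to show why a precision-blind predictor lands in the triple-digit MAPE regime the paper reports: being consistently off by the full ~2.4x precision factor already scores 140% MAPE.

```python
# Worked MAPE example with hypothetical training times (hours).
# These numbers are illustrative only; they are not the paper's data.

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(p - a) / a for a, p in zip(actual, predicted)) / len(actual)

measured = [2.0, 4.8, 3.1]   # hypothetical measured times per job

# A precision-blind predictor that assumes the slowest precision while the
# jobs actually ran at the fastest is off by the full 2.4x factor everywhere.
blind = [2.4 * t for t in measured]

# A (hypothetical) precision-aware predictor stays close to the measurements.
aware = [2.1, 4.5, 3.3]

print(f"precision-blind MAPE: {mape(measured, blind):.1f}%")   # 140.0%
print(f"precision-aware MAPE: {mape(measured, aware):.1f}%")   # 5.9%
```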

Entities

Institutions

  • arXiv

Sources