InfiniPipe: Elastic Pipeline Parallelism for Efficient Variable-Length Long-Context LLM Training
InfiniPipe is a system that improves the training efficiency of large language models (LLMs) on long context sequences through Elastic Pipeline Parallelism (EPP). Long-context training is crucial for extending LLM capabilities, but conventional approaches such as sequence parallelism incur substantial communication overhead. Pipeline parallelism (PP) avoids these costs, yet its efficiency depends on how sequences are partitioned: batch-level PP, which groups whole sequences together, consumes excessive memory in long-context scenarios, while token-level PP, which splits individual sequences into chunks, can under-utilize hardware. Because real-world datasets exhibit skewed sequence-length distributions, any fixed-granularity PP scheme is inadequate. EPP adaptively combines token-level and batch-level PP to match the available resources and the workload. In addition, Stage-Aware Chunk-Level Adaptive Checkpointing integrates gradient checkpointing with EPP to reduce memory consumption. The paper is available on arXiv (2509.21275).
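The core idea of mixing token-level and batch-level PP can be illustrated with a toy scheduler. This is a minimal sketch under assumptions, not the paper's actual algorithm: sequences longer than an assumed per-chunk token budget are split token-wise, while shorter sequences are packed together into shared chunks. The function name and the `budget` parameter are illustrative.

```python
# Toy sketch of elastic chunk planning (illustrative, not InfiniPipe's
# real scheduler). Long sequences are split into token-level chunks;
# short sequences are packed batch-level into a shared chunk.

def plan_elastic_chunks(seq_lens, budget):
    """Return a list of chunks; each chunk is a list of (seq_id, start, end)."""
    chunks = []
    pending = []          # short sequences packed into the current chunk
    pending_tokens = 0
    for sid, n in enumerate(seq_lens):
        if n > budget:
            # Token-level split: slice the long sequence into budget-sized pieces.
            for start in range(0, n, budget):
                chunks.append([(sid, start, min(start + budget, n))])
        else:
            # Batch-level packing: group short sequences until the budget fills.
            if pending_tokens + n > budget:
                chunks.append(pending)
                pending, pending_tokens = [], 0
            pending.append((sid, 0, n))
            pending_tokens += n
    if pending:
        chunks.append(pending)
    return chunks
```

For example, with a skewed batch `[9000, 1000, 1500]` and a budget of 4096 tokens, the 9000-token sequence is split into three token-level chunks while the two short sequences share one packed chunk, which mirrors the motivation for elastic granularity.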
Key facts
- InfiniPipe proposes Elastic Pipeline Parallelism (EPP) for LLM training.
- Long-context training is crucial for LLM context extension.
- Sequence parallelism incurs substantial communication overhead.
- Pipeline parallelism reduces communication cost.
- Batch-level PP has high memory consumption in long-context scenarios.
- Token-level PP may cause hardware under-utilization.
- Real-world datasets have skewed sequence length distributions.
- Stage-Aware Chunk-Level Adaptive Checkpointing integrates with EPP.
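The checkpointing idea above can be sketched as a per-chunk memory-budget decision. This toy policy is an assumption for illustration only (the paper's stage-aware policy is not described in this summary): chunks whose activations fit within an assumed memory budget are kept resident, and the rest fall back to gradient checkpointing (recomputation). `act_mem` and `budget` are hypothetical inputs.

```python
# Toy sketch of chunk-level adaptive checkpointing (illustrative only,
# not InfiniPipe's actual stage-aware policy). Each chunk either stores
# its activations or is recomputed during the backward pass.

def plan_checkpointing(act_mem, budget):
    """act_mem: per-chunk activation memory; returns a per-chunk decision."""
    stored, used = set(), 0
    # Keep the cheapest chunks resident first so more chunks avoid recompute.
    for i in sorted(range(len(act_mem)), key=lambda i: act_mem[i]):
        if used + act_mem[i] <= budget:
            used += act_mem[i]
            stored.add(i)
    return ["store" if i in stored else "recompute" for i in range(len(act_mem))]
```

With activation sizes `[3, 1, 2, 5]` and a budget of 4, the two smallest chunks stay resident and the larger ones are recomputed, trading compute for memory as gradient checkpointing does.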