PreFT: Prefill-only finetuning boosts multi-adapter LLM throughput
A new method called PreFT (Prefill-only Finetuning) improves inference throughput when serving many user-specific parameter-efficient finetuning (PEFT) adapters on a shared large language model. The researchers identified a throughput mismatch between the prefill and decode phases when multiple adapters are in play. PreFT applies the adapter only during prefill and discards it for decode, significantly increasing throughput with minimal performance loss. The team also released an efficient implementation of two prefill-only PEFTs. The work is described in arXiv:2605.14217.
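To make the prefill-only idea concrete, here is a minimal sketch assuming a LoRA-style adapter and an `is_prefill` flag supplied by the serving loop. The class name, hyperparameters, and interface are illustrative, not the authors' released implementation; the intuition, presumably, is that once the adapter is dropped for decode, every request runs on the same shared base weights.

```python
import torch
import torch.nn as nn


class PrefillOnlyLoRALinear(nn.Module):
    """Hypothetical linear layer whose LoRA adapter is applied only during prefill."""

    def __init__(self, in_features: int, out_features: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        # LoRA-style low-rank adapter: effective weight is W + (alpha / rank) * B @ A
        self.lora_a = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor, is_prefill: bool) -> torch.Tensor:
        y = self.base(x)
        if is_prefill:
            # The adapter contributes only while the prompt is prefilled; its effect
            # persists indirectly through the KV cache written during this phase.
            y = y + (x @ self.lora_a.T @ self.lora_b.T) * self.scaling
        # During decode the adapter is skipped, so every user hits the shared base
        # weights (presumably the source of the multi-adapter throughput gain).
        return y


if __name__ == "__main__":
    layer = PrefillOnlyLoRALinear(64, 64)
    prompt = torch.randn(1, 10, 64)   # prefill: whole prompt processed at once
    next_tok = torch.randn(1, 1, 64)  # decode: one token at a time
    print(layer(prompt, is_prefill=True).shape)
    print(layer(next_tok, is_prefill=False).shape)
```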
Key facts
- PreFT stands for Prefill-only Finetuning.
- It addresses throughput issues in serving multiple user-specific PEFT adapters.
- A throughput mismatch between the prefill and decode phases limits multi-adapter serving throughput.
- PreFT applies the adapter only to prefill tokens and discards it for decode.
- It significantly increases throughput with minimal effect on performance.
- An efficient implementation of two prefill-only PEFTs was released.
- The research is published on arXiv with ID 2605.14217.
- The method optimizes for serving throughput rather than parameter count.
Entities
Institutions
- arXiv