CoLLM Framework Unifies Federated Fine-Tuning and Inference for Edge LLMs
A new framework named CoLLM has been introduced to improve the deployment of Large Language Models (LLMs) at the edge by addressing inefficiencies in how post-training workloads are served. The system co-executes federated parameter-efficient fine-tuning (FL PEFT) and low-latency inference on shared edge replicas and shared model parameters. Whereas fine-tuning and inference are typically treated as isolated workloads, CoLLM connects the two, reducing redundant model deployments and letting quality improvements from fine-tuning reach inference more quickly. As LLMs gain traction in edge intelligence for personalized and domain-specific services, the efficiency of these post-training stages becomes increasingly important under constrained resources. Detailed in the arXiv preprint 2604.16400v1, CoLLM introduces an intra-replica model sharing mechanism to address challenges at both the replica and cluster levels.
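The summary does not spell out CoLLM's exact sharing mechanism, so the following is a minimal sketch of the general idea behind intra-replica model sharing: a fine-tuning path and an inference path reference a single frozen copy of the base weights, with training confined to a small LoRA-style adapter. All names here (`LoRALinear`, `shared_base`, `tuner`, `server`) are illustrative assumptions, not the paper's API.

```python
# Sketch only: illustrates intra-replica weight sharing with a toy
# LoRA-style adapter over one frozen base layer; not CoLLM's actual code.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen base linear layer plus a small trainable low-rank delta."""
    def __init__(self, base: nn.Linear, rank: int = 4):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # base weights stay frozen and shared
            p.requires_grad_(False)
        self.lora_a = nn.Parameter(torch.randn(base.in_features, rank) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(rank, base.out_features))

    def forward(self, x):
        return self.base(x) + x @ self.lora_a @ self.lora_b

shared_base = nn.Linear(16, 16)  # one copy of base weights per replica

# The fine-tuning path and the inference path wrap the SAME base module,
# so the replica holds the full base weights only once.
tuner = LoRALinear(shared_base)
server = LoRALinear(shared_base)
assert tuner.base.weight.data_ptr() == server.base.weight.data_ptr()

# One hypothetical fine-tuning step touches only the adapter parameters.
opt = torch.optim.SGD([tuner.lora_a, tuner.lora_b], lr=1e-2)
x = torch.randn(8, 16)
loss = tuner(x).pow(2).mean()
loss.backward()
opt.step()

# Inference can proceed against the unchanged shared base.
with torch.no_grad():
    y = server(torch.randn(8, 16))
```

Because the base tensors are shared by reference, only the small adapter tensors are duplicated per workload, which is what makes co-locating fine-tuning and inference on one replica cheap in memory.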
Key facts
- CoLLM is a new framework for co-executing LLM fine-tuning and inference
- It unifies federated parameter-efficient fine-tuning (FL PEFT) and inference (see the aggregation sketch after this list)
- The system operates on shared edge replicas and model parameters
- Addresses the problem of treating fine-tuning and inference as isolated workloads
- Designed for edge intelligence applications with constrained resources
- Includes an intra-replica model sharing mechanism
- Detailed in arXiv preprint 2604.16400v1
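On the federated side of FL PEFT, the usual pattern is that edge clients train only their small adapter tensors and a coordinator averages them. The summary does not state CoLLM's actual aggregation rule, so the plain FedAvg below, along with the `fedavg` helper and the per-client structures, is an assumption for illustration.

```python
# Sketch only: weighted FedAvg over per-client LoRA adapter tensors;
# CoLLM's real aggregation rule is not described in this summary.
import torch

def fedavg(adapter_updates, weights):
    """Weighted average of per-client adapter state dicts (same keys/shapes)."""
    total = sum(weights)
    avg = {}
    for name in adapter_updates[0]:
        avg[name] = sum(w * upd[name] for upd, w in zip(adapter_updates, weights)) / total
    return avg

# Hypothetical round: three edge clients return adapters of identical shape.
clients = [{"lora_a": torch.randn(16, 4), "lora_b": torch.randn(4, 16)} for _ in range(3)]
samples = [120, 80, 200]  # per-client example counts used as weights
global_adapter = fedavg(clients, samples)
print({k: v.shape for k, v in global_adapter.items()})
```

Only adapter tensors cross the network in this scheme, which keeps federated rounds cheap relative to exchanging full model weights.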