EcoFed: efficient communication for DNN partitioning-based federated learning
Abstract
Efficiently running federated learning (FL) on resource-constrained devices is challenging since they are required to train computationally intensive deep neural networks (DNN) independently. DNN partitioning-based FL (DPFL) has been proposed as one mechanism to accelerate training where the layers of a DNN (or computation) are offloaded from the device to the server. However, this creates significant communication overheads since the intermediate activation and gradient need to be transferred between the device and the server during training. While current research reduces the communication introduced by DNN partitioning using local loss-based methods, we demonstrate that these methods are ineffective in improving the overall efficiency (communication overhead and training speed) of a DPFL system. This is because they suffer from accuracy degradation and ignore the communication costs incurred when transferring the activation from the device to the server. This article proposes EcoFed, a communication-efficient framework for DPFL systems. EcoFed eliminates the transmission of the gradient by developing pre-trained initialization of the DNN model on the device for the first time. This reduces the accuracy degradation seen in local loss-based methods. In addition, EcoFed proposes a novel replay buffer mechanism and implements a quantization-based compression technique to reduce the transmission of the activation. It is experimentally demonstrated that EcoFed can reduce the communication cost by up to 133× and accelerate training by up to 21× when compared to classic FL. Compared to vanilla DPFL, EcoFed achieves a 16× communication reduction and 2.86× training time speed-up. EcoFed is available from https://github.com/blessonvar/EcoFed.
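To illustrate the general idea behind quantization-based compression of activations, the following is a minimal sketch in Python. It is not taken from the EcoFed code base: the function names `quantize` and `dequantize`, the choice of uniform 8-bit quantization, and the per-tensor minimum/scale metadata are all assumptions made for illustration, and the abstract does not state which quantization scheme EcoFed actually uses.

```python
def quantize(activations):
    """Uniformly map a list of floats to 8-bit codes (hypothetical helper).

    Returns the codes as bytes plus the offset and scale needed to
    approximately reconstruct the original values.
    """
    lo, hi = min(activations), max(activations)
    # Avoid division by zero when all activations are identical.
    scale = (hi - lo) / 255 if hi > lo else 1.0
    # Clamp to 255 to guard against floating-point overshoot at the top end.
    codes = bytes(min(255, round((a - lo) / scale)) for a in activations)
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Approximately reconstruct the floats from 8-bit codes (hypothetical helper)."""
    return [lo + c * scale for c in codes]
```

Under these assumptions, each activation value is sent as one byte instead of a four-byte 32-bit float, so the payload shrinks by roughly 4× (ignoring the two metadata values), at the cost of a reconstruction error of at most about half a quantization step per value.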
Citation
Wu, D, Ullah, R, Rodgers, P, Kilpatrick, P, Spence, I & Varghese, B 2024, 'EcoFed: efficient communication for DNN partitioning-based federated learning', IEEE Transactions on Parallel and Distributed Systems, vol. Early Access, 10380682. https://doi.org/10.1109/TPDS.2024.3349617
Publication
IEEE Transactions on Parallel and Distributed Systems
Status
Peer reviewed
ISSN
1045-9219
Type
Journal article
Description
Funding: This work was sponsored by Rakuten Mobile, Japan.
Items in the St Andrews Research Repository are protected by copyright, with all rights reserved, unless otherwise indicated.