Distributed machine learning on edge computing systems
Abstract
The demand for distributed machine learning (DML) systems, which distribute training workloads across multiple nodes, has surged over the past decade due to the rapid growth of datasets and computational requirements. In addition, executing ML training at the edge has gained importance for preserving data privacy and reducing the communication costs of sending raw data to the cloud. These trends have motivated a new ML training paradigm: DML at the edge.
However, implementing DML systems at the edge presents four key challenges: (i) limited and heterogeneous hardware resources at the edge result in impractically long training times; (ii) the communication costs of DML systems at the edge are substantial; (iii) on-device training cannot be carried out on low-end devices; (iv) there is no comprehensive framework that tackles the above challenges and supports efficient DML systems at the edge.
This thesis presents four techniques to address these challenges. First, it proposes an adaptive deep neural network (DNN) partitioning and offloading technique that addresses limited device resources with cloud assistance. This DNN partitioning-based federated learning (DPFL) system is further optimized by a reinforcement learning agent that adapts to heterogeneous devices. The thesis then introduces pre-training initialization and replay buffer techniques to reduce gradient and activation communication, which are identified as the bottlenecks of a DPFL system. Additionally, a dual-phase layer freezing technique is proposed to minimize on-device computation. Finally, a holistic framework is developed that integrates these techniques, maximizing their applicability and impact.
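To make the core idea concrete, the following is a minimal sketch of one DPFL-style training step, assuming a PyTorch nn.Sequential model cut at a hypothetical split index k; the function names, split point, and optimizer settings are illustrative assumptions, not the thesis's actual design. The device computes the early layers and "sends" the cut-layer activations to the server, which completes the forward and backward pass and "returns" the gradient at the cut.

```python
# Minimal sketch of DNN partitioning-based training (split training), assuming
# a PyTorch nn.Sequential split at a hypothetical index k. Names and the
# partitioning policy are illustrative, not taken from the thesis.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 256), nn.ReLU(),   # early layers: run on the device
    nn.Linear(256, 128), nn.ReLU(),   # remaining layers: run on the server
    nn.Linear(128, 10),
)
k = 3  # hypothetical split index: layers [0, k) stay on-device
device_part, server_part = model[:k], model[k:]
opt_device = torch.optim.SGD(device_part.parameters(), lr=0.01)
opt_server = torch.optim.SGD(server_part.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

def split_step(x, y):
    # Device: forward through the early layers, then "send" activations.
    acts = device_part(x)
    sent = acts.detach().requires_grad_(True)  # stand-in for the network hop

    # Server: finish the forward pass and backprop down to the cut layer.
    loss = loss_fn(server_part(sent), y)
    opt_device.zero_grad()
    opt_server.zero_grad()
    loss.backward()
    opt_server.step()

    # Device: continue backprop using the gradient "returned" at the cut.
    acts.backward(sent.grad)
    opt_device.step()
    return loss.item()

x = torch.randn(32, 1, 28, 28)       # toy batch standing in for image data
y = torch.randint(0, 10, (32,))
print(f"loss: {split_step(x, y):.4f}")
```

The tensors crossing the cut (activations on the forward pass, gradients on the backward pass) are exactly the traffic that the pre-training initialization and replay buffer techniques target.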
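In the same spirit, the snippet below is a hedged sketch of what layer freezing could look like in PyTorch: disabling gradients for a prefix of the device-side layers lets the device skip their backward computation and parameter updates. The dual-phase schedule that decides when and which layers to freeze is not reproduced here.

```python
import torch.nn as nn

# Hypothetical freezing helper (the thesis's dual-phase criterion is not shown):
# freeze the first `num_frozen` layers of the device-side partition so their
# gradients are neither computed nor applied.
def freeze_prefix(device_layers: nn.Sequential, num_frozen: int) -> None:
    for layer in device_layers[:num_frozen]:
        for p in layer.parameters():
            p.requires_grad_(False)

# Example: freeze the first two layers of a small device-side partition.
device_layers = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 128))
freeze_prefix(device_layers, 2)
print([p.requires_grad for p in device_layers.parameters()])
# -> [False, False, True, True]
```

In practice the device-side optimizer would be rebuilt over only the parameters that still require gradients, so frozen layers incur no update cost.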
The proposed framework supports building a new DPFL system that is more efficient than classic DML at the edge. Experimental evaluation on two real-world testbeds, across a range of datasets and model architectures, demonstrates the improvements of the proposed DML system on quality and performance metrics such as final accuracy, training latency, and communication cost.
Type
Thesis (PhD, Doctor of Philosophy)
Rights
Creative Commons Attribution-NonCommercial 4.0 International
http://creativecommons.org/licenses/by-nc/4.0/