NeuroFlux: memory-efficient CNN training using adaptive local learning
Item metadata
dc.contributor.author | Saikumar, Dhananjay | |
dc.contributor.author | Varghese, Blesson | |
dc.date.accessioned | 2024-05-03T14:30:09Z | |
dc.date.available | 2024-05-03T14:30:09Z | |
dc.date.issued | 2024-04 | |
dc.identifier | 301891691 | |
dc.identifier | 5d7fa2fd-57f4-43ed-9072-549f51acb6be | |
dc.identifier | 85191987987 | |
dc.identifier.citation | Saikumar, D & Varghese, B 2024, NeuroFlux: memory-efficient CNN training using adaptive local learning. In EuroSys '24: Proceedings of the Nineteenth European Conference on Computer Systems. ACM, pp. 999-1015, The European Conference on Computer Systems, Athens, Greece, 22/04/24. https://doi.org/10.1145/3627703.3650067 | en |
dc.identifier.citation | conference | en |
dc.identifier.isbn | 9798400704376 | |
dc.identifier.uri | https://hdl.handle.net/10023/29805 | |
dc.description.abstract | Efficient on-device Convolutional Neural Network (CNN) training in resource-constrained mobile and edge environments is an open challenge. Backpropagation is the standard training approach, but it is GPU-memory intensive: its strong inter-layer dependencies require the intermediate activations of the entire CNN to be retained in GPU memory. This forces smaller batch sizes to fit training within the available GPU memory budget, which in turn results in substantially longer, often impractical, training times. We introduce NeuroFlux, a novel CNN training system tailored for memory-constrained scenarios. We develop two novel techniques: firstly, adaptive auxiliary networks that employ a variable number of filters to reduce GPU memory usage; secondly, block-specific adaptive batch sizes, which not only satisfy the GPU memory constraints but also accelerate training. NeuroFlux segments a CNN into blocks based on GPU memory usage and attaches an auxiliary network to each layer in these blocks. This breaks the typical layer dependencies under a new training paradigm - 'adaptive local learning'. Moreover, NeuroFlux caches intermediate activations, eliminating redundant forward passes over previously trained blocks and further accelerating training. The results are twofold when compared to backpropagation: on various hardware platforms, NeuroFlux demonstrates training speed-ups of 2.3× to 6.1× under stringent GPU memory budgets, and NeuroFlux generates streamlined models that have 10.9× to 29.4× fewer parameters. | |
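To make the training paradigm in the abstract concrete, below is a minimal PyTorch sketch of local learning with per-block auxiliary networks and activation caching. All names here (AuxHead, train_blocks, aux_filters, epochs_per_block) are illustrative assumptions, not identifiers from the NeuroFlux source; the sketch caches activations in device memory for simplicity, whereas a real memory-constrained system would spill them to host memory or storage.

```python
import torch
import torch.nn as nn

class AuxHead(nn.Module):
    """Auxiliary network: a small 1x1 conv plus a pooled linear classifier.
    aux_filters is the knob the abstract calls 'adaptive': fewer filters
    means a smaller local graph and lower GPU memory usage."""
    def __init__(self, in_channels, aux_filters, num_classes):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, aux_filters, kernel_size=1)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(aux_filters, num_classes)

    def forward(self, x):
        x = self.pool(torch.relu(self.conv(x))).flatten(1)
        return self.fc(x)

def train_blocks(blocks, aux_heads, loader, device, epochs_per_block=1):
    """Train one block at a time against a local loss, so gradients never
    cross block boundaries and only one block's activations live on the
    GPU at once. Outputs of finished blocks are cached and reused as the
    next block's inputs, eliminating redundant forward passes (as the
    abstract describes). Toy version: the cache is kept in device memory."""
    criterion = nn.CrossEntropyLoss()
    cached = [(x.to(device), y.to(device)) for x, y in loader]  # inputs for block 0

    for block, head in zip(blocks, aux_heads):
        block.to(device)
        head.to(device)
        params = list(block.parameters()) + list(head.parameters())
        opt = torch.optim.SGD(params, lr=0.01, momentum=0.9)
        for _ in range(epochs_per_block):
            for x, y in cached:
                opt.zero_grad()
                z = block(x)                     # forward through this block only
                loss = criterion(head(z), y)     # local loss via auxiliary head
                loss.backward()                  # gradients stay local to this block
                opt.step()
        # Freeze the trained block: run it once more without grad and cache
        # its detached outputs as the inputs of the next block.
        with torch.no_grad():
            cached = [(block(x), y) for x, y in cached]
```

The block-specific adaptive batch sizes from the abstract could be layered on top by re-chunking `cached` with a different batch size per block (smaller for memory-heavy early blocks, larger for later ones); that scheduling logic is omitted here for brevity.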
dc.format.extent | 17 | |
dc.format.extent | 833114 | |
dc.language.iso | eng | |
dc.publisher | ACM | |
dc.relation.ispartof | EuroSys '24: Proceedings of the Nineteenth European Conference on Computer Systems | en |
dc.subject | CNN training | en |
dc.subject | Memory efficient training | en |
dc.subject | Local learning | en |
dc.subject | Edge computing | en |
dc.subject | QA75 Electronic computers. Computer science | en |
dc.subject | 3rd-NDAS | en |
dc.subject.lcc | QA75 | en |
dc.title | NeuroFlux: memory-efficient CNN training using adaptive local learning | en |
dc.type | Conference item | en |
dc.contributor.institution | University of St Andrews. School of Computer Science | en |
dc.identifier.doi | 10.1145/3627703.3650067 |