Item metadata

dc.contributor.author       Saikumar, Dhananjay
dc.contributor.author       Varghese, Blesson
dc.date.accessioned         2024-05-03T14:30:09Z
dc.date.available           2024-05-03T14:30:09Z
dc.date.issued              2024-04
dc.identifier               301891691
dc.identifier               5d7fa2fd-57f4-43ed-9072-549f51acb6be
dc.identifier               85191987987
dc.identifier.citation      Saikumar, D & Varghese, B 2024, NeuroFlux: memory-efficient CNN training using adaptive local learning. In EuroSys '24: Proceedings of the Nineteenth European Conference on Computer Systems. ACM, pp. 999-1015. The European Conference on Computer Systems, Athens, Greece, 22/04/24. https://doi.org/10.1145/3627703.3650067 [en]
dc.identifier.citation      conference [en]
dc.identifier.isbn          9798400704376
dc.identifier.uri           https://hdl.handle.net/10023/29805
dc.description.abstract     Efficient on-device Convolutional Neural Network (CNN) training in resource-constrained mobile and edge environments is an open challenge. Backpropagation is the standard approach adopted, but it is GPU memory intensive due to its strong inter-layer dependencies that demand intermediate activations across the entire CNN model to be retained in GPU memory. This necessitates smaller batch sizes to make training possible within the available GPU memory budget, but in turn, results in substantially high and impractical training time. We introduce NeuroFlux, a novel CNN training system tailored for memory-constrained scenarios. We develop two novel opportunities: firstly, adaptive auxiliary networks that employ a variable number of filters to reduce GPU memory usage, and secondly, block-specific adaptive batch sizes, which not only cater to the GPU memory constraints but also accelerate the training process. NeuroFlux segments a CNN into blocks based on GPU memory usage and further attaches an auxiliary network to each layer in these blocks. This disrupts the typical layer dependencies under a new training paradigm - 'adaptive local learning'. Moreover, NeuroFlux adeptly caches intermediate activations, eliminating redundant forward passes over previously trained blocks, further accelerating the training process. The results are twofold when compared to Backpropagation: on various hardware platforms, NeuroFlux demonstrates training speed-ups of 2.3× to 6.1× under stringent GPU memory budgets, and NeuroFlux generates streamlined models that have 10.9× to 29.4× fewer parameters. (An illustrative code sketch of this block-wise local-learning scheme appears after the metadata record below.)
dc.format.extent            17
dc.format.extent            833114
dc.language.iso             eng
dc.publisher                ACM
dc.relation.ispartof        EuroSys '24: Proceedings of the Nineteenth European Conference on Computer Systems [en]
dc.subject                  CNN training [en]
dc.subject                  Memory efficient training [en]
dc.subject                  Local learning [en]
dc.subject                  Edge computing [en]
dc.subject                  QA75 Electronic computers. Computer science [en]
dc.subject                  3rd-NDAS [en]
dc.subject.lcc              QA75 [en]
dc.title                    NeuroFlux: memory-efficient CNN training using adaptive local learning [en]
dc.type                     Conference item [en]
dc.contributor.institution  University of St Andrews. School of Computer Science [en]
dc.identifier.doi           10.1145/3627703.3650067
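
The abstract above describes the core mechanism: the CNN is segmented into blocks, small auxiliary networks are attached so each block can be trained locally without end-to-end backpropagation, and the activations of already-trained blocks are cached to avoid redundant forward passes. The PyTorch sketch below is a minimal illustration of that general pattern under simplifying assumptions; the AuxHead module, the train_block and cache_activations helpers, the block boundaries, and all hyperparameters are hypothetical and are not taken from the paper's implementation.

```python
# Minimal, hypothetical PyTorch sketch of block-wise local learning with
# auxiliary heads and activation caching, loosely following the ideas in the
# abstract above. Names, block boundaries, and hyperparameters are
# illustrative assumptions, not the actual NeuroFlux implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AuxHead(nn.Module):
    """Small auxiliary classifier attached to a block; aux_channels can be
    shrunk to trade a little accuracy for lower GPU memory usage."""
    def __init__(self, in_channels, aux_channels, num_classes):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, aux_channels, kernel_size=1)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(aux_channels, num_classes)

    def forward(self, x):
        x = F.relu(self.conv(x))
        return self.fc(self.pool(x).flatten(1))

def train_block(block, head, batches, device, lr=1e-3):
    """Train one block against its auxiliary head only; gradients never cross
    block boundaries, so only this block's activations are held in memory."""
    opt = torch.optim.SGD(list(block.parameters()) + list(head.parameters()), lr=lr)
    block.train()
    for x, y in batches:
        x, y = x.to(device), y.to(device)
        loss = F.cross_entropy(head(block(x)), y)
        opt.zero_grad()
        loss.backward()
        opt.step()

def cache_activations(block, batches, device):
    """Run a trained block once in inference mode and cache its outputs, so
    later blocks never repeat forward passes over earlier ones."""
    block.eval()
    with torch.no_grad():
        return [(block(x.to(device)).cpu(), y) for x, y in batches]

if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    # Toy CNN split into two blocks; real block boundaries would be chosen
    # according to the available GPU memory budget.
    blocks = [
        nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU()).to(device),
        nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU()).to(device),
    ]
    heads = [AuxHead(16, 8, 10).to(device), AuxHead(32, 8, 10).to(device)]
    # Random stand-in data; each (x, y) tuple is one mini-batch. A block-
    # specific batch size could be applied by re-batching between blocks.
    batches = [(torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,)))
               for _ in range(4)]
    for block, head in zip(blocks, heads):
        train_block(block, head, batches, device)
        batches = cache_activations(block, batches, device)  # feeds next block
```

In the paper, the number of auxiliary filters and the per-block batch size are additionally adapted to the GPU memory budget; this sketch keeps both fixed for brevity.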

