Item metadata

dc.contributor.author: Kirkpatrick, Ryan
dc.contributor.author: Brown, Christopher Mark
dc.contributor.author: Janjic, Vladimir
dc.date.accessioned: 2022-11-21T15:30:01Z
dc.date.available: 2022-11-21T15:30:01Z
dc.date.issued: 2022-11-15
dc.identifier: 282241053
dc.identifier: 4f7f0c6f-d3d1-4655-9887-0a55e75aa7be
dc.identifier: 85158089521
dc.identifier.citation: Kirkpatrick, R, Brown, C M & Janjic, V 2022, COMPROF and COMPLACE: shared-memory communication profiling and automated thread placement via dynamic binary instrumentation. in 29th IEEE International Conference on High Performance Computing, Data, and Analytics. IEEE International Conference on High Performance Computing, Data, and Analytics, 29th IEEE International Conference on High Performance Computing, Data, and Analytics (HIPC), Bangalore, India, 18/12/22. https://doi.org/10.1109/HiPC56025.2022.00040 [en]
dc.identifier.citation: conference [en]
dc.identifier.issn: 1094-7256
dc.identifier.uri: https://hdl.handle.net/10023/26458
dc.description: Funding: This work was generously supported by UK EPSRC Energise, grant number EP/V006290/1. [en]
dc.description.abstract: This paper presents COMPROF and COMPLACE, a novel profiling tool and thread placement technique for shared-memory architectures that requires no recompilation or user intervention. We use dynamic binary instrumentation to intercept memory operations and estimate inter-thread communication overhead, deriving (and possibly visualising) a communication graph of data-sharing between threads. We then use this graph to map threads to cores in order to optimise memory traffic through the memory system. Different paths through a system's memory hierarchy have different latency, throughput and energy properties; COMPLACE exploits this heterogeneity to provide automatic performance and energy improvements for multi-threaded programs. We demonstrate COMPLACE on the NAS Parallel Benchmark (NPB) suite where, using our technique, we are able to achieve improvements of up to 12% in the execution time and up to 10% in the energy consumption (compared to default Linux scheduling) while not requiring any modification or recompilation of the application code.
dc.format.extent: 716437
dc.format.extent: 716593
dc.language.iso: eng
dc.relation.ispartof: 29th IEEE International Conference on High Performance Computing, Data, and Analytics [en]
dc.relation.ispartofseries: IEEE International Conference on High Performance Computing, Data, and Analytics [en]
dc.subject: NUMA [en]
dc.subject: Thread Placement [en]
dc.subject: Data Placement [en]
dc.subject: Cache Optimisation [en]
dc.subject: Energy Optimization [en]
dc.subject: Refactoring [en]
dc.subject: QA75 Electronic computers. Computer science [en]
dc.subject: NDAS [en]
dc.subject: SDG 7 - Affordable and Clean Energy [en]
dc.subject: MCP [en]
dc.subject.lcc: QA75 [en]
dc.title: COMPROF and COMPLACE: shared-memory communication profiling and automated thread placement via dynamic binary instrumentation [en]
dc.type: Conference item [en]
dc.contributor.sponsor: EPSRC [en]
dc.contributor.institution: University of St Andrews. School of Computer Science [en]
dc.identifier.doi: 10.1109/HiPC56025.2022.00040
dc.identifier.url: https://ieeexplore.ieee.org/ [en]
dc.identifier.grantnumber: EP/V006290/1 [en]

