Item metadata

dc.contributor.author: Yu, Teng
dc.contributor.author: Zhao, Wenlai
dc.contributor.author: Liu, Pan
dc.contributor.author: Janjic, Vladimir
dc.contributor.author: Yan, Xiaohan
dc.contributor.author: Wang, Shicai
dc.contributor.author: Fu, Haohuan
dc.contributor.author: Yang, Guangwen
dc.contributor.author: Thomson, John Donald
dc.date.accessioned: 2019-12-09T09:30:07Z
dc.date.available: 2019-12-09T09:30:07Z
dc.date.issued: 2020-05
dc.identifier: 264077454
dc.identifier: 6d25ad7c-f40e-450a-8b1e-7d23e8089d03
dc.identifier: 85078948317
dc.identifier: 000616431600001
dc.identifier.citation: Yu, T, Zhao, W, Liu, P, Janjic, V, Yan, X, Wang, S, Fu, H, Yang, G & Thomson, J D 2020, 'Large-scale automatic k-means clustering for heterogeneous many-core supercomputer', IEEE Transactions on Parallel and Distributed Systems, vol. 31, no. 5, pp. 997-1008. https://doi.org/10.1109/TPDS.2019.2955467 [en]
dc.identifier.issn: 1045-9219
dc.identifier.uri: https://hdl.handle.net/10023/19096
dc.description: Funding: UK EPSRC grants “Discovery” EP/P020631/1, “ABC: Adaptive Brokerage for the Cloud” EP/R010528/1. [en]
dc.description.abstract: This article presents an automatic k-means clustering solution targeting the Sunway TaihuLight supercomputer. We first introduce a multilevel parallel partition approach that partitions not only by dataflow and centroid but also by dimension, which unlocks the potential of the hierarchical parallelism in the heterogeneous many-core processor and the system architecture of the supercomputer. The parallel design is able to process large-scale clustering problems with up to 196,608 dimensions and over 160,000 target centroids, while maintaining high performance and high scalability. Furthermore, we propose an automatic hyper-parameter determination process for k-means clustering, which automatically generates and executes the clustering tasks for a set of candidate hyper-parameters and then determines the optimal hyper-parameter using a proposed evaluation method. The proposed auto-clustering solution not only achieves high performance and scalability for problems with massive high-dimensional data, but also supports clustering without sufficient prior knowledge of the number of target clusters, which can potentially extend the scope of the k-means algorithm to new application areas.
dc.format.extent: 12
dc.format.extent: 2531673
dc.language.iso: eng
dc.relation.ispartof: IEEE Transactions on Parallel and Distributed Systems [en]
dc.subject: Supercomputer [en]
dc.subject: Heterogeneous many-core processor [en]
dc.subject: Data partitioning [en]
dc.subject: Clustering [en]
dc.subject: Scheduling [en]
dc.subject: AutoML [en]
dc.subject: QA75 Electronic computers. Computer science [en]
dc.subject: QA76 Computer software [en]
dc.subject: T-NDAS [en]
dc.subject: BDC [en]
dc.subject: R2C [en]
dc.subject: ~DC~ [en]
dc.subject.lcc: QA75 [en]
dc.subject.lcc: QA76 [en]
dc.title: Large-scale automatic k-means clustering for heterogeneous many-core supercomputer [en]
dc.type: Journal article [en]
dc.contributor.sponsor: EPSRC [en]
dc.contributor.sponsor: EPSRC [en]
dc.contributor.institution: University of St Andrews. School of Computer Science [en]
dc.identifier.doi: 10.1109/TPDS.2019.2955467
dc.description.status: Peer reviewed [en]
dc.identifier.grantnumber: EP/R010528/1 [en]
dc.identifier.grantnumber: EP/P020631/1 [en]
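
The automatic hyper-parameter determination described in the abstract (run k-means for a set of candidate values of k, then pick the best with an evaluation method) can be illustrated with a minimal serial sketch. This is not the authors' implementation: the abstract does not say which evaluation method the article proposes, so the mean silhouette score is used here as a stand-in, and the sketch does none of the multilevel partitioning across the Sunway TaihuLight hardware. The names `kmeans`, `silhouette`, and `auto_kmeans` are hypothetical, and the farthest-point initialisation is an assumption chosen only to keep the example deterministic.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means with farthest-point initialisation (an assumption)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from all centroids chosen so far.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(centroids[c])  # an empty cluster keeps its centroid
        if updated == centroids:
            break
        centroids = updated
    return labels


def silhouette(points, labels):
    """Mean silhouette score; a stand-in for the article's evaluation method."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton clusters contribute 0 by convention
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def auto_kmeans(points, candidate_ks):
    """Cluster once per candidate k and return the best k with all scores."""
    scores = {k: silhouette(points, kmeans(points, k)) for k in candidate_ks}
    return max(scores, key=scores.get), scores


# Toy data: three tight, well-separated groups of five points each.
offsets = [(0, 0), (0.5, 0), (-0.5, 0), (0, 0.5), (0, -0.5)]
points = [(cx + dx, cy + dy) for cx, cy in [(0, 0), (10, 0), (0, 10)] for dx, dy in offsets]
best_k, scores = auto_kmeans(points, [2, 3, 4, 5])
print(best_k, scores)
```

The candidate runs are independent of one another, which is what makes this sweep a natural fit for parallel execution at scale.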

