Large-scale automatic k-means clustering for heterogeneous many-core supercomputer
Item metadata
dc.contributor.author | Yu, Teng | |
dc.contributor.author | Zhao, Wenlai | |
dc.contributor.author | Liu, Pan | |
dc.contributor.author | Janjic, Vladimir | |
dc.contributor.author | Yan, Xiaohan | |
dc.contributor.author | Wang, Shicai | |
dc.contributor.author | Fu, Haohuan | |
dc.contributor.author | Yang, Guangwen | |
dc.contributor.author | Thomson, John Donald | |
dc.date.accessioned | 2019-12-09T09:30:07Z | |
dc.date.available | 2019-12-09T09:30:07Z | |
dc.date.issued | 2020-05 | |
dc.identifier.citation | Yu, T, Zhao, W, Liu, P, Janjic, V, Yan, X, Wang, S, Fu, H, Yang, G & Thomson, JD 2020, 'Large-scale automatic k-means clustering for heterogeneous many-core supercomputer', IEEE Transactions on Parallel and Distributed Systems, vol. 31, no. 5, pp. 997-1008. https://doi.org/10.1109/TPDS.2019.2955467 | en |
dc.identifier.issn | 1045-9219 | |
dc.identifier.other | PURE: 264077454 | |
dc.identifier.other | PURE UUID: 6d25ad7c-f40e-450a-8b1e-7d23e8089d03 | |
dc.identifier.other | Scopus: 85078948317 | |
dc.identifier.other | WOS: 000616431600001 | |
dc.identifier.uri | http://hdl.handle.net/10023/19096 | |
dc.description | Funding: UK EPSRC grants "Discovery" EP/P020631/1, "ABC: Adaptive Brokerage for the Cloud" EP/R010528/1. | en |
dc.description.abstract | This article presents an automatic k-means clustering solution targeting the Sunway TaihuLight supercomputer. We first introduce a multilevel parallel partition approach that partitions not only by dataflow and centroid but also by dimension, which unlocks the potential of the hierarchical parallelism in the heterogeneous many-core processor and in the system architecture of the supercomputer. The parallel design can process large-scale clustering problems with up to 196,608 dimensions and over 160,000 target centroids while maintaining high performance and high scalability. Furthermore, we propose an automatic hyper-parameter determination process for k-means clustering that automatically generates and executes clustering tasks with a set of candidate hyper-parameters, and then determines the optimal hyper-parameter using a proposed evaluation method. The proposed auto-clustering solution not only achieves high performance and scalability for problems with massive high-dimensional data, but also supports clustering without sufficient prior knowledge of the number of target clusters, which could extend the k-means algorithm to new application areas. | |
dc.format.extent | 12 | |
dc.language.iso | eng | |
dc.relation.ispartof | IEEE Transactions on Parallel and Distributed Systems | en |
dc.rights | Copyright © 2019 IEEE. This work has been made available online in accordance with publisher policies or with permission. Permission for further reuse of this content should be sought from the publisher or the rights holder. This is the author created accepted manuscript following peer review and may differ slightly from the final published version. The final published version of this work is available at https://doi.org/10.1109/TPDS.2019.2955467 | en |
dc.subject | Supercomputer | en |
dc.subject | Heterogeneous many-core processor | en |
dc.subject | Data partitioning | en |
dc.subject | Clustering | en |
dc.subject | Scheduling | en |
dc.subject | AutoML | en |
dc.subject | QA75 Electronic computers. Computer science | en |
dc.subject | QA76 Computer software | en |
dc.subject | T-NDAS | en |
dc.subject | BDC | en |
dc.subject | R2C | en |
dc.subject | ~DC~ | en |
dc.subject.lcc | QA75 | en |
dc.subject.lcc | QA76 | en |
dc.title | Large-scale automatic k-means clustering for heterogeneous many-core supercomputer | en |
dc.type | Journal article | en |
dc.contributor.sponsor | EPSRC | en |
dc.contributor.sponsor | EPSRC | en |
dc.description.version | Postprint | en |
dc.contributor.institution | University of St Andrews. School of Computer Science | en |
dc.identifier.doi | https://doi.org/10.1109/TPDS.2019.2955467 | |
dc.description.status | Peer reviewed | en |
dc.identifier.grantnumber | EP/R010528/1 | en |
dc.identifier.grantnumber | EP/P020631/1 | en |
Items in the St Andrews Research Repository are protected by copyright, with all rights reserved, unless otherwise indicated.