Large-scale automatic k-means clustering for heterogeneous many-core supercomputer
Item metadata
dc.contributor.author | Yu, Teng | |
dc.contributor.author | Zhao, Wenlai | |
dc.contributor.author | Liu, Pan | |
dc.contributor.author | Janjic, Vladimir | |
dc.contributor.author | Yan, Xiaohan | |
dc.contributor.author | Wang, Shicai | |
dc.contributor.author | Fu, Haohuan | |
dc.contributor.author | Yang, Guangwen | |
dc.contributor.author | Thomson, John Donald | |
dc.date.accessioned | 2019-12-09T09:30:07Z | |
dc.date.available | 2019-12-09T09:30:07Z | |
dc.date.issued | 2020-05 | |
dc.identifier | 264077454 | |
dc.identifier | 6d25ad7c-f40e-450a-8b1e-7d23e8089d03 | |
dc.identifier | 85078948317 | |
dc.identifier | 000616431600001 | |
dc.identifier.citation | Yu, T, Zhao, W, Liu, P, Janjic, V, Yan, X, Wang, S, Fu, H, Yang, G & Thomson, J D 2020, 'Large-scale automatic k-means clustering for heterogeneous many-core supercomputer', IEEE Transactions on Parallel and Distributed Systems, vol. 31, no. 5, pp. 997-1008. https://doi.org/10.1109/TPDS.2019.2955467 | en |
dc.identifier.issn | 1045-9219 | |
dc.identifier.uri | https://hdl.handle.net/10023/19096 | |
dc.description | Funding: UK EPSRC grants "Discovery" EP/P020631/1, "ABC: Adaptive Brokerage for the Cloud" EP/R010528/1. | en |
dc.description.abstract | This article presents an automatic k-means clustering solution targeting the Sunway TaihuLight supercomputer. We first introduce a multilevel parallel partition approach that partitions not only by dataflow and centroid, but also by dimension, which unlocks the potential of the hierarchical parallelism in the heterogeneous many-core processor and the system architecture of the supercomputer. The parallel design is able to process large-scale clustering problems with up to 196,608 dimensions and over 160,000 target centroids, while maintaining high performance and high scalability. Furthermore, we propose an automatic hyper-parameter determination process for k-means clustering, which automatically generates and executes the clustering tasks with a set of candidate hyper-parameters, and then determines the optimal hyper-parameter using a proposed evaluation method. The proposed auto-clustering solution not only achieves high performance and scalability for problems with massive high-dimensional data, but also supports clustering without sufficient prior knowledge of the number of targeted clusters, which can potentially extend the scope of the k-means algorithm to new application areas. | |
dc.format.extent | 12 | |
dc.format.extent | 2531673 | |
dc.language.iso | eng | |
dc.relation.ispartof | IEEE Transactions on Parallel and Distributed Systems | en |
dc.subject | Supercomputer | en |
dc.subject | Heterogeneous many-core processor | en |
dc.subject | Data partitioning | en |
dc.subject | Clustering | en |
dc.subject | Scheduling | en |
dc.subject | AutoML | en |
dc.subject | QA75 Electronic computers. Computer science | en |
dc.subject | QA76 Computer software | en |
dc.subject | T-NDAS | en |
dc.subject | BDC | en |
dc.subject | R2C | en |
dc.subject | ~DC~ | en |
dc.subject.lcc | QA75 | en |
dc.subject.lcc | QA76 | en |
dc.title | Large-scale automatic k-means clustering for heterogeneous many-core supercomputer | en |
dc.type | Journal article | en |
dc.contributor.sponsor | EPSRC | en |
dc.contributor.sponsor | EPSRC | en |
dc.contributor.institution | University of St Andrews. School of Computer Science | en |
dc.identifier.doi | 10.1109/TPDS.2019.2955467 | |
dc.description.status | Peer reviewed | en |
dc.identifier.grantnumber | EP/R010528/1 | en |
dc.identifier.grantnumber | EP/P020631/1 | en |
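
The dimension-level partitioning mentioned in the abstract rests on a simple property: the squared Euclidean distance between a point and a centroid is a sum over dimensions, so each block of dimensions can contribute a partial sum that is combined afterwards. The Python sketch below illustrates only that idea and is not the authors' implementation. The function names (`split_dimensions`, `partial_sq_dist`, `assign`) are hypothetical, and the partial sums are computed serially here, whereas a real many-core design would presumably compute them on separate cores.

```python
from typing import List, Sequence


def split_dimensions(n_dims: int, n_parts: int) -> List[range]:
    """Split dimension indices 0..n_dims-1 into n_parts contiguous blocks."""
    base, extra = divmod(n_dims, n_parts)
    blocks, start = [], 0
    for p in range(n_parts):
        # The first `extra` blocks take one additional dimension each.
        size = base + (1 if p < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


def partial_sq_dist(point: Sequence[float],
                    centroid: Sequence[float],
                    dims: range) -> float:
    """Squared distance restricted to one block of dimensions."""
    return sum((point[d] - centroid[d]) ** 2 for d in dims)


def assign(point: Sequence[float],
           centroids: Sequence[Sequence[float]],
           n_parts: int) -> int:
    """Return the index of the nearest centroid.

    Each distance is assembled from per-block partial sums, standing in
    for work that separate cores would do (an assumption of this sketch).
    """
    blocks = split_dimensions(len(point), n_parts)
    best_index, best_dist = -1, float("inf")
    for j, centroid in enumerate(centroids):
        dist = sum(partial_sq_dist(point, centroid, b) for b in blocks)
        if dist < best_dist:
            best_index, best_dist = j, dist
    return best_index


if __name__ == "__main__":
    centroids = [[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 2.0]]
    print(assign([1.0, 1.0, 1.0, 1.0], centroids, n_parts=2))
```

Because the partial sums add up to the full squared distance, the number of dimension blocks should not change which centroid is chosen. It only changes how the work could be divided.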
Items in the St Andrews Research Repository are protected by copyright, with all rights reserved, unless otherwise indicated.