Large-scale hierarchical k-means for heterogeneous many-core supercomputers
Date
11/11/2018
Grant ID
EP/P020631/1
EP/R010528/1
779882
Abstract
This paper presents a novel design and implementation of the k-means clustering algorithm targeting the Sunway TaihuLight supercomputer. We introduce a multi-level parallel partition approach that partitions not only by dataflow and centroid, but also by dimension. Our multi-level (nkd) approach unlocks the potential of the hierarchical parallelism in the SW26010 heterogeneous many-core processor and the system architecture of the supercomputer. Our design is able to process large-scale clustering problems with up to 196,608 dimensions and over 160,000 target centroids, while maintaining high performance and high scalability, significantly improving the capability of k-means over previous approaches. The evaluation shows that our implementation takes less than 18 seconds per iteration for a large-scale clustering case with 196,608 data dimensions and 2,000 centroids when run on 4,096 nodes (1,064,496 cores) in parallel, making k-means a more feasible solution for complex scenarios.
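The three partition levels named in the abstract (data samples n, centroids k, dimensions d) can be illustrated with a small serial sketch of the k-means assignment step. This is not the paper's implementation: the function names, the block-size parameters, and the loop ordering are assumptions chosen for illustration, and nothing here models how the levels map onto SW26010 hardware. It only shows that a squared distance can be accumulated from per-dimension-block partial sums, and that a global nearest centroid can be reduced from per-centroid-block local minima.

```python
import math

def block_ranges(total, size):
    """Return (lo, hi) index ranges that cover 0..total-1 in blocks of `size`."""
    return [(lo, min(lo + size, total)) for lo in range(0, total, size)]

def partial_sq_dist(x, c, lo, hi):
    """Squared-distance contribution of dimensions lo..hi-1 only."""
    return sum((x[j] - c[j]) ** 2 for j in range(lo, hi))

def assign_nkd(samples, centroids, n_block, k_block, d_block):
    """Assign each sample to its nearest centroid using n/k/d blocking.

    The block sizes are hypothetical tuning parameters. In a parallel
    setting each block could be handled by a different worker; here the
    blocks are simply visited in order.
    """
    dim_ranges = block_ranges(len(centroids[0]), d_block)
    cen_ranges = block_ranges(len(centroids), k_block)
    labels = []
    for n_lo, n_hi in block_ranges(len(samples), n_block):   # level n
        for x in samples[n_lo:n_hi]:
            best = (math.inf, -1)
            for k_lo, k_hi in cen_ranges:                    # level k
                local_best = (math.inf, -1)
                for ci in range(k_lo, k_hi):
                    # level d: sum partial distances over dimension blocks
                    dist = sum(partial_sq_dist(x, centroids[ci], lo, hi)
                               for lo, hi in dim_ranges)
                    local_best = min(local_best, (dist, ci))
                # reduce the local minimum of this centroid block
                best = min(best, local_best)
            labels.append(best[1])
    return labels

samples = [[0, 0, 0, 0], [10, 10, 10, 10], [0, 1, 0, 1], [9, 9, 10, 10]]
centroids = [[0, 0, 0, 0], [10, 10, 10, 10], [5, 5, 5, 5]]
print(assign_nkd(samples, centroids, n_block=2, k_block=2, d_block=3))
```

Because the partial sums and local minima are combined exactly, the labels do not depend on the block sizes chosen; only the amount of work per block changes.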
Citation
Li, L, Yu, T, Zhao, W, Fu, H, Wang, C, Tan, L, Yang, G & Thomson, J 2018, Large-scale hierarchical k-means for heterogeneous many-core supercomputers. in Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC '18). IEEE Press, Piscataway, The International Conference for High Performance Computing, Networking, Storage, and Analysis, Dallas, Texas, United States, 11/11/18. https://doi.org/10.5555/3291656.3291674
Publication
Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC '18)
Type
Conference item
Description
Funding: J. Thomson and T. Yu are supported by the EPSRC grants "Discovery" EP/P020631/1, "ABC: Adaptive Brokerage for the Cloud" EP/R010528/1, and EU Horizon 2020 grant Team-Play: "Time, Energy and security Analysis for Multi/Many-core heterogenous PLAtforms" (ICT-779882, https://teamplay-h2020.eu).