St Andrews Research Repository

Large-scale automatic k-means clustering for heterogeneous many-core supercomputer

View/Open
kmeans_TPDS19_final_version.pdf (2.414Mb)
Date
05/2020
Author
Yu, Teng
Zhao, Wenlai
Liu, Pan
Janjic, Vladimir
Yan, Xiaohan
Wang, Shicai
Fu, Haohuan
Yang, Guangwen
Thomson, John Donald
Funder
EPSRC
EPSRC
Grant ID
EP/R010528/1
EP/P020631/1
Keywords
Supercomputer
Heterogeneous many-core processor
Data partitioning
Clustering
Scheduling
AutoML
QA75 Electronic computers. Computer science
QA76 Computer software
Abstract
This article presents an automatic k-means clustering solution targeting the Sunway TaihuLight supercomputer. We first introduce a multilevel parallel partition approach that partitions not only by dataflow and centroid but also by dimension, which unlocks the potential of the hierarchical parallelism in the heterogeneous many-core processor and the system architecture of the supercomputer. The parallel design is able to process large-scale clustering problems with up to 196,608 dimensions and over 160,000 target centroids, while maintaining high performance and high scalability. Furthermore, we propose an automatic hyper-parameter determination process for k-means clustering that automatically generates and executes clustering tasks for a set of candidate hyper-parameters and then determines the optimal hyper-parameter using a proposed evaluation method. The proposed auto-clustering solution not only achieves high performance and scalability for problems with massive high-dimensional data, but also supports clustering without sufficient prior knowledge of the number of target clusters, which can potentially extend the scope of the k-means algorithm to new application areas.
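The two ideas in the abstract can be illustrated with a minimal, single-process Python sketch. It is not the authors' implementation and does not model the Sunway TaihuLight hardware. It shows only that a squared Euclidean distance is a sum over dimensions, so the distance to each centroid can be assembled from partial sums over dimension blocks, and that a set of candidate values of k can be run and scored automatically. All function names are hypothetical, and the score used here (within-cluster sum of squares) is a stand-in assumption, since the abstract does not specify the proposed evaluation method.

```python
import random


def partial_sq_dists(point, centroids, lo, hi):
    """Squared-distance contribution of dimensions [lo, hi) to every centroid."""
    return [sum((point[d] - c[d]) ** 2 for d in range(lo, hi)) for c in centroids]


def nearest_centroid(point, centroids, n_blocks):
    """Assign a point by reducing per-dimension-block partial distances."""
    dim = len(point)
    bounds = [dim * b // n_blocks for b in range(n_blocks + 1)]
    totals = [0.0] * len(centroids)
    # Each (lo, hi) block is independent work; here the blocks run sequentially.
    for lo, hi in zip(bounds, bounds[1:]):
        for j, part in enumerate(partial_sq_dists(point, centroids, lo, hi)):
            totals[j] += part
    return min(range(len(centroids)), key=totals.__getitem__)


def kmeans(points, k, n_blocks=2, iters=20, seed=0):
    """Plain Lloyd iterations using the dimension-blocked assignment step."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = [nearest_centroid(p, centroids, n_blocks) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    labels = [nearest_centroid(p, centroids, n_blocks) for p in points]
    return centroids, labels


def inertia(points, centroids, labels):
    """Within-cluster sum of squares (stand-in score, not the paper's method)."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(p, centroids[lab]))
        for p, lab in zip(points, labels)
    )


def sweep_k(points, candidates):
    """Run k-means once per candidate k and return {k: score}."""
    return {k: inertia(points, *kmeans(points, k)) for k in candidates}
```

Calling `sweep_k(points, [1, 2, 3])` returns one score per candidate; how the best candidate is then chosen depends on the evaluation method, which this sketch leaves open.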
Citation
Yu, T, Zhao, W, Liu, P, Janjic, V, Yan, X, Wang, S, Fu, H, Yang, G & Thomson, J D 2020, 'Large-scale automatic k-means clustering for heterogeneous many-core supercomputer', IEEE Transactions on Parallel and Distributed Systems, vol. 31, no. 5, pp. 997-1008. https://doi.org/10.1109/TPDS.2019.2955467
Publication
IEEE Transactions on Parallel and Distributed Systems
Status
Peer reviewed
DOI
https://doi.org/10.1109/TPDS.2019.2955467
ISSN
1045-9219
Type
Journal article
Rights
Copyright © 2019 IEEE. This work has been made available online in accordance with publisher policies or with permission. Permission for further reuse of this content should be sought from the publisher or the rights holder. This is the author created accepted manuscript following peer review and may differ slightly from the final published version. The final published version of this work is available at https://doi.org/10.1109/TPDS.2019.2955467
Description
Funding: UK EPSRC grants "Discovery" EP/P020631/1 and "ABC: Adaptive Brokerage for the Cloud" EP/R010528/1.
Collections
  • University of St Andrews Research
URI
http://hdl.handle.net/10023/19096

Items in the St Andrews Research Repository are protected by copyright, with all rights reserved, unless otherwise indicated.
