Large-scale automatic k-means clustering for heterogeneous many-core supercomputer

Yu, Teng; Zhao, Wenlai; Liu, Pan; Janjic, Vladimir; Yan, Xiaohan; Wang, Shicai; Fu, Haohuan; Yang, Guangwen; Thomson, John Donald


Files in this item

Name: kmeans_TPDS19_final_version.pdf
Size: 2.414Mb
Format: PDF


Item metadata

dc.contributor.author	Yu, Teng
dc.contributor.author	Zhao, Wenlai
dc.contributor.author	Liu, Pan
dc.contributor.author	Janjic, Vladimir
dc.contributor.author	Yan, Xiaohan
dc.contributor.author	Wang, Shicai
dc.contributor.author	Fu, Haohuan
dc.contributor.author	Yang, Guangwen
dc.contributor.author	Thomson, John Donald
dc.date.accessioned	2019-12-09T09:30:07Z
dc.date.available	2019-12-09T09:30:07Z
dc.date.issued	2020-05
dc.identifier	264077454
dc.identifier	6d25ad7c-f40e-450a-8b1e-7d23e8089d03
dc.identifier	85078948317
dc.identifier	000616431600001
dc.identifier.citation	Yu, T, Zhao, W, Liu, P, Janjic, V, Yan, X, Wang, S, Fu, H, Yang, G & Thomson, J D 2020, 'Large-scale automatic k-means clustering for heterogeneous many-core supercomputer', IEEE Transactions on Parallel and Distributed Systems, vol. 31, no. 5, pp. 997-1008. https://doi.org/10.1109/TPDS.2019.2955467	en
dc.identifier.issn	1045-9219
dc.identifier.uri	https://hdl.handle.net/10023/19096
dc.description	Funding: UK EPSRC grants "Discovery" EP/P020631/1, "ABC: Adaptive Brokerage for the Cloud" EP/R010528/1.	en
dc.description.abstract	This article presents an automatic k-means clustering solution targeting the Sunway TaihuLight supercomputer. We first introduce a multilevel parallel partition approach that not only partitions by dataflow and centroid, but also by dimension, which unlocks the potential of the hierarchical parallelism in the heterogeneous many-core processor and the system architecture of the supercomputer. The parallel design is able to process large-scale clustering problems with up to 196,608 dimensions and over 160,000 targeting centroids, while maintaining high performance and high scalability. Furthermore, we propose an automatic hyper-parameter determination process for k-means clustering, by automatically generating and executing the clustering tasks with a set of candidate hyper-parameters, and then determining the optimal hyper-parameter using a proposed evaluation method. The proposed auto-clustering solution can not only achieve high performance and scalability for problems with massive high-dimensional data, but also support clustering without sufficient prior knowledge of the number of targeted clusters, which can potentially increase the scope of the k-means algorithm to new application areas.
dc.format.extent	12
dc.format.extent	2531673
dc.language.iso	eng
dc.relation.ispartof	IEEE Transactions on Parallel and Distributed Systems	en
dc.subject	Supercomputer	en
dc.subject	Heterogeneous many-core processor	en
dc.subject	Data partitioning	en
dc.subject	Clustering	en
dc.subject	Scheduling	en
dc.subject	AutoML	en
dc.subject	QA75 Electronic computers. Computer science	en
dc.subject	QA76 Computer software	en
dc.subject	T-NDAS	en
dc.subject	BDC	en
dc.subject	R2C	en
dc.subject	~DC~	en
dc.subject.lcc	QA75	en
dc.subject.lcc	QA76	en
dc.title	Large-scale automatic k-means clustering for heterogeneous many-core supercomputer	en
dc.type	Journal article	en
dc.contributor.sponsor	EPSRC	en
dc.contributor.sponsor	EPSRC	en
dc.contributor.institution	University of St Andrews. School of Computer Science	en
dc.identifier.doi	https://doi.org/10.1109/TPDS.2019.2955467
dc.description.status	Peer reviewed	en
dc.identifier.grantnumber	EP/R010528/1	en
dc.identifier.grantnumber	EP/P020631/1	en
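
The automatic hyper-parameter determination summarised in the abstract (run k-means for each candidate number of clusters, then pick the best candidate with an evaluation method) can be illustrated with a small serial sketch. This is not the article's implementation: the record does not describe the proposed evaluation method, so the mean silhouette score is used here as a stand-in, and the farthest-first seeding, the function names (`farthest_first`, `kmeans`, `silhouette`, `auto_kmeans`) and the toy data are all assumptions made for illustration. None of the multilevel partitioning for the Sunway TaihuLight is modelled.

```python
import math


def farthest_first(points, k):
    """Deterministic seeding (an assumption, not the article's method):
    start at points[0], then repeatedly add the point farthest from
    every centroid chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iters=100):
    """Plain Lloyd iterations; returns (centroids, labels)."""
    centroids = farthest_first(points, k)
    labels = [0] * len(points)
    for _ in range(max_iters):
        # Assignment step: nearest centroid for every point.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: mean of each cluster (keep old centroid if empty).
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


def silhouette(points, labels, k):
    """Mean silhouette score; stand-in for the unspecified evaluation method."""
    clusters = [[p for p, lab in zip(points, labels) if lab == c] for c in range(k)]
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        others = [
            sum(math.dist(p, q) for q in cl) / len(cl)
            for c, cl in enumerate(clusters)
            if c != lab and cl
        ]
        if len(own) < 2 or not others:
            continue  # singleton clusters contribute a score of 0
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(others)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def auto_kmeans(points, candidates):
    """Run k-means for every candidate k and return the best-scoring k."""
    scores = {}
    for k in candidates:
        _, labels = kmeans(points, k)
        scores[k] = silhouette(points, labels, k)
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Toy data: three small, well-separated groups of four points each.
offsets = [(0, 0), (1, 0), (0, 1), (1, 1)]
points = [(ox + dx, oy + dy) for ox, oy in [(0, 0), (20, 0), (0, 20)] for dx, dy in offsets]

best_k, scores = auto_kmeans(points, [2, 3, 4, 5])
print(best_k, scores)
```

In this sketch every candidate run is independent of the others, which is why such a search lends itself to being generated and executed as separate clustering tasks, as the abstract describes.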

This item appears in the following Collection(s)

University of St Andrews Research
