PFClust: an optimised implementation of a parameter-free clustering algorithm
Abstract
Background: A well-known problem in cluster analysis is finding an optimal number of clusters that reflects the inherent structure of the data. PFClust is a partitioning-based clustering algorithm that, unlike many widely used clustering algorithms, automatically proposes an optimal number of clusters for the data.
Results: Tests on various types of data showed that PFClust can discover clusters of arbitrary shapes, sizes and densities. The previous implementation of the algorithm had already been used successfully to cluster large macromolecular structures and small druglike compounds. We have substantially improved the algorithm through a more efficient implementation, which enables PFClust to process large data sets in acceptable time.
Conclusions: In this paper we present a new optimised implementation of the PFClust algorithm that runs considerably faster than the original.
Citation
Musayeva, K, Henderson, T, Mitchell, J B O & Mavridis, L 2014, 'PFClust: an optimised implementation of a parameter-free clustering algorithm', Source Code for Biology and Medicine, vol. 9, no. 5. https://doi.org/10.1186/1751-0473-9-5
Publication
Source Code for Biology and Medicine
Status
Peer reviewed
ISSN
1751-0473
Type
Journal article
Rights
© 2014 Musayeva et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Description
This work was supported by the World Anti-Doping Agency and the Scottish Universities Life Sciences Alliance.