Some problems in the theory and application of the methods of numerical taxonomy
Several of the methods of numerical taxonomy are compared and shown to be variants of a tripartite grouping procedure associated with a generalised intercluster similarity function involving ten computational parameters. Clustering by the techniques of hierarchic fusion, monothetic division and iterative relocation is obtained by using different arithmetic combinations of the function parameters both to compute similarities and to effect changes in cluster membership. The combinatorial solution for Ward's method is found, and the centroid sorting combinatorial solution is extended for size difference, shape difference, dispersion and dot product coefficients. It is suggested that clusters are characterised more by the choice of similarity criterion than by the choice of method, and it is demonstrated that some common criteria, such as distance and the error sum of squares, are inclined to force spherical 'minimum-variance' classes. These are contrasted with 'natural' classes, which correspond to closed density surfaces defined for a multivariate sample space by the underlying probability density function. A method for mode-seeking is developed from this probabilistic model through various theoretical and experimental phases, and it is shown to perform slightly better than iterative relocation with the minimum-variance criteria on several Gaussian test populations. A fast algorithm is proposed for the solution of the Jardine-Sibson method for generating overlapping classes, and it is observed that this technique finds natural classes and is closely related to the probabilistic model. Some aspects of computational procedures are discussed, and in particular, it is proposed that a generalised system involving a statistical language, conversational mode package and program suite could be developed from a basic subroutine system.
Paging and simulation techniques for the organisation of direct-access data files are suggested, and a comprehensive package of computer programs for cluster analysis is described.
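As an illustration of what a combinatorial solution for Ward's method means, the sketch below uses the recurrence in the form usually given in the clustering literature: the dissimilarity between a cluster k and a newly fused cluster is computed from the three dissimilarities held before the fusion and the cluster sizes alone, without returning to the raw data. This is an assumption about the general form, not a reproduction of the thesis's ten-parameter function, and the function names (`ward_update`, `ward_dist`, and so on) are hypothetical. Dissimilarities are taken to be squared Euclidean distances between single points at the start.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def ward_dist(a, b):
    """Ward dissimilarity computed directly from the data.

    Scaled so that two single points have dissimilarity equal to
    their squared Euclidean distance (an assumed convention).
    """
    n_a, n_b = len(a), len(b)
    return 2.0 * n_a * n_b / (n_a + n_b) * sq_dist(centroid(a), centroid(b))


def ward_update(d_ki, d_kj, d_ij, n_i, n_j, n_k):
    """Combinatorial update: dissimilarity from k to the fusion of i and j.

    Uses only the pre-fusion dissimilarities and the cluster sizes.
    """
    total = n_i + n_j + n_k
    return ((n_i + n_k) * d_ki + (n_j + n_k) * d_kj - n_k * d_ij) / total


# Three small clusters (illustrative data only).
i = [(0.0, 0.0)]
j = [(2.0, 0.0)]
k = [(0.0, 3.0), (4.0, 3.0)]

updated = ward_update(ward_dist(k, i), ward_dist(k, j), ward_dist(i, j),
                      len(i), len(j), len(k))
direct = ward_dist(k, i + j)
```

Here `updated` and `direct` agree, which is the point of the combinatorial form: a hierarchic fusion program can keep only a dissimilarity matrix and a vector of cluster sizes in store.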
Thesis, PhD Doctor of Philosophy
Items in the St Andrews Research Repository are protected by copyright, with all rights reserved, unless otherwise indicated.