Some problems in the theory and application of the methods of numerical taxonomy
Abstract
Several of the methods of numerical taxonomy are compared
and shown to be variants of a tripartite grouping procedure
associated with a generalised intercluster similarity function
involving ten computational parameters. Clustering by the techniques of hierarchic fusion, monothetic division and iterative
relocation is obtained using different arithmetic combinations
of the function parameters to both compute similarities and effect
changes in cluster membership. The combinatorial solution for
Ward's method is found, and the centroid sorting combinatorial
solution is extended for size difference, shape difference, dispersion and dot product coefficients.
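The abstract does not state the combinatorial solution itself. As a minimal sketch, the update below is the form usually quoted in the literature for Ward's method, with the dissimilarity between two clusters taken to be the increase in the error sum of squares caused by fusing them. The function names (`ess`, `ward_distance`, `ward_update`) and the random test data are assumptions made for illustration and are not the thesis's own notation. The sketch checks numerically that the update reproduces the directly computed value.

```python
import numpy as np

def ess(points):
    """Error sum of squares: squared distances of the points to their centroid."""
    points = np.asarray(points, dtype=float)
    return float(((points - points.mean(axis=0)) ** 2).sum())

def ward_distance(a, b):
    """Increase in the error sum of squares when clusters a and b are fused."""
    return ess(np.vstack([a, b])) - ess(a) - ess(b)

def ward_update(d_ki, d_kj, d_ij, n_i, n_j, n_k):
    """Combinatorial update: dissimilarity between cluster k and the fusion
    of clusters i and j, using only prior dissimilarities and cluster sizes."""
    n = n_i + n_j + n_k
    return ((n_i + n_k) * d_ki + (n_j + n_k) * d_kj - n_k * d_ij) / n

# Three hypothetical clusters in three dimensions (illustrative data only).
rng = np.random.default_rng(0)
ci = rng.normal(size=(4, 3))
cj = rng.normal(size=(6, 3)) + 2.0
ck = rng.normal(size=(5, 3)) - 1.0

# Value computed directly from the raw data after fusing i and j.
direct = ward_distance(ck, np.vstack([ci, cj]))

# Value obtained from the combinatorial update, without revisiting the data.
updated = ward_update(
    ward_distance(ck, ci), ward_distance(ck, cj), ward_distance(ci, cj),
    len(ci), len(cj), len(ck),
)

print(abs(direct - updated) < 1e-9)
```

The practical point of such an update is that hierarchic fusion can proceed from the dissimilarity matrix and cluster sizes alone, so the original data need not be re-read after each fusion.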
It is suggested that clusters are characterised more by the
choice of similarity criterion than by the choice of method, and
it is demonstrated that some common criteria such as distance and
the error sum of squares are inclined to force spherical 'minimum-variance' classes. These are contrasted with 'natural' classes,
which correspond to closed density surfaces defined for a multivariate sample space by the underlying probability density function.
A method for mode-seeking is developed from this probabilistic
model through various theoretical and experimental phases, and it
is shown to perform slightly better than iterative relocation with
the minimum-variance criteria using several Gaussian test populations.
A fast algorithm is proposed for the solution of the
Jardine-Sibson method for generating overlapping classes, and it
is observed that this technique finds natural classes and is
closely related to the probabilistic model.
Some aspects of computational procedures are discussed, and
in particular, it is proposed that a generalised system involving
a statistical language, conversational mode package and program suite could be developed from a basic subroutine system. Paging and simulation techniques for the organisation of direct-access
data files are suggested, and a comprehensive package of computer
programs for cluster analysis is described.
Type
Thesis, PhD Doctor of Philosophy