Bayesian hierarchical mixture models for detecting non-normal clusters applied to noisy genomic and environmental datasets
Item metadata
dc.contributor.author | Zhang, Huizi | |
dc.contributor.author | Swallow, Ben | |
dc.contributor.author | Gupta, Mayetri | |
dc.date.accessioned | 2023-01-04T12:30:05Z | |
dc.date.available | 2023-01-04T12:30:05Z | |
dc.date.issued | 2022-08-01 | |
dc.identifier | 281140223 | |
dc.identifier | 58f54f28-dd70-4de4-8d66-af0dafd9fd36 | |
dc.identifier | 000834508100012 | |
dc.identifier.citation | Zhang, H, Swallow, B & Gupta, M 2022, 'Bayesian hierarchical mixture models for detecting non-normal clusters applied to noisy genomic and environmental datasets', Australian and New Zealand Journal of Statistics, vol. 64, no. 2, pp. 313-337. https://doi.org/10.1111/anzs.12370 | en |
dc.identifier.issn | 1369-1473 | |
dc.identifier.other | ORCID: /0000-0002-0227-2160/work/118411961 | |
dc.identifier.uri | https://hdl.handle.net/10023/26663 | |
dc.description.abstract | Clustering to find subgroups with common features is often a necessary first step in the statistical modelling and analysis of large and complex datasets. Although follow-up analyses often make use of complex statistical models that are appropriate for the specific application, most popular clustering approaches are either nonparametric, or based on Gaussian mixture models and their variants, often for reasons of computational efficiency. Certain characteristics in the data that are common in modern scientific datasets, such as the presence of outliers or non-ellipsoidal cluster shapes, often lead these methods to fail to detect the cluster components accurately. In this article, we present two efficient and robust Bayesian clustering approaches that seek to overcome these limitations: a model-based 'tight' clustering approach to cluster points in the presence of outliers, and a hierarchical Laplace mixture-based approach to cluster heavy-tailed and otherwise non-normal cluster components. We illustrate their power and accuracy in detecting meaningful clusters in datasets from genomics, imaging and the environmental sciences. | |
dc.format.extent | 25 | |
dc.format.extent | 2555234 | |
dc.language.iso | eng | |
dc.relation.ispartof | Australian and New Zealand Journal of Statistics | en |
dc.subject | Data augmentation | en |
dc.subject | Gibbs sampling | en |
dc.subject | Latent variable models | en |
dc.subject | Markov Chain Monte Carlo | en |
dc.subject | Non-Gaussian clusters | en |
dc.subject | SNP genotyping | en |
dc.subject | QA Mathematics | en |
dc.subject | QH301 Biology | en |
dc.subject | QH426 Genetics | en |
dc.subject | 3rd-DAS | en |
dc.subject | AC | en |
dc.subject | MCC | en |
dc.subject.lcc | QA | en |
dc.subject.lcc | QH301 | en |
dc.subject.lcc | QH426 | en |
dc.title | Bayesian hierarchical mixture models for detecting non-normal clusters applied to noisy genomic and environmental datasets | en |
dc.type | Journal article | en |
dc.contributor.institution | University of St Andrews. School of Mathematics and Statistics | en |
dc.contributor.institution | University of St Andrews. Centre for Research into Ecological & Environmental Modelling | en |
dc.identifier.doi | 10.1111/anzs.12370 | |
dc.description.status | Peer reviewed | en |