Statistics, machine learning and deep learning for population genetic inference

Qin, Xinghu

Files in this item

Name: XinghuQinPhDThesis_MinusEmbargoedSections.pdf
Size: 6.264Mb
Format: PDF
Description: Thesis text (Excluding chapters 1-5 and appendices)

Item metadata

dc.contributor.advisor	Gaggiotti, Oscar E.
dc.contributor.author	Qin, Xinghu
dc.coverage.spatial	viii, 262 p.	en_US
dc.date.accessioned	2021-07-28T15:48:22Z
dc.date.available	2021-07-28T15:48:22Z
dc.date.issued	2021-06-30
dc.identifier.uri	https://hdl.handle.net/10023/23665
dc.description.abstract	Deciphering the evolutionary changes from raw DNA data effectively without the loss of intrinsic information has been the fundamental and core work in population genetics. However, some statistical challenges still restrict the inferential performance in population genetics, for example, the undue emphasis on rare or common alleles measured by different statistics, the ubiquitous multimodal genetic structure within populations, and complex genotype-by-environment associations. In this thesis, I propose to integrate the information-based statistics with machine learning approaches to address these problems and challenges for population genetic inference. First, I evaluated the performance of the information-based summary statistics for spatial demography inference. I showed that the summary statistics based on Shannon differentiation and the transformed diversity of order q=1 had higher power to discriminate spatially-structured scenarios than the traditional allelic richness and heterozygosity-based summary statistics. This provides guidelines for using summary statistics to make inference of spatial demography and for developing new statistical methods to detect signatures of evolutionary changes. Second, I proposed to use Kernel Local Fisher Discriminant Analysis of Principal Components (KLFDAPC) for population genetic structure inference considering the nonlinear and multimodal genetic information between individuals. KLFDAPC outperformed both PCA and DAPC in discriminatory power and in predicting individual geographic origin. KLFDAPC is useful for geographic ancestry inference and correction of population stratification in GWAS. Finally, I proposed a deep learning-based approach (DeepGenomeScan) to detect signals of selection. DeepGenomeScan had higher power than the commonly used machine learning approaches such as pcadapt and RDA in identifying signatures of selection. 
Furthermore, DeepGenomeScan can be extended to implement various genome-wide association studies (GWAS, TWAS, PWAS, and MWAS) by systematically scanning genome-wide variants to detect the genetic variations responsible for complex traits or involved in adaptation. In summary, this dissertation addresses several foundational questions in statistics-based and machine learning-based inference and contributes several state-of-the-art statistical tools for population genetic inference.	en_US
dc.language.iso	en	en_US
dc.publisher	University of St Andrews
dc.rights	Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/	*
dc.subject	Machine learning	en_US
dc.subject	Deep learning	en_US
dc.subject	Population genetics	en_US
dc.subject	Population structure	en_US
dc.subject	Detection of natural selection	en_US
dc.subject	Information-based summary statistics	en_US
dc.subject	Genome scan	en_US
dc.subject	Genome wide association study	en_US
dc.subject.lcc	QH455.Q5
dc.subject.lcsh	Population genetics	en
dc.subject.lcsh	Machine learning	en
dc.title	Statistics, machine learning and deep learning for population genetic inference	en_US
dc.type	Thesis	en_US
dc.contributor.sponsor	China Scholarship Council (CSC)	en_US
dc.type.qualificationlevel	Doctoral	en_US
dc.type.qualificationname	PhD Doctor of Philosophy	en_US
dc.publisher.institution	The University of St Andrews	en_US
dc.publisher.department	Centre for Biological Diversity, School of Biology	en_US
dc.rights.embargodate	2022-06-02
dc.rights.embargoreason	Thesis restricted in accordance with University regulations. Print and electronic copy of all chapters and appendices restricted until 2nd June 2022	en
dc.identifier.doi	https://doi.org/10.17630/sta/118

This item appears in the following Collection(s)

Biology Theses

Except where otherwise noted within the work, this item's licence for re-use is described as Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International