Item metadata

dc.contributor.advisor: Gaggiotti, Oscar E.
dc.contributor.author: Qin, Xinghu
dc.coverage.spatial: viii, 262 p. [en_US]
dc.description.abstract: Deciphering evolutionary changes from raw DNA data without losing intrinsic information is a core task in population genetics. However, several statistical challenges still limit inferential performance: the undue emphasis that different statistics place on rare or common alleles, the ubiquitous multimodal genetic structure within populations, and complex genotype-by-environment associations. In this thesis, I propose integrating information-based statistics with machine learning approaches to address these challenges in population genetic inference. First, I evaluated the performance of information-based summary statistics for inferring spatial demography. I showed that summary statistics based on Shannon differentiation and the transformed diversity of order q=1 had higher power to discriminate spatially structured scenarios than traditional summary statistics based on allelic richness and heterozygosity. This provides guidelines for using summary statistics to infer spatial demography and for developing new statistical methods to detect signatures of evolutionary change. Second, I proposed Kernel Local Fisher Discriminant Analysis of Principal Components (KLFDAPC) for inferring population genetic structure, which accounts for nonlinear and multimodal genetic information between individuals. KLFDAPC outperformed both PCA and DAPC in discriminatory power and in predicting individual geographic origin. KLFDAPC is useful for geographic ancestry inference and for correcting population stratification in GWAS. Finally, I proposed a deep learning-based approach (DeepGenomeScan) to detect signals of selection. DeepGenomeScan had higher power than commonly used machine learning approaches such as pcadapt and RDA in identifying signatures of selection. Furthermore, DeepGenomeScan can be extended to various genome-wide association studies (GWAS, TWAS, PWAS, and MWAS) by systematically scanning genome-wide variants to detect the genetic variation responsible for complex traits or involved in adaptation. In summary, this dissertation addresses several foundational questions in statistics-based and machine learning-based inference and contributes several state-of-the-art statistical tools for population genetic inference. [en_US]
dc.publisher: University of St Andrews
dc.rights: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International [*]
dc.subject: Machine learning [en_US]
dc.subject: Deep learning [en_US]
dc.subject: Population genetics [en_US]
dc.subject: Population structure [en_US]
dc.subject: Detection of natural selection [en_US]
dc.subject: Information-based summary statistics [en_US]
dc.subject: Genome scan [en_US]
dc.subject: Genome wide association study [en_US]
dc.subject.lcsh: Population genetics [en]
dc.subject.lcsh: Machine learning [en]
dc.title: Statistics, machine learning and deep learning for population genetic inference [en_US]
dc.contributor.sponsor: China Scholarship Council (CSC) [en_US]
dc.type.qualificationname: PhD Doctor of Philosophy [en_US]
dc.publisher.institution: The University of St Andrews [en_US]
dc.publisher.department: Centre for Biological Diversity, School of Biology [en_US]
dc.rights.embargoreason: Thesis restricted in accordance with University regulations. Print and electronic copy of all chapters and appendices restricted until 2nd June 2022 [en]

Except where otherwise noted within the work, this item's licence for re-use is described as Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International