Item metadata

dc.contributor.author: Qin, Xinghu
dc.contributor.author: Chiang, Charleston W K
dc.contributor.author: Gaggiotti, Oscar E
dc.date.accessioned: 2022-06-17T07:30:09Z
dc.date.available: 2022-06-17T07:30:09Z
dc.date.issued: 2022-07-01
dc.identifier: 280015769
dc.identifier: 37f47524-f34c-48d1-8d65-b6931aeced17
dc.identifier: 35649387
dc.identifier: 85134632664
dc.identifier: 000804257200001
dc.identifier.citation: Qin, X, Chiang, C W K & Gaggiotti, O E 2022, 'KLFDAPC: a supervised machine learning approach for spatial genetic structure analysis', Briefings in Bioinformatics, vol. 23, no. 4, bbac202. https://doi.org/10.1093/bib/bbac202 [en]
dc.identifier.issn: 1467-5463
dc.identifier.other: ORCID: /0000-0003-1827-1493/work/114336023
dc.identifier.uri: https://hdl.handle.net/10023/25543
dc.description: Funding: CSC-University of St Andrews Joint Scholarship (to X.Q.); International Postdoctoral Exchange Fellowship Program (Talent-Introduction Program) from China Postdoc Council (to X.Q.); National Institute of General Medical Sciences (NIGMS) of the National Institutes of Health (grant R35GM142783 to C.W.K.C.). Part of the computation for this work is supported by USC’s Center for Advanced Research Computing (https://carc.usc.edu). [en]
dc.description.abstract: Geographic patterns of human genetic variation provide important insights into human evolution and disease. A commonly used tool to detect and describe them is principal component analysis (PCA) or the supervised linear discriminant analysis of principal components (DAPC). However, genetic features produced from both approaches could fail to correctly characterize population structure for complex scenarios involving admixture. In this study, we introduce Kernel Local Fisher Discriminant Analysis of Principal Components (KLFDAPC), a supervised non-linear approach for inferring individual geographic genetic structure that could rectify the limitations of these approaches by preserving the multimodal space of samples. We tested the power of KLFDAPC to infer population structure and to predict individual geographic origin using neural networks. Simulation results showed that KLFDAPC has higher discriminatory power than PCA and DAPC. The application of our method to empirical European and East Asian genome-wide genetic datasets indicated that the first two reduced features of KLFDAPC correctly recapitulated the geography of individuals and significantly improved the accuracy of predicting individual geographic origin when compared to PCA and DAPC. Therefore, KLFDAPC can be useful for geographic ancestry inference, design of genome scans and correction for spatial stratification in GWAS that link genes to adaptation or disease susceptibility.
dc.format.extent: 16
dc.format.extent: 2058603
dc.language.iso: eng
dc.relation.ispartof: Briefings in Bioinformatics [en]
dc.subject: Machine learning [en]
dc.subject: Population structure [en]
dc.subject: Individual geographic origin [en]
dc.subject: QA75 Electronic computers. Computer science [en]
dc.subject: DAS [en]
dc.subject: SDG 3 - Good Health and Well-being [en]
dc.subject: MCC [en]
dc.subject.lcc: QA75 [en]
dc.title: KLFDAPC: a supervised machine learning approach for spatial genetic structure analysis [en]
dc.type: Journal article [en]
dc.contributor.institution: University of St Andrews. School of Biology [en]
dc.contributor.institution: University of St Andrews. Centre for Biological Diversity [en]
dc.contributor.institution: University of St Andrews. Scottish Oceans Institute [en]
dc.contributor.institution: University of St Andrews. St Andrews Bioinformatics Unit [en]
dc.contributor.institution: University of St Andrews. Marine Alliance for Science & Technology Scotland [en]
dc.identifier.doi: 10.1093/bib/bbac202
dc.description.status: Peer reviewed [en]