Item metadata

dc.contributor.advisor: Ye, Juan
dc.contributor.advisor: Paracchini, Silvia
dc.contributor.author: Hequet, Chloe
dc.coverage.spatial: 213 [en_US]
dc.date.accessioned: 2024-05-10T11:30:29Z
dc.date.available: 2024-05-10T11:30:29Z
dc.date.issued: 2024-06-12
dc.identifier.uri: https://hdl.handle.net/10023/29854
dc.description.abstract: The analysis of genetic point mutations at the population level can offer insights into the genetic basis of human traits, which in turn could potentially lead to new diagnostic and treatment options for heritable diseases. However, existing genetic data analysis methods tend to rely on simplifying assumptions that ignore nonlinear interactions between variants. The ability to model and describe nonlinear genetic interactions could lead to both improved trait prediction and enhanced understanding of the underlying biology. Deep Learning models offer the possibility of automatically learning complex nonlinear genetic architectures, but it is currently unclear how best to optimise them for genetic data. It is also essential that any models be able to “explain” what they have learned in order for them to be used for genetic discovery or clinical applications, which can be difficult due to the black-box nature of DL predictors. This thesis addresses a number of methodological gaps in applying explainable DL models end-to-end on variant-level genetic data. We propose novel methods for encoding genetic data for deep learning applications and show that feature encodings designed specifically for genetic variants offer the possibility of improved model efficiency and performance. We then benchmark a variety of models for the prediction of Body Mass Index using data from the UK Biobank, yielding insights into DL performance in this domain. We then propose a series of novel DL model interpretation methods with features optimised for biological insights. We first show how these can be used to validate that the network has automatically replicated existing knowledge, and then illustrate their ability to detect complex nonlinear genetic interactions that influence BMI in our cohort. Overall, we show that DL model training and interpretation procedures that have been optimised for genetic data can be used to yield new insights into disease aetiology. [en_US]
dc.language.iso: en [en_US]
dc.rights: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International [*]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/ [*]
dc.subject: Deep learning [en_US]
dc.subject: Genetics [en_US]
dc.subject: Artificial intelligence [en_US]
dc.subject: Neural networks [en_US]
dc.subject: Genotype [en_US]
dc.subject: Phenotype [en_US]
dc.subject: Interpretable AI [en_US]
dc.subject: Prediction [en_US]
dc.subject: Body mass index [en_US]
dc.subject: Gene gene interaction [en_US]
dc.subject: FTO gene [en_US]
dc.title: Biologically-informed interpretable deep learning techniques for BMI prediction and gene interaction detection [en_US]
dc.type: Thesis [en_US]
dc.contributor.sponsor: University of St Andrews. School of Computer Science [en_US]
dc.type.qualificationlevel: Doctoral [en_US]
dc.type.qualificationname: PhD Doctor of Philosophy [en_US]
dc.publisher.institution: The University of St Andrews [en_US]
dc.rights.embargodate: 2027-04-08
dc.rights.embargoreason: Thesis restricted in accordance with University regulations. Restricted until 8 April 2027 [en]
dc.identifier.doi: https://doi.org/10.17630/sta/901


    Except where otherwise noted within the work, this item's licence for re-use is described as Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International