Biologically-informed interpretable deep learning techniques for BMI prediction and gene interaction detection
Item metadata
dc.contributor.advisor | Ye, Juan | |
dc.contributor.advisor | Paracchini, Silvia | |
dc.contributor.author | Hequet, Chloe | |
dc.coverage.spatial | 213 | en_US |
dc.date.accessioned | 2024-05-10T11:30:29Z | |
dc.date.available | 2024-05-10T11:30:29Z | |
dc.date.issued | 2024-06-12 | |
dc.identifier.uri | https://hdl.handle.net/10023/29854 | |
dc.description.abstract | The analysis of genetic point mutations at the population level can offer insights into the genetic basis of human traits, which in turn could potentially lead to new diagnostic and treatment options for heritable diseases. However, existing genetic data analysis methods tend to rely on simplifying assumptions that ignore nonlinear interactions between variants. The ability to model and describe nonlinear genetic interactions could lead to both improved trait prediction and enhanced understanding of the underlying biology. Deep learning (DL) models offer the possibility of automatically learning complex nonlinear genetic architectures, but it is currently unclear how best to optimise them for genetic data. It is also essential that any models be able to “explain” what they have learned in order for them to be used for genetic discovery or clinical applications, which can be difficult due to the black-box nature of DL predictors. This thesis addresses a number of methodological gaps in applying explainable DL models end-to-end on variant-level genetic data. We propose novel methods for encoding genetic data for deep learning applications and show that feature encodings designed specifically for genetic variants offer the possibility of improved model efficiency and performance. We then benchmark a variety of models for the prediction of body mass index (BMI) using data from the UK Biobank, yielding insights into DL performance in this domain. Next, we propose a series of novel DL model interpretation methods with features optimised for biological insights. We first show how these can be used to validate that the network has automatically replicated existing knowledge, and then illustrate their ability to detect complex nonlinear genetic interactions that influence BMI in our cohort. Overall, we show that DL model training and interpretation procedures that have been optimised for genetic data can be used to yield new insights into disease aetiology. | en_US |
dc.language.iso | en | en_US |
dc.rights | Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | * |
dc.subject | Deep learning | en_US |
dc.subject | Genetics | en_US |
dc.subject | Artificial intelligence | en_US |
dc.subject | Neural networks | en_US |
dc.subject | Genotype | en_US |
dc.subject | Phenotype | en_US |
dc.subject | Interpretable AI | en_US |
dc.subject | Prediction | en_US |
dc.subject | Body mass index | en_US |
dc.subject | Gene gene interaction | en_US |
dc.subject | FTO gene | en_US |
dc.title | Biologically-informed interpretable deep learning techniques for BMI prediction and gene interaction detection | en_US |
dc.type | Thesis | en_US |
dc.contributor.sponsor | University of St Andrews. School of Computer Science | en_US |
dc.type.qualificationlevel | Doctoral | en_US |
dc.type.qualificationname | PhD Doctor of Philosophy | en_US |
dc.publisher.institution | The University of St Andrews | en_US |
dc.rights.embargodate | 2027-04-08 | |
dc.rights.embargoreason | Thesis restricted in accordance with University regulations. Restricted until 8 April 2027 | en |
dc.identifier.doi | https://doi.org/10.17630/sta/901 | |
Except where otherwise noted within the work, this item's licence for re-use is described as Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International
Items in the St Andrews Research Repository are protected by copyright, with all rights reserved, unless otherwise indicated.