Biologically-informed interpretable deep learning techniques for BMI prediction and gene interaction detection

Hequet, Chloe

Files in this item

Name: Thesis-Chloe-Hequet-complete-version.pdf
Size: 6.750Mb
Format: PDF
Description: Complete version

Name: Thesis-Chloe-Hequet-complete-version.zip
Size: 18.64Mb
Format: application/zip
Description: Complete version (Preservation copy)

Item metadata

dc.contributor.advisor	Ye, Juan
dc.contributor.advisor	Paracchini, Silvia
dc.contributor.author	Hequet, Chloe
dc.coverage.spatial	213	en_US
dc.date.accessioned	2024-05-10T11:30:29Z
dc.date.available	2024-05-10T11:30:29Z
dc.date.issued	2024-06-12
dc.identifier.uri	https://hdl.handle.net/10023/29854
dc.description.abstract	The analysis of genetic point mutations at the population level can offer insights into the genetic basis of human traits, which in turn could potentially lead to new diagnostic and treatment options for heritable diseases. However, existing genetic data analysis methods tend to rely on simplifying assumptions that ignore nonlinear interactions between variants. The ability to model and describe nonlinear genetic interactions could lead to both improved trait prediction and enhanced understanding of the underlying biology. Deep learning (DL) models offer the possibility of automatically learning complex nonlinear genetic architectures, but it is currently unclear how best to optimise them for genetic data. It is also essential that any models be able to “explain” what they have learned in order for them to be used for genetic discovery or clinical applications, which can be difficult due to the black-box nature of DL predictors. This thesis addresses a number of methodological gaps in applying explainable DL models end-to-end on variant-level genetic data. We propose novel methods for encoding genetic data for deep learning applications and show that feature encodings designed specifically for genetic variants offer the possibility of improved model efficiency and performance. We then benchmark a variety of models for the prediction of Body Mass Index (BMI) using data from the UK Biobank, yielding insights into DL performance in this domain. Next, we propose a series of novel DL model interpretation methods with features optimised for biological insights. We first show how these can be used to validate that the network has automatically replicated existing knowledge, and then illustrate their ability to detect complex nonlinear genetic interactions that influence BMI in our cohort. Overall, we show that DL model training and interpretation procedures that have been optimised for genetic data can be used to yield new insights into disease aetiology.	en_US
dc.language.iso	en	en_US
dc.rights	Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/	*
dc.subject	Deep learning	en_US
dc.subject	Genetics	en_US
dc.subject	Artificial intelligence	en_US
dc.subject	Neural networks	en_US
dc.subject	Genotype	en_US
dc.subject	Phenotype	en_US
dc.subject	Interpretable AI	en_US
dc.subject	Prediction	en_US
dc.subject	Body mass index	en_US
dc.subject	Gene gene interaction	en_US
dc.subject	FTO gene	en_US
dc.title	Biologically-informed interpretable deep learning techniques for BMI prediction and gene interaction detection	en_US
dc.type	Thesis	en_US
dc.contributor.sponsor	University of St Andrews. School of Computer Science	en_US
dc.type.qualificationlevel	Doctoral	en_US
dc.type.qualificationname	PhD Doctor of Philosophy	en_US
dc.publisher.institution	The University of St Andrews	en_US
dc.rights.embargodate	2027-04-08
dc.rights.embargoreason	Thesis restricted in accordance with University regulations. Restricted until 8 April 2027	en
dc.identifier.doi	https://doi.org/10.17630/sta/901

This item appears in the following Collection(s)

Computer Science Theses

Except where otherwise noted within the work, this item's licence for re-use is described as Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International