Machine learning in systems biology at different scales : from molecular biology to ecology
Abstract
Machine learning has been a source of continuous methodological advances in the field of computational learning from data. Systems biology has profited in various ways
from machine learning techniques but in particular from network inference, i.e. the
learning of interactions given observed quantities of the involved components or data
that stem from interventional experiments. Originally, this domain of systems biology
was confined to the inference of gene regulation networks, but it has recently expanded to other
levels of organization of biological and ecological systems. In particular, the application to
species interaction networks in a varying environment is of mounting importance for
improving our understanding of the dynamics of species extinctions, invasions,
and population behaviour in general.
The aim of this thesis is to present an extensive study of various state-of-the-art
machine learning techniques applied to a genetic regulation system in plants and to
expand and modify some of these methods to infer species interaction networks in an
ecological setting. The first study attempts to improve the knowledge about circadian
regulation in the plant Arabidopsis thaliana from the viewpoint of machine learning and
gives suggestions on which methods are best suited for inference, how the data should
be processed and modelled mathematically, and what quality of network learning can
be expected by doing so. To achieve this, I generate a rich and realistic synthetic data
set that is used for various studies under consideration of different effects and method
setups. The best method and setup are applied to real transcriptional data, which leads
to a new hypothesis about the circadian clock network structure.
The ecological study is focused on the development of two novel inference methods
that exploit a common principle from transcriptional time-series, which states that expression
profiles over time can be temporally heterogeneous. A corresponding concept
in a two-dimensional spatial domain is that species interaction dynamics can be spatially
heterogeneous, i.e. can change in space depending on the environment and other
factors. I will demonstrate the expansion from the 1-dimensional time domain to the
2-dimensional spatial domain, introduce two distinct space segmentation schemes, and
consider species dispersion effects with spatial autocorrelation. The two novel methods
show a significant improvement in species interaction inference compared to competing
methods and learn the spatial structure of different species neighbourhoods or
environments with high confidence.
Type
Thesis, PhD Doctor of Philosophy
Items in the St Andrews Research Repository are protected by copyright, with all rights reserved, unless otherwise indicated.