Computational analysis of tissue images in cancer diagnosis and prognosis: machine learning-based methods for the next generation of computational pathology
The focus of this work is to develop machine learning systems capable of tissue image analysis in the context of cancer diagnosis and prognosis. Such a system can not only identify new prognostic markers but can also serve as a standalone clinical prediction rule, the premise being that its non-linear, multivariate nature may identify and exploit complex patterns that collectively provide more accurate cancer diagnosis and prognosis than the clinical gold standard. The task, however, is very challenging because of the extremely high resolution of the images, the highly heterogeneous microenvironment, multiple sources of noise and artifacts, and the low granularity of the ground truth.

A common starting point for related work on this task is the extraction of handcrafted features. I investigate the application of machine learning for prognosis using handcrafted features, and develop prognostic machine learning models that outperform baselines based on clinically employed prognostic systems in two separate cohorts of colorectal and muscle-invasive bladder cancer patients. Moreover, analysis of the proposed methods provides insight into the prognostic value of characteristics of the microenvironment that are not yet included in the clinical systems.

The emergence of deep learning has enabled analysis of images directly. Given that designing and building pipelines for handcrafted feature extraction is laborious, expensive, and prone to inducing human bias, I investigate the application of deep learning to tissue images directly. In particular, I propose a framework that allows models to be trained directly from exhaustively tiled whole slide images with only patient-level ground truth, and demonstrate its effectiveness on colorectal cancer prognosis.
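The framework itself is not reproduced here; as a rough illustration of the weakly supervised setup it describes (exhaustive tiling of a whole slide image plus a single patient-level label), the following NumPy sketch pools per-patch scores into one slide-level prediction. All function names, the toy slide, and the mean-pooling choice are illustrative assumptions, not the thesis's implementation:

```python
import numpy as np

def tile_slide(slide, tile=4):
    """Exhaustively tile a (H, W) slide into non-overlapping tile x tile patches."""
    h, w = slide.shape
    patches = [
        slide[r:r + tile, c:c + tile]
        for r in range(0, h - tile + 1, tile)
        for c in range(0, w - tile + 1, tile)
    ]
    return np.stack(patches)

def slide_score(patches, w_vec):
    """Score every patch with a linear model, then pool to a single slide-level
    probability -- the only supervision needed is the patient-level label."""
    logits = patches.reshape(len(patches), -1) @ w_vec
    probs = 1.0 / (1.0 + np.exp(-logits))
    return probs.mean()  # mean pooling over all tiles

slide = np.random.default_rng(0).random((16, 16))  # toy stand-in for a gigapixel WSI
patches = tile_slide(slide)                        # 16 patches of 4x4
w = np.zeros(16)                                   # untrained weights
print(patches.shape, round(float(slide_score(patches, w)), 2))  # → (16, 4, 4) 0.5
```

Because the loss is computed only on the pooled slide-level output, no patch-level annotations are required; gradients flow back to every tile through the pooling step.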
In my final work, I introduce a new type of CNN-based method, called Magnifying Networks (MagNets), for gigapixel image analysis that does not require whole slide images to be preprocessed into patches. Instead, MagNets dynamically extract patches from the tissue image, choosing the magnification level, field of view, and location according to the optimization task rather than in a generic, predefined, or static way. My results on the publicly available Camelyon16 and Camelyon17 datasets demonstrate the effectiveness of MagNets, and of the proposed optimization framework, on the task of whole slide image classification. MagNets process far fewer patches from each slide (10 to 300 times fewer) than any of the existing end-to-end approaches.
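As a toy illustration of the dynamic-selection idea (not the actual MagNets architecture), the sketch below enumerates candidate views at several magnifications and keeps only the k highest-scoring ones instead of processing every tile. The variance-based saliency score and all function names are hypothetical stand-ins for the learned selection mechanism:

```python
import numpy as np

def candidate_patches(slide, sizes=(4, 8)):
    """Enumerate candidate (row, col, size) views at several magnification levels."""
    h, w = slide.shape
    return [(r, c, s)
            for s in sizes
            for r in range(0, h - s + 1, s)
            for c in range(0, w - s + 1, s)]

def select_patches(slide, score_fn, k=3):
    """Greedily keep only the k highest-scoring views -- so only a small,
    dynamically chosen subset of the slide is ever processed."""
    cands = candidate_patches(slide)
    scored = sorted(
        cands,
        key=lambda v: score_fn(slide[v[0]:v[0] + v[2], v[1]:v[1] + v[2]]),
        reverse=True,
    )
    return scored[:k]

slide = np.random.default_rng(1).random((16, 16))
top = select_patches(slide, score_fn=np.var, k=3)  # variance as a stand-in saliency score
print(len(top))  # → 3
```

In MagNets the choice of location, field of view, and magnification is driven by the end-to-end optimization task rather than by a fixed scoring function, which is what allows them to process far fewer patches than exhaustive tiling.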
Thesis, PhD (Doctor of Philosophy)
Creative Commons Attribution-NonCommercial 4.0 International (http://creativecommons.org/licenses/by-nc/4.0/)
Except where otherwise noted within the work, this item's licence for re-use is described as Creative Commons Attribution-NonCommercial 4.0 International
Items in the St Andrews Research Repository are protected by copyright, with all rights reserved, unless otherwise indicated.