Computational analysis of tissue images in cancer diagnosis and prognosis: machine learning-based methods for the next generation of computational pathology

Dimitriou, Neofytos

View/Open

Thesis-Neofytos-Dimitriou-complete-version.pdf (49.97Mb)

Thesis_Neofytos_Dimitriou_120018523_final_latex_no_figs.zip (758.2Kb)

Figures_1.zip (14.52Mb)

Figures_2.zip (40.28Mb)

Date

14/06/2023

Grant ID

1950036

Abstract

This work develops machine learning systems for tissue image analysis in the context of cancer diagnosis and prognosis. Such a system can not only identify new prognostic markers, but can also serve as a standalone clinical prediction rule, the premise being that its non-linear, multivariate nature may allow it to identify and exploit complex patterns that collectively yield more accurate cancer diagnosis and prognosis than the clinical gold standard. The task, however, is very challenging because of the extremely high resolution of the images, the highly heterogeneous microenvironment, the multiple sources of noise and artifacts, and the low granularity of the ground truth. Related work on this task typically starts from the extraction of handcrafted features. I investigate the application of machine learning for prognosis using handcrafted features, and develop prognostic machine learning models that outperform baselines based on clinically employed prognostic systems in two separate cohorts of colorectal and muscle-invasive bladder cancer patients. Moreover, analysis of the proposed methods provides insight into the prognostic nature of characteristics within the microenvironment that are not yet included in the clinical systems. The emergence of deep learning has enabled direct analysis of images. Because designing and building pipelines for handcrafted feature extraction is laborious, expensive, and prone to introducing human bias, I investigate the application of deep learning to tissue images directly. In particular, I propose a framework that allows models to be trained directly from exhaustively tiled whole slide images with only patient-level ground truth, and demonstrate its effectiveness on colorectal cancer prognosis.
In my final work, I introduce a new type of CNN-based method for gigapixel image analysis, called Magnifying Networks (MagNets), that does not require patch-based preprocessing of whole slide images. Instead, MagNets dynamically extract patches from the tissue image, choosing the magnification level, field of view, and location that best suit the task being optimized, rather than relying on generic, predefined, or static rules. My results on the publicly available Camelyon16 and Camelyon17 datasets demonstrate the effectiveness of MagNets, as well as of the proposed optimization framework, on the task of whole slide image classification. MagNets process far fewer patches from each slide than any existing end-to-end approach (10 to 300 times fewer).

DOI

https://doi.org/10.17630/sta/336

Type

Thesis, PhD Doctor of Philosophy

Rights

Creative Commons Attribution-NonCommercial 4.0 International

http://creativecommons.org/licenses/by-nc/4.0/

Collections

Computer Science Theses

URI

https://hdl.handle.net/10023/27139

Except where otherwise noted within the work, this item's licence for re-use is described as Creative Commons Attribution-NonCommercial 4.0 International

St Andrews Research Repository