Complex temporal topic evolution modelling using the Kullback-Leibler divergence and the Bhattacharyya distance
The rapidly expanding corpus of medical research literature presents major challenges in understanding previous work, extracting maximum information from collected data, and identifying promising research directions. We present a case for the use of advanced machine learning techniques as an aid in this task and introduce a novel methodology that is shown to be capable of extracting meaningful information from large longitudinal corpora and of tracking complex temporal changes within them. Our framework is based on (i) the discretization of time into epochs, (ii) epoch-wise topic discovery using a hierarchical Dirichlet process-based model, and (iii) a temporal similarity graph which allows for the modelling of complex topic changes. More specifically, this is the first work that discusses and distinguishes between two groups of particularly challenging topic evolution phenomena, namely (a) topic splitting and speciation and (b) topic convergence and merging, in addition to the more widely recognized phenomena of emergence, disappearance, and gradual evolution. The proposed framework is evaluated on a public medical literature corpus.
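The two measures named in the title, and the role they could play in step (iii), can be illustrated with a minimal sketch. It rests on assumptions that do not come from the abstract: each topic is represented as a discrete probability distribution over a shared vocabulary, topics in consecutive epochs are linked when their Bhattacharyya distance falls below a hypothetical threshold `max_distance`, and the function names and toy distributions are purely illustrative. The paper's actual graph construction may differ.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary.

    eps is a smoothing constant (an assumption of this sketch) that keeps
    the ratio finite when q assigns zero mass to a word that p uses.
    """
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def bhattacharyya_distance(p, q):
    """-ln of the Bhattacharyya coefficient; infinite if supports are disjoint."""
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    return -math.log(bc) if bc > 0 else math.inf


def link_topics(prev_topics, next_topics, max_distance):
    """Edges (i, j, d) between topics of consecutive epochs with d <= max_distance."""
    edges = []
    for i, p in enumerate(prev_topics):
        for j, q in enumerate(next_topics):
            d = bhattacharyya_distance(p, q)
            if d <= max_distance:
                edges.append((i, j, d))
    return edges


def describe_changes(edges, n_prev, n_next):
    """Label topics by how many links they have in the similarity graph."""
    out_deg = [0] * n_prev
    in_deg = [0] * n_next
    for i, j, _ in edges:
        out_deg[i] += 1
        in_deg[j] += 1
    prev_labels = ["disappears" if d == 0 else "continues" if d == 1 else "splits"
                   for d in out_deg]
    next_labels = ["emerges" if d == 0 else "inherits" if d == 1 else "merges"
                   for d in in_deg]
    return prev_labels, next_labels


# Toy example over a four-word vocabulary (illustrative values only).
epoch_1 = [[0.5, 0.5, 0.0, 0.0],   # topic A
           [0.0, 0.0, 0.5, 0.5]]   # topic B
epoch_2 = [[0.9, 0.1, 0.0, 0.0],   # specialised descendant of A
           [0.1, 0.9, 0.0, 0.0]]   # another descendant of A

edges = link_topics(epoch_1, epoch_2, max_distance=0.5)
prev_labels, next_labels = describe_changes(edges, len(epoch_1), len(epoch_2))
print(prev_labels, next_labels)
```

In this toy setting topic A links to both topics of the later epoch and is labelled as splitting, while topic B has no sufficiently similar successor and is labelled as disappearing. Note that the Kullback-Leibler divergence is asymmetric, so `kl_divergence(p, q)` and `kl_divergence(q, p)` generally differ, whereas the Bhattacharyya distance is symmetric; the sketch uses the latter for linking only for simplicity.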
Andrei, V & Arandelovic, O 2016, 'Complex temporal topic evolution modelling using the Kullback-Leibler divergence and the Bhattacharyya distance', EURASIP Journal on Bioinformatics and Systems Biology. DOI: 10.1186/s13637-016-0050-0
EURASIP Journal on Bioinformatics and Systems Biology
© The Author(s) 2016. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.