Discovering topic structures of a temporally evolving document corpus

Beykikhoshk, Adham; Arandelovic, Ognjen; Phung, Dinh; Venkatesh, Svetha

Files in this item

Name: Beykikhoshk_2017_Discovering_topic_structure_KIS_CC.pdf
Size: 3.637Mb
Format: PDF

Item metadata

dc.contributor.author	Beykikhoshk, Adham
dc.contributor.author	Arandelovic, Ognjen
dc.contributor.author	Phung, Dinh
dc.contributor.author	Venkatesh, Svetha
dc.date.accessioned	2017-08-11T11:30:25Z
dc.date.available	2017-08-11T11:30:25Z
dc.date.issued	2018-06
dc.identifier	250119430
dc.identifier	27b70ebc-36a9-4e74-86d2-27099560ed8f
dc.identifier	85027159488
dc.identifier	000429480500003
dc.identifier.citation	Beykikhoshk, A, Arandelovic, O, Phung, D & Venkatesh, S 2018, 'Discovering topic structures of a temporally evolving document corpus', Knowledge and Information Systems, vol. 55, no. 3, pp. 599-632. https://doi.org/10.1007/s10115-017-1095-4	en
dc.identifier.issn	0219-1377
dc.identifier.uri	https://hdl.handle.net/10023/11428
dc.description.abstract	In this paper we describe a novel framework for the discovery of the topical content of a data corpus, and the tracking of its complex structural changes across the temporal dimension. In contrast to previous work, our model does not impose a prior on the rate at which documents are added to the corpus, nor does it adopt the Markovian assumption, which overly restricts the type of changes that the model can capture. Our key technical contribution is a framework based on (i) discretization of time into epochs, (ii) epoch-wise topic discovery using a hierarchical Dirichlet process-based model, and (iii) a temporal similarity graph which allows for the modelling of complex topic changes: emergence and disappearance, evolution, splitting, and merging. The power of the proposed framework is demonstrated on two medical literature corpora concerned with the autism spectrum disorder (ASD) and the metabolic syndrome (MetS), both increasingly important research subjects with significant social and healthcare consequences. In addition to the collected ASD and metabolic syndrome literature corpora, which we made freely available, our contribution also includes an extensive empirical analysis of the proposed framework. We describe a detailed and careful examination of the effects that our algorithm's free parameters have on its output and discuss the significance of the findings both in the context of the practical application of our algorithm and in the context of the existing body of work on temporal topic analysis. Our quantitative analysis is followed by several qualitative case studies highly relevant to the current research on ASD and MetS, on which our algorithm is shown to capture well the actual developments in these fields.
dc.format.extent	34
dc.format.extent	3814030
dc.language.iso	eng
dc.relation.ispartof	Knowledge and Information Systems	en
dc.subject	Data mining	en
dc.subject	Non-parametric	en
dc.subject	Bayesian	en
dc.subject	Autism	en
dc.subject	ASD	en
dc.subject	Metabolic syndrome	en
dc.subject	QA75 Electronic computers. Computer science	en
dc.subject	RC0321 Neuroscience. Biological psychiatry. Neuropsychiatry	en
dc.subject	NDAS	en
dc.subject.lcc	QA75	en
dc.subject.lcc	RC0321	en
dc.title	Discovering topic structures of a temporally evolving document corpus	en
dc.type	Journal article	en
dc.contributor.institution	University of St Andrews. School of Computer Science	en
dc.identifier.doi	https://doi.org/10.1007/s10115-017-1095-4
dc.description.status	Peer reviewed	en
dc.date.embargoedUntil	2017-08-10
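
The temporal similarity graph named in step (iii) of the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the epoch-wise topics of steps (i) and (ii) are already available as word-weight dictionaries, that cosine similarity is used as the similarity measure, that only consecutive epochs are linked, and that a fixed threshold of 0.5 decides whether two topics are connected. None of these choices is stated in the abstract, and the function names (`cosine`, `build_similarity_graph`, `label_events`) are hypothetical.

```python
from math import sqrt


def cosine(p, q):
    """Cosine similarity between two sparse word-weight dicts (assumed measure)."""
    dot = sum(w * q.get(term, 0.0) for term, w in p.items())
    norm_p = sqrt(sum(w * w for w in p.values()))
    norm_q = sqrt(sum(w * w for w in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def build_similarity_graph(epochs, threshold=0.5):
    """Link topics in consecutive epochs whose similarity reaches the threshold.

    epochs: list (one entry per epoch) of dicts {topic_id: {word: weight}}.
    Returns a list of edges ((epoch, topic_id), (epoch + 1, topic_id)).
    """
    edges = []
    for t in range(len(epochs) - 1):
        for a, dist_a in epochs[t].items():
            for b, dist_b in epochs[t + 1].items():
                if cosine(dist_a, dist_b) >= threshold:
                    edges.append(((t, a), (t + 1, b)))
    return edges


def label_events(epochs, edges):
    """Tag each topic node with the structural changes its degree implies."""
    out_deg, in_deg = {}, {}
    for src, dst in edges:
        out_deg[src] = out_deg.get(src, 0) + 1
        in_deg[dst] = in_deg.get(dst, 0) + 1

    events = {}
    last = len(epochs) - 1
    for t, topics in enumerate(epochs):
        for topic_id in topics:
            node = (t, topic_id)
            tags = []
            if t > 0 and in_deg.get(node, 0) == 0:
                tags.append("emergence")      # no predecessor in previous epoch
            if t < last and out_deg.get(node, 0) == 0:
                tags.append("disappearance")  # no successor in next epoch
            if out_deg.get(node, 0) > 1:
                tags.append("splitting")      # several successors
            if in_deg.get(node, 0) > 1:
                tags.append("merging")        # several predecessors
            events[node] = tags
    return events


# Toy input with made-up weights: two epochs, three topics.
epochs = [
    {"A": {"autism": 0.6, "gene": 0.4}},
    {"B": {"autism": 0.7, "gene": 0.3}, "C": {"diet": 0.8, "obesity": 0.2}},
]
edges = build_similarity_graph(epochs)
events = label_events(epochs, edges)
```

In this toy input, topic A continues as topic B (a single link, read as evolution), while topic C has no predecessor and is tagged as an emergence. A topic with one incoming and one outgoing link receives no tag and corresponds to plain evolution.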

This item appears in the following Collection(s)

University of St Andrews Research