Greedy and linear ensembles of machine learning methods outperform single approaches for QSPR regression problems

Kew, William; Mitchell, John B. O.

Files in this item

Name: kew_mitchell_accepted_version.pdf
Size: 648.6Kb
Format: PDF

Item metadata

dc.contributor.author	Kew, William
dc.contributor.author	Mitchell, John B. O.
dc.date.accessioned	2016-03-25T00:01:22Z
dc.date.available	2016-03-25T00:01:22Z
dc.date.issued	2015-09
dc.identifier	193903192
dc.identifier	f146a3fe-652e-4ca9-9943-543440e3acc5
dc.identifier	84942191120
dc.identifier	000364651100007
dc.identifier.citation	Kew, W & Mitchell, J B O 2015, 'Greedy and linear ensembles of machine learning methods outperform single approaches for QSPR regression problems', Molecular Informatics, vol. 34, no. 9, pp. 634-647. https://doi.org/10.1002/minf.201400122	en
dc.identifier.issn	1868-1743
dc.identifier.other	ORCID: /0000-0002-0379-6097/work/34033385
dc.identifier.uri	https://hdl.handle.net/10023/8484
dc.description.abstract	The application of Machine Learning to cheminformatics is a large and active field of research, but there exist few papers which discuss whether ensembles of different Machine Learning methods can improve upon the performance of their component methodologies. Here we investigated a variety of methods, including kernel-based, tree, linear, neural networks, and both greedy and linear ensemble methods. These were all tested against a standardised methodology for regression with data relevant to the pharmaceutical development process. The investigation focused on QSPR problems within drug-like chemical space. We aimed to investigate which methods perform best, and how the ‘wisdom of crowds’ principle can be applied to ensemble predictors. It was found that no single method performs best for all problems, but that a dynamic, well-structured ensemble predictor would perform very well across the board, usually providing an improvement in performance over the best single method. Its use of weighting factors allows the greedy ensemble to acquire a bigger contribution from the better performing models, and this helps the greedy ensemble generally to outperform the simpler linear ensemble. Choice of data pre-processing methodology was found to be crucial to performance of each method too.
dc.format.extent	664247
dc.language.iso	eng
dc.relation.ispartof	Molecular Informatics	en
dc.subject	Machine Learning	en
dc.subject	Quantitative structure-property relationships	en
dc.subject	Greedy ensembles	en
dc.subject	Linear ensembles	en
dc.subject	QD Chemistry	en
dc.subject	DAS	en
dc.subject.lcc	QD	en
dc.title	Greedy and linear ensembles of machine learning methods outperform single approaches for QSPR regression problems	en
dc.type	Journal article	en
dc.contributor.institution	University of St Andrews. School of Chemistry	en
dc.contributor.institution	University of St Andrews. Biomedical Sciences Research Complex	en
dc.contributor.institution	University of St Andrews. EaSTCHEM	en
dc.identifier.doi	10.1002/minf.201400122
dc.description.status	Peer reviewed	en
dc.date.embargoedUntil	2016-03-25
dc.identifier.url	http://onlinelibrary.wiley.com/doi/10.1002/minf.201400122/suppinfo	en

This item appears in the following Collection(s)

University of St Andrews Research
