Item metadata

dc.contributor.author: Kew, William
dc.contributor.author: Mitchell, John B. O.
dc.date.accessioned: 2016-03-25T00:01:22Z
dc.date.available: 2016-03-25T00:01:22Z
dc.date.issued: 2015-09
dc.identifier: 193903192
dc.identifier: f146a3fe-652e-4ca9-9943-543440e3acc5
dc.identifier: 84942191120
dc.identifier: 000364651100007
dc.identifier.citation: Kew, W & Mitchell, J B O 2015, 'Greedy and linear ensembles of machine learning methods outperform single approaches for QSPR regression problems', Molecular Informatics, vol. 34, no. 9, pp. 634-647. https://doi.org/10.1002/minf.201400122 (en)
dc.identifier.issn: 1868-1743
dc.identifier.other: ORCID: /0000-0002-0379-6097/work/34033385
dc.identifier.uri: https://hdl.handle.net/10023/8484
dc.description.abstract: The application of Machine Learning to cheminformatics is a large and active field of research, but there exist few papers which discuss whether ensembles of different Machine Learning methods can improve upon the performance of their component methodologies. Here we investigated a variety of methods, including kernel-based, tree, linear, neural networks, and both greedy and linear ensemble methods. These were all tested against a standardised methodology for regression with data relevant to the pharmaceutical development process. The investigation focused on QSPR problems within drug-like chemical space. We aimed to investigate which methods perform best, and how the ‘wisdom of crowds’ principle can be applied to ensemble predictors. It was found that no single method performs best for all problems, but that a dynamic, well-structured ensemble predictor would perform very well across the board, usually providing an improvement in performance over the best single method. Its use of weighting factors allows the greedy ensemble to acquire a bigger contribution from the better performing models, and this helps the greedy ensemble generally to outperform the simpler linear ensemble. Choice of data pre-processing methodology was found to be crucial to performance of each method too.
dc.format.extent: 664247
dc.language.iso: eng
dc.relation.ispartof: Molecular Informatics (en)
dc.subject: Machine Learning (en)
dc.subject: Quantitative structure-property relationships (en)
dc.subject: Greedy ensembles (en)
dc.subject: Linear ensembles (en)
dc.subject: QD Chemistry (en)
dc.subject: DAS (en)
dc.subject.lcc: QD (en)
dc.title: Greedy and linear ensembles of machine learning methods outperform single approaches for QSPR regression problems (en)
dc.type: Journal article (en)
dc.contributor.institution: University of St Andrews. School of Chemistry (en)
dc.contributor.institution: University of St Andrews. Biomedical Sciences Research Complex (en)
dc.contributor.institution: University of St Andrews. EaSTCHEM (en)
dc.identifier.doi: 10.1002/minf.201400122
dc.description.status: Peer reviewed (en)
dc.date.embargoedUntil: 2016-03-25
dc.identifier.url: http://onlinelibrary.wiley.com/doi/10.1002/minf.201400122/suppinfo (en)

