Can human experts predict solubility better than computers?
In this study, we design and carry out a survey asking human experts to predict the aqueous solubility of druglike organic compounds. We investigate whether these experts, drawn largely from the pharmaceutical industry and academia, can match or exceed the predictive power of algorithms. Alongside this, we implement 10 typical machine learning algorithms on the same dataset. The best algorithm, a variety of neural network known as a multi-layer perceptron, gave an RMSE of 0.985 log S units and an R² of 0.706. We would not have predicted the relative success of this particular algorithm in advance. The best individual human predictor achieved almost identical prediction quality, with an RMSE of 0.942 log S units and an R² of 0.723. The collection of algorithms contained a higher proportion of reasonably good predictors: nine out of ten, compared with around half of the humans. For both humans and algorithms, combining individual predictions into a consensus predictor by taking their median generated excellent predictivity. While our consensus human predictor achieved very slightly better headline figures on various statistical measures, the difference between it and the consensus machine learning predictor was both small and not statistically significant. We conclude that human experts can predict the aqueous solubility of druglike molecules essentially as well as machine learning algorithms, and that a median-based consensus predictor is a powerful way of benefitting from the wisdom of crowds.
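The median consensus and the two headline statistics described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names are hypothetical, the log S values are invented purely for illustration and are not data from the study, and R² is computed here as the squared Pearson correlation, which is an assumption because the abstract does not state which definition was used.

```python
import math
from statistics import mean, median


def rmse(predicted, observed):
    """Root-mean-square error between predicted and observed log S values."""
    return math.sqrt(mean((p - o) ** 2 for p, o in zip(predicted, observed)))


def r_squared(predicted, observed):
    """Squared Pearson correlation (assumed definition of R², see lead-in)."""
    mp, mo = mean(predicted), mean(observed)
    cov = sum((p - mp) * (o - mo) for p, o in zip(predicted, observed))
    var_p = sum((p - mp) ** 2 for p in predicted)
    var_o = sum((o - mo) ** 2 for o in observed)
    return cov ** 2 / (var_p * var_o)


def median_consensus(predictions_by_predictor):
    """For each compound, take the median of all predictors' values."""
    # zip(*rows) regroups the values so each tuple holds one compound's predictions.
    return [median(per_compound) for per_compound in zip(*predictions_by_predictor)]


# Invented log S values for five hypothetical compounds (NOT data from the study).
observed = [-2.0, -3.5, -1.0, -4.2, -5.0]

# Three hypothetical predictors (human or algorithm), one row per predictor.
predictors = [
    [-2.5, -3.0, -1.5, -4.0, -6.5],
    [-1.0, -3.8, -0.8, -5.5, -4.6],
    [-2.1, -5.0, 0.5, -4.3, -5.2],
]

consensus = median_consensus(predictors)

for i, row in enumerate(predictors, start=1):
    print(f"predictor {i}: RMSE={rmse(row, observed):.3f}, R2={r_squared(row, observed):.3f}")
print(f"consensus:   RMSE={rmse(consensus, observed):.3f}, R2={r_squared(consensus, observed):.3f}")
```

The toy values are chosen so that each predictor makes its large errors on different compounds. That is the situation in which the median helps, because it discards the outlying guess for each compound. A consensus is not guaranteed to beat every individual predictor on real data.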
Boobier, S, Osbourn, A & Mitchell, J B O 2017, 'Can human experts predict solubility better than computers?', Journal of Cheminformatics, vol. 9, no. 63. DOI: 10.1186/s13321-017-0250-y
Journal of Cheminformatics
© The Author(s) 2017. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
This work took place as SB’s MChem undergraduate research project at the University of St Andrews. The authors thank the University of St Andrews Library for funding the Open Access publication of this work. AO’s laboratory is supported by the UK Biotechnology and Biological Sciences Research Council (BBSRC) Institute Strategic Programme Grant ‘Molecules from Nature’ (BB/P012523/1) and the John Innes Foundation.