Item metadata

dc.contributor.author: McDonagh, James
dc.contributor.author: Nath, Neetika
dc.contributor.author: De Ferrari, Luna
dc.contributor.author: van Mourik, Tanja
dc.contributor.author: Mitchell, John B. O.
dc.date.accessioned: 2014-03-12T15:31:01Z
dc.date.available: 2014-03-12T15:31:01Z
dc.date.issued: 2014-02-24
dc.identifier: 102866887
dc.identifier: 68067324-35dd-4bfc-a028-24c2dc193c00
dc.identifier: 84896995296
dc.identifier: 000333478800016
dc.identifier.citation: McDonagh, J, Nath, N, De Ferrari, L, van Mourik, T & Mitchell, J B O 2014, 'Uniting cheminformatics and chemical theory to predict the intrinsic aqueous solubility of crystalline druglike molecules', Journal of Chemical Information and Modeling, vol. 54, no. 3, pp. 844-856. https://doi.org/10.1021/ci4005805 [en]
dc.identifier.issn: 1549-9596
dc.identifier.other: ORCID: /0000-0002-0379-6097/work/34033392
dc.identifier.other: ORCID: /0000-0001-7683-3293/work/57088488
dc.identifier.uri: https://hdl.handle.net/10023/4518
dc.description.abstract: We present four models of solution free-energy prediction for druglike molecules utilizing cheminformatics descriptors and theoretically calculated thermodynamic values. We make predictions of solution free energy using physics-based theory alone and using machine learning/quantitative structure–property relationship (QSPR) models. We also develop machine learning models where the theoretical energies and cheminformatics descriptors are used as combined input. These models are used to predict solvation free energy. While direct theoretical calculation does not give accurate results in this approach, machine learning is able to give predictions with a root mean squared error (RMSE) of ~1.1 log S units in a 10-fold cross-validation for our Drug-Like-Solubility-100 (DLS-100) dataset of 100 druglike molecules. We find that a model built using energy terms from our theoretical methodology as descriptors is marginally less predictive than one built on Chemistry Development Kit (CDK) descriptors. Combining both sets of descriptors allows a further but very modest improvement in the predictions. However, in some cases, this is a statistically significant enhancement. These results suggest that there is little complementarity between the chemical information provided by these two sets of descriptors, despite their different sources and methods of calculation. Our machine learning models are also able to predict the well-known Solubility Challenge dataset with an RMSE value of 0.9–1.0 log S units.
dc.format.extent: 13
dc.format.extent: 1582568
dc.language.iso: eng
dc.relation.ispartof: Journal of Chemical Information and Modeling [en]
dc.subject: Cheminformatics [en]
dc.subject: Chemical theory [en]
dc.subject: Druglike molecules [en]
dc.subject: Quantitative structure–property relationship (QSPR) models [en]
dc.subject: machine learning models [en]
dc.subject: Chemistry Development Kit (CDK) descriptors [en]
dc.subject: QD Chemistry [en]
dc.subject.lcc: QD [en]
dc.title: Uniting cheminformatics and chemical theory to predict the intrinsic aqueous solubility of crystalline druglike molecules [en]
dc.type: Journal article [en]
dc.contributor.sponsor: BBSRC [en]
dc.contributor.institution: University of St Andrews. School of Chemistry [en]
dc.contributor.institution: University of St Andrews. EaSTCHEM [en]
dc.contributor.institution: University of St Andrews. Biomedical Sciences Research Complex [en]
dc.identifier.doi: 10.1021/ci4005805
dc.description.status: Peer reviewed [en]
dc.identifier.grantnumber: BB/I00596X/1 [en]

