A Random Forest model for predicting allosteric and functional sites on proteins
Abstract
We created a computational method to identify allosteric sites using a machine learning approach trained and tested on protein structures containing bound ligand molecules. The Random Forest algorithm was adopted to build our three-way predictive model. Based on descriptors collated for each ligand and binding site, the classification model assigns protein cavities as allosteric, regular or orthosteric, and hence identifies allosteric sites. For each complex, 43 structural descriptors were derived and used to characterize individual protein-ligand binding sites belonging to the three classes. We carried out a separate validation on a further unseen set of protein structures containing the ligand 2-(N-cyclohexylamino)ethanesulfonic acid (CHES).
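As a rough illustration of the kind of three-way Random Forest classification the abstract describes, the following is a minimal sketch in plain Python. It is not the authors' implementation: the descriptor values are synthetic Gaussian numbers invented for the example, and the tree depth, number of trees, and number of candidate features per split (`mtry`) are assumptions, since the abstract does not state them. Only the three class names and the count of 43 descriptors come from the abstract.

```python
import random
from collections import Counter

CLASSES = ("allosteric", "regular", "orthosteric")
N_DESCRIPTORS = 43  # the abstract reports 43 structural descriptors per complex


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, features):
    """Return the (feature, threshold) that most reduces weighted Gini, or None."""
    best, best_score = None, gini(y)
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(X, y, rng, max_depth=5, mtry=6):
    """Grow one tree, sampling `mtry` candidate features at every node.

    max_depth and mtry are assumed values, not taken from the paper.
    """
    if max_depth == 0 or len(set(y)) == 1:
        return Counter(y).most_common(1)[0][0]  # leaf: majority class
    split = best_split(X, y, rng.sample(range(len(X[0])), mtry))
    if split is None:
        return Counter(y).most_common(1)[0][0]
    f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return (
        f,
        t,
        build_tree([X[i] for i in left], [y[i] for i in left], rng, max_depth - 1, mtry),
        build_tree([X[i] for i in right], [y[i] for i in right], rng, max_depth - 1, mtry),
    )


def predict_tree(tree, row):
    """Walk from the root to a leaf and return the leaf's class label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


def fit_forest(X, y, n_trees=25, seed=0):
    """Train each tree on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], rng))
    return forest


def predict_forest(forest, row):
    """Majority vote over all trees."""
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]


def make_synthetic(n_per_class, seed):
    """Hypothetical stand-in data: each class has its own descriptor mean."""
    rng = random.Random(seed)
    X, y = [], []
    for k, label in enumerate(CLASSES):
        for _ in range(n_per_class):
            X.append([rng.gauss(3.0 * k, 1.0) for _ in range(N_DESCRIPTORS)])
            y.append(label)
    return X, y


X_train, y_train = make_synthetic(30, seed=1)
X_test, y_test = make_synthetic(10, seed=2)  # separate, unseen set
forest = fit_forest(X_train, y_train)
predictions = [predict_forest(forest, row) for row in X_test]
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

The sketch holds out a separately generated test set to echo the abstract's validation on unseen structures. With real data, each row would hold the 43 descriptors computed for one protein-ligand complex, and an established library implementation would normally be preferable to hand-written trees.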
Citation
Chen, A S-Y, Westwood, N J, Brear, P, Rogers, G W, Mavridis, L & Mitchell, J B O 2016, 'A Random Forest model for predicting allosteric and functional sites on proteins', Molecular Informatics, vol. 35, no. 3-4, pp. 125-135. https://doi.org/10.1002/minf.201500108
Publication
Molecular Informatics
Status
Peer reviewed
ISSN
1868-1743
Type
Journal article
Rights
© 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim. This work is made available online in accordance with the publisher’s policies. This is the author created, accepted version manuscript following peer review and may differ slightly from the final published version. The final published version of this work is available at: https://dx.doi.org/10.1002/minf.201500108. This article may be used for non-commercial purposes in accordance with the Wiley Self-Archiving Policy [http://olabout.wiley.com/WileyCDA/Section/id-820227.html].
Description
We thank the Scottish Universities Life Sciences Alliance (SULSA) for funding to JBOM and for PB’s PhD studentship under NJW’s supervision.