Item metadata

dc.contributor.author: Mussa, Hamse Yussuf
dc.contributor.author: De Ferrari, Luna
dc.contributor.author: Mitchell, John B. O.
dc.date.accessioned: 2015-12-09T15:10:04Z
dc.date.available: 2015-12-09T15:10:04Z
dc.date.issued: 2015-12-03
dc.identifier: 235779362
dc.identifier: 8d8cd412-bc76-4a75-9427-11be9e24bb06
dc.identifier: 84988836378
dc.identifier.citation: Mussa, H Y, De Ferrari, L & Mitchell, J B O 2015, 'Enzyme mechanism prediction: a template matching problem on InterPro signature subspaces', BMC Research Notes, vol. 8, 744. https://doi.org/10.1186/s13104-015-1730-7 [en]
dc.identifier.issn: 1756-0500
dc.identifier.other: ORCID: /0000-0002-0379-6097/work/34033381
dc.identifier.uri: https://hdl.handle.net/10023/7900
dc.description: The authors thank the BBSRC for funding this research through grant BB/I00596X/1 and are also grateful to the Scottish Universities Life Sciences Alliance (SULSA) for financial support [en]
dc.description.abstract: We recently reported that one may be able to predict with high accuracy the chemical mechanism of an enzyme by employing a simple pattern recognition approach: a k Nearest Neighbour rule with k=1 (k1NN) and 321 InterPro sequence signatures as enzyme features. The nearest-neighbour rule is known to be highly sensitive to errors in the training data, in particular when the available training dataset is small. This was the case in our previous study, in which our dataset comprised 248 enzymes annotated against 71 enzymatic mechanism labels from MACiE. In the current study, we have carefully re-analysed our dataset and prediction results to “explain” why a high variance k1NN rule exhibited such remarkable classification performance. We find that enzymes with different chemical mechanism labels in this dataset reside in barely overlapping subspaces in the feature space defined by the 321 features selected. These features contain the appropriate information needed to accurately classify the enzymatic mechanisms rendering our classification problem a basic look-up exercise. This observation dovetails with the low misclassification rate we reported. Our results provide explanations for the “anomaly” – a basic nearest-neighbour algorithm exhibiting remarkable prediction performance for enzymatic mechanism despite the fact that the feature space was large and sparse. Our results also dovetail well with another finding we reported, namely that InterPro signatures are critical for accurate prediction of enzyme mechanism. We also suggest simple rules that might enable one to inductively predict whether a novel enzyme possesses any of our 71 predefined mechanisms.
dc.format.extent: 1066601
dc.language.iso: eng
dc.relation.ispartof: BMC Research Notes [en]
dc.subject: Enzyme mechanism [en]
dc.subject: InterPro signatures [en]
dc.subject: Nearest-neighbour [en]
dc.subject: QD Chemistry [en]
dc.subject: QR Microbiology [en]
dc.subject: DAS [en]
dc.subject.lcc: QD [en]
dc.subject.lcc: QR [en]
dc.title: Enzyme mechanism prediction: a template matching problem on InterPro signature subspaces [en]
dc.type: Journal article [en]
dc.contributor.sponsor: BBSRC [en]
dc.contributor.institution: University of St Andrews. School of Chemistry [en]
dc.contributor.institution: University of St Andrews. Biomedical Sciences Research Complex [en]
dc.contributor.institution: University of St Andrews. EaSTCHEM [en]
dc.identifier.doi: 10.1186/s13104-015-1730-7
dc.description.status: Peer reviewed [en]
dc.identifier.url: http://www.biomedcentral.com/1756-0500/8/744 [en]
dc.identifier.grantnumber: BB/I00596X/1 [en]
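
Illustrative note: the abstract's central observation, that a nearest-neighbour rule with k=1 reduces to a look-up when mechanism labels occupy barely overlapping signature subspaces, can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the authors' implementation: the record does not name a distance measure, so Jaccard distance on signature sets is assumed here, and the signature IDs, mechanism labels, and function names are hypothetical placeholders rather than real InterPro or MACiE entries.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical toy training set: each enzyme is a set of signature IDs
# (placeholders, not real InterPro accessions) plus a mechanism label.
TRAINING: List[Tuple[FrozenSet[str], str]] = [
    (frozenset({"SIG_A", "SIG_B"}), "mechanism_1"),
    (frozenset({"SIG_A", "SIG_C"}), "mechanism_1"),
    (frozenset({"SIG_D", "SIG_E"}), "mechanism_2"),
    (frozenset({"SIG_F"}), "mechanism_3"),
]


def jaccard_distance(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard distance between two signature sets (an assumed metric)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def predict_1nn(
    query: FrozenSet[str],
    training: List[Tuple[FrozenSet[str], str]] = TRAINING,
) -> Tuple[str, float]:
    """Return the label and distance of the single nearest training enzyme."""
    best_label, best_dist = "", float("inf")
    for signatures, label in training:
        dist = jaccard_distance(query, signatures)
        if dist < best_dist:  # ties keep the first enzyme encountered
            best_label, best_dist = label, dist
    return best_label, best_dist


def label_subspaces(
    training: List[Tuple[FrozenSet[str], str]] = TRAINING,
) -> Dict[str, Set[str]]:
    """Collect, per mechanism label, every signature seen with that label."""
    subspaces: Dict[str, Set[str]] = {}
    for signatures, label in training:
        subspaces.setdefault(label, set()).update(signatures)
    return subspaces


def shared_signatures(subspaces: Dict[str, Set[str]]) -> Dict[Tuple[str, str], Set[str]]:
    """Report signatures shared between any two labels (empty if disjoint)."""
    overlaps: Dict[Tuple[str, str], Set[str]] = {}
    for (l1, s1), (l2, s2) in combinations(sorted(subspaces.items()), 2):
        common = s1 & s2
        if common:
            overlaps[(l1, l2)] = common
    return overlaps


if __name__ == "__main__":
    print(predict_1nn(frozenset({"SIG_A"})))
    print(shared_signatures(label_subspaces()))
```

In this toy data no signature is shared between labels, so any query containing a known signature is matched to the label that owns it. That is the sense in which disjoint subspaces turn classification into template matching.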

