Item metadata

dc.contributor.author: Mussa, Hamse Yussuf
dc.contributor.author: Marcus, David
dc.contributor.author: Mitchell, John B. O.
dc.contributor.author: Glen, Robert
dc.date.accessioned: 2015-06-12T09:40:03Z
dc.date.available: 2015-06-12T09:40:03Z
dc.date.issued: 2015-06-12
dc.identifier.citation: Mussa, H Y, Marcus, D, Mitchell, J B O & Glen, R 2015, 'Verifying the fully “Laplacianised” posterior Naïve Bayesian approach and more', Journal of Cheminformatics, vol. 7, no. 27. https://doi.org/10.1186/s13321-015-0075-5 [en]
dc.identifier.issn: 1758-2946
dc.identifier.other: PURE: 194612577
dc.identifier.other: PURE UUID: 2c6e740e-8b73-4665-9b81-f789487f332c
dc.identifier.other: Scopus: 84930944620
dc.identifier.other: ORCID: /0000-0002-0379-6097/work/34033386
dc.identifier.other: WOS: 000355976200001
dc.identifier.uri: http://hdl.handle.net/10023/6813
dc.description: Mussa and Glen would like to thank Unilever for financial support, whereas Mussa and Mitchell thank the BBSRC for funding this research through grant BB/I00596X/1. Mitchell thanks the Scottish Universities Life Sciences Alliance (SULSA) for financial support. [en]
dc.description.abstract: Background: In a recent paper, Mussa, Mitchell and Glen (MMG) have mathematically demonstrated that the “Laplacian Corrected Modified Naïve Bayes” (LCMNB) algorithm can be viewed as a variant of the so-called Standard Naïve Bayes (SNB) scheme, whereby the role played by absence of compound features in classifying/assigning the compound to its appropriate class is ignored. MMG have also proffered guidelines regarding the conditions under which this omission may hold. Utilising three data sets, the present paper examines the validity of these guidelines in practice. The paper also extends MMG’s work and introduces a new version of the SNB classifier: “Tapered Naïve Bayes” (TNB). TNB does not discard the role of absence of a feature out of hand, nor does it fully consider its role. Hence, TNB encapsulates both SNB and LCMNB. Results: LCMNB, SNB and TNB performed differently on classifying 4,658, 5,031 and 1,149 ligands (all chosen from the ChEMBL Database) distributed over 31 enzymes, 23 membrane receptors, and one ion-channel, four transporters and one transcription factor as their target proteins. When the number of features utilised was equal to or smaller than the “optimal” number of features for a given data set, SNB classifiers systematically gave better classification results than those yielded by LCMNB classifiers. The opposite was true when the number of features employed was markedly larger than the “optimal” number of features for this data set. Nonetheless, these LCMNB performances were worse than the classification performance achieved by SNB when the “optimal” number of features for the data set was utilised. TNB classifiers systematically outperformed both SNB and LCMNB classifiers.
Conclusions: The classification results obtained in this study concur with the mathematically based guidelines given in MMG’s paper: that is, ignoring the role of absence of a feature out of hand does not necessarily improve classification performance of the SNB approach; if anything, it could make the performance of the SNB method worse. The results obtained also lend support to the rationale on which the TNB algorithm rests: handled judiciously, taking into account absence of features can enhance (not impair) the discriminatory classification power of the SNB approach. [en]
dc.language.iso: eng
dc.relation.ispartof: Journal of Cheminformatics [en]
dc.rights: © 2015 Mussa et al. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. [en]
dc.subject: Classification [en]
dc.subject: Naïve Bayes [en]
dc.subject: Tapering [en]
dc.subject: Features [en]
dc.subject: 3rd-DAS [en]
dc.title: Verifying the fully “Laplacianised” posterior Naïve Bayesian approach and more [en]
dc.type: Journal article [en]
dc.description.version: Publisher PDF [en]
dc.contributor.institution: University of St Andrews. School of Chemistry [en]
dc.contributor.institution: University of St Andrews. Biomedical Sciences Research Complex [en]
dc.contributor.institution: University of St Andrews. EaSTCHEM [en]
dc.identifier.doi: https://doi.org/10.1186/s13321-015-0075-5
dc.description.status: Peer reviewed [en]
dc.identifier.url: http://www.jcheminf.com/content/7/1/27 [en]
dc.identifier.url: http://www.ncbi.nlm.nih.gov/pubmed/26075027 [en]

