Item metadata

dc.contributor.author: Nagel, Rebecca
dc.contributor.author: Ruxton, Graeme Douglas
dc.contributor.author: Morrissey, Michael Blair
dc.identifier.citation: Nagel, R, Ruxton, GD & Morrissey, MB 2024, 'Classical tests, linear models, and their extensions for the analysis of 2x2 contingency tables', Methods in Ecology and Evolution, vol. 15, no. 5, pp. 843-855.
dc.identifier.other: ORCID: /0000-0001-8943-6609/work/157140660
dc.description: Funding: Deutsche Forschungsgemeinschaft - 515410943; Royal Society London - University Research Fellowship. [en]
dc.description.abstract: 1. Ecologists and evolutionary biologists are regularly tasked with comparing binary data across groups. There is, however, some debate in the biostatistics literature about the best methodology for analysing data that comprise a binary explanatory variable and a binary response variable, which together form a 2 × 2 contingency table.
2. We assess several methodologies for the analysis of 2 × 2 contingency tables using simulations across different sample sizes, with outcomes distributed evenly or unevenly between groups. Specifically, we assess the commonly recommended logistic regression (a generalised linear model [GLM]), the classical Pearson chi-squared test and four conventional alternatives (Yates' correction, Fisher's exact test, the exact unconditional test and the mid-p test), as well as the widely discouraged linear model (LM) regression.
3. We found that both LM and GLM analyses provided unbiased estimates of the difference in proportions between groups. LM and GLM analyses also provided accurate standard errors and confidence intervals when the experimental design was balanced. When the design was unbalanced, the sample size was small and one of the two groups had a probability close to 1 or 0, LM analysis could substantially over- or under-represent statistical uncertainty. For null hypothesis significance testing, the chi-squared test and LM analysis performed almost identically. Across all scenarios, both had high power to detect non-null effects while avoiding false positives. By contrast, the GLM analysis was underpowered when using z-based (Wald) p-values, particularly when one of the two groups had a probability near 1 or 0. The GLM with p-values from a likelihood-ratio test (LRT) had better power to detect non-null effects.
4. Our simulation results suggest that, wherever a chi-squared test would be recommended, linear regression is a suitable alternative for the analysis of 2 × 2 contingency table data. For researchers who opt for more sophisticated procedures, we provide R functions that use the delta method to calculate the standard error of a difference between two probabilities from the output of a Bernoulli GLM. We also explore approaches to complement GLM analysis of 2 × 2 contingency tables with credible intervals on the probability scale. These additional operations should help researchers make valid assessments of both statistical and practical significance.
dc.relation.ispartof: Methods in Ecology and Evolution [en]
dc.subject: 2 x 2 contingency table [en]
dc.subject: Chi-squared test [en]
dc.subject: Linear models [en]
dc.subject: Logistic GLMs [en]
dc.subject: Uncertainty estimates [en]
dc.subject: QH301 Biology [en]
dc.title: Classical tests, linear models, and their extensions for the analysis of 2x2 contingency tables [en]
dc.type: Journal article [en]
dc.contributor.sponsor: The Royal Society [en]
dc.contributor.institution: University of St Andrews. School of Biology [en]
dc.contributor.institution: University of St Andrews. Centre for Biological Diversity [en]
dc.contributor.institution: University of St Andrews. Institute of Behavioural and Neural Sciences [en]
dc.contributor.institution: University of St Andrews. St Andrews Bioinformatics Unit [en]
dc.description.status: Peer reviewed [en]
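
The abstract above compares several analyses of the same 2 × 2 table. The following Python sketch is a rough illustration only. It is not the authors' R functions, which are not reproduced in this record.

For a single table, it computes:
- the difference in proportions;
- the Pearson chi-squared statistic;
- the LM slope and its standard error;
- the logistic-GLM slope, with Wald z and likelihood-ratio p-values;
- a delta-method standard error for the difference between the two fitted probabilities.

The function name `two_by_two_summary` and the cell labels a to d are hypothetical. The sketch assumes all four cell counts are positive. It uses closed forms that hold because a logistic GLM with a single binary predictor is saturated, so no iterative fitting is needed.

```python
import math


def two_by_two_summary(a: int, b: int, c: int, d: int) -> dict:
    """Hypothetical helper for illustration (not the authors' R implementation).

    Group 1 has a successes and b failures; group 2 has c successes and
    d failures. Assumes every cell count is > 0, otherwise the logit-scale
    quantities below are undefined.
    """
    n1, n2 = a + b, c + d
    n = n1 + n2
    p1, p2 = a / n1, c / n2
    diff = p2 - p1  # difference in proportions (group 2 minus group 1)

    # Pearson chi-squared statistic without continuity correction.
    chi2 = n * (a * d - b * c) ** 2 / (n1 * n2 * (a + c) * (b + d))

    # Linear model y ~ group: the OLS slope equals the difference in means (diff);
    # the residual variance pools the two within-group Bernoulli variances.
    ssr = n1 * p1 * (1 - p1) + n2 * p2 * (1 - p2)
    sigma2 = ssr / (n - 2)
    lm_se = math.sqrt(sigma2 * (1 / n1 + 1 / n2))
    lm_t = diff / lm_se

    # Logistic GLM y ~ group (saturated): b0 = logit(p1), b1 = logit(p2) - logit(p1).
    b0 = math.log(p1 / (1 - p1))
    b1 = math.log(p2 / (1 - p2)) - b0
    var_b0 = 1 / a + 1 / b
    var_b1 = 1 / a + 1 / b + 1 / c + 1 / d
    cov_b0_b1 = -var_b0
    glm_z = b1 / math.sqrt(var_b1)  # Wald z statistic for the group effect

    # Delta method: diff = expit(b0 + b1) - expit(b0), with gradient (g0, g1)
    # taken with respect to (b0, b1); expit'(x) = p(1 - p).
    w1, w2 = p1 * (1 - p1), p2 * (1 - p2)
    g0, g1 = w2 - w1, w2
    var_diff = g0**2 * var_b0 + g1**2 * var_b1 + 2 * g0 * g1 * cov_b0_b1
    delta_se = math.sqrt(var_diff)

    # Likelihood-ratio (G) statistic against the intercept-only model.
    p = (a + c) / n
    expected = [n1 * p, n1 * (1 - p), n2 * p, n2 * (1 - p)]
    g2 = 2 * sum(o * math.log(o / e) for o, e in zip([a, b, c, d], expected))

    return {
        "diff": diff,
        "chi2": chi2,
        "chi2_p": math.erfc(math.sqrt(chi2 / 2)),  # upper tail of chi-squared(1)
        "lm_se": lm_se,
        "lm_t": lm_t,
        "glm_b1": b1,
        "glm_z_p": math.erfc(abs(glm_z) / math.sqrt(2)),  # two-sided normal p-value
        "lrt_p": math.erfc(math.sqrt(g2 / 2)),  # upper tail of chi-squared(1)
        "delta_se": delta_se,
    }
```

For example, `two_by_two_summary(12, 8, 5, 15)` summarises a balanced table with 20 observations per group. Two algebraic facts are worth noting about these quantities:
- The delta-method variance computed here reduces to p1(1 − p1)/n1 + p2(1 − p2)/n2.
- The LM t statistic and the Pearson X² are linked by t² = (N − 2)r²/(1 − r²) and X² = N r², where r is the correlation between the binary predictor and the binary response.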
