Show simple item record

Item metadata

dc.contributor.author: Fink, Daniel
dc.contributor.author: Johnston, Alison
dc.contributor.author: Strimas-Mackey, Matt
dc.contributor.author: Auer, Tom
dc.contributor.author: Hochachka, Wesley M.
dc.contributor.author: Ligocki, Shawn
dc.contributor.author: Oldham Jaromczyk, Lauren
dc.contributor.author: Robinson, Orin
dc.contributor.author: Wood, Chris
dc.contributor.author: Kelling, Steve
dc.contributor.author: Rodewald, Amanda D.
dc.identifier.citation: Fink, D, Johnston, A, Strimas-Mackey, M, Auer, T, Hochachka, W M, Ligocki, S, Oldham Jaromczyk, L, Robinson, O, Wood, C, Kelling, S & Rodewald, A D 2023, 'A Double machine learning trend model for citizen science data', Methods in Ecology and Evolution, vol. 14, no. 9, pp. 2435-2448.
dc.identifier.other: PURE: 291061643
dc.identifier.other: PURE UUID: 5f6613f1-a3d7-4dee-ae21-22cdb0becc71
dc.identifier.other: RIS: urn:9A184EB5B54C18AB9A0649AABBC81E58
dc.identifier.other: ORCID: /0000-0001-8221-013X/work/139554355
dc.identifier.other: Scopus: 85165394908
dc.description: Funding: This work was funded by The Leon Levy Foundation, The Wolf Creek Foundation and the National Science Foundation (ABI sustaining: DBI-1939187). This work used Bridges2 at Pittsburgh Supercomputing Center and Anvil at Rosen Center for Advanced Computing at Purdue University through allocation DEB200010 from the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS) program, which is supported by National Science Foundation grants #2138259, #2138286, #2138307, #2137603 and #2138296. Our research was also funded through the 2017–2018 Belmont Forum and BiodivERsA joint call for research proposals, under the BiodivScen ERA-Net COFUND program, with financial support from the Academy of Finland (AKA, Univ. Turku: 326327, Univ. Helsinki: 326338), the Swedish Research Council (Formas, SLU: 2018-02440, Lund Univ.: 2018-02441), the Research Council of Norway (Forskningsrådet, NINA: 295767) and the U.S. National Science Foundation (NSF, Cornell Univ.: ICER-1927646). [en]
dc.description.abstract: 1. Citizen and community science datasets are typically collected using flexible protocols. These protocols enable large volumes of data to be collected globally every year; however, the consequence is that these protocols typically lack the structure necessary to maintain consistent sampling across years. This can result in complex and pronounced interannual changes in the observation process, which can complicate the estimation of population trends because population changes over time are confounded with changes in the observation process. 2. Here we describe a novel modelling approach designed to estimate spatially explicit species population trends while controlling for the interannual confounding common in citizen science data. The approach is based on Double machine learning, a statistical framework that uses machine learning (ML) methods to estimate population change and the propensity scores used to adjust for confounding discovered in the data. ML makes it possible to use large sets of features to control for confounding and to model spatial heterogeneity in trends. Additionally, we present a simulation method to identify and adjust for residual confounding missed by the propensity scores. 3. To illustrate the approach, we estimated species trends using data from the citizen science project eBird. We used a simulation study to assess the ability of the method to estimate spatially varying trends when faced with realistic confounding and temporal correlation. Results demonstrated the ability to distinguish between spatially constant and spatially varying trends. There were low error rates on the estimated direction of population change (increasing/decreasing) at each location and high correlations on the estimated magnitude of population change. 4. The ability to estimate spatially explicit trends while accounting for confounding inherent in citizen science data has the potential to fill important information gaps, helping to estimate population trends for species and/or regions lacking rigorous monitoring data.
dc.relation.ispartof: Methods in Ecology and Evolution [en]
dc.rights: Copyright © 2023 The Authors. Methods in Ecology and Evolution published by John Wiley & Sons Ltd on behalf of British Ecological Society. This is an open access article under the terms of the Creative Commons Attribution-NonCommercial License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes. [en]
dc.subject: Causal Forests [en]
dc.subject: Causal inference [en]
dc.subject: Citizen science [en]
dc.subject: Double machine learning [en]
dc.subject: Machine learning [en]
dc.subject: Propensity score [en]
dc.subject: Ecology, Evolution, Behavior and Systematics [en]
dc.subject: Ecological Modelling [en]
dc.title: A Double machine learning trend model for citizen science data [en]
dc.type: Journal article [en]
dc.description.version: Publisher PDF [en]
dc.contributor.institution: University of St Andrews. Statistics [en]
dc.contributor.institution: University of St Andrews. Centre for Research into Ecological & Environmental Modelling [en]
dc.description.status: Peer reviewed [en]
