Item metadata

dc.contributor.author: Conway, Alexander
dc.contributor.author: Durbach, Ian Noel
dc.contributor.author: McInnes, Alistair
dc.contributor.author: Harris, Rob
dc.date.accessioned: 2021-03-03T12:30:01Z
dc.date.available: 2021-03-03T12:30:01Z
dc.date.issued: 2021-03
dc.identifier: 272576839
dc.identifier: 6a803ac2-7c51-41e0-9b90-f1051adeb150
dc.identifier: 85103399895
dc.identifier: 000636318200004
dc.identifier.citation: Conway, A, Durbach, I N, McInnes, A & Harris, R 2021, 'Frame-by-frame annotation of video recordings using deep neural networks', Ecosphere, vol. 12, no. 3, e03384. https://doi.org/10.1002/ecs2.3384 [en]
dc.identifier.issn: 2150-8925
dc.identifier.other: ORCID: /0000-0003-0769-2153/work/90112771
dc.identifier.uri: https://hdl.handle.net/10023/21545
dc.description: Funding: Scottish Government (Grant Number(s): Marine Mammal Scientific Support Research Program); Homebrew Films; National Research Foundation of South Africa (Grant Number(s): 105782, 90782). [en]
dc.description.abstract: Video data are widely collected in ecological studies, but manual annotation is a challenging and time‐consuming task, and has become a bottleneck for scientific research. Classification models based on convolutional neural networks (CNNs) have proved successful in annotating images, but few applications have extended these to video classification. We demonstrate an approach that combines a standard CNN summarizing each video frame with a recurrent neural network (RNN) that models the temporal component of video. The approach is illustrated using two datasets: one collected by static video cameras detecting seal activity inside coastal salmon nets and another collected by animal‐borne cameras deployed on African penguins, used to classify behavior. The combined RNN‐CNN led to a relative improvement in test set classification accuracy over an image‐only model of 25% for penguins (80% to 85%), and substantially improved classification precision or recall for four of six behavior classes (12–17%). Image‐only and video models classified seal activity with very similar accuracy (88 and 89%), and no seal visits were missed entirely by either model. Temporal patterns related to movement provide valuable information about animal behavior, and classifiers benefit from including these explicitly. We recommend the inclusion of temporal information whenever manual inspection suggests that movement is predictive of class membership.
dc.format.extent: 11
dc.format.extent: 3523474
dc.language.iso: eng
dc.relation.ispartof: Ecosphere [en]
dc.subject: Animal-borne video [en]
dc.subject: Automated detection [en]
dc.subject: Deep learning [en]
dc.subject: Image classification [en]
dc.subject: Neural networks [en]
dc.subject: Video classification [en]
dc.subject: GC Oceanography [en]
dc.subject: QA75 Electronic computers. Computer science [en]
dc.subject: QH301 Biology [en]
dc.subject: DAS [en]
dc.subject.lcc: GC [en]
dc.subject.lcc: QA75 [en]
dc.subject.lcc: QH301 [en]
dc.title: Frame-by-frame annotation of video recordings using deep neural networks [en]
dc.type: Journal article [en]
dc.contributor.institution: University of St Andrews. Sea Mammal Research Unit [en]
dc.contributor.institution: University of St Andrews. Scottish Oceans Institute [en]
dc.contributor.institution: University of St Andrews. School of Biology [en]
dc.contributor.institution: University of St Andrews. School of Mathematics and Statistics [en]
dc.contributor.institution: University of St Andrews. Centre for Research into Ecological & Environmental Modelling [en]
dc.identifier.doi: 10.1002/ecs2.3384
dc.description.status: Peer reviewed [en]
dc.identifier.url: https://www.biorxiv.org/content/10.1101/2020.06.29.177261v1.full [en]