Frame-by-frame annotation of video recordings using deep neural networks
Item metadata
dc.contributor.author | Conway, Alexander | |
dc.contributor.author | Durbach, Ian Noel | |
dc.contributor.author | McInnes, Alistair | |
dc.contributor.author | Harris, Rob | |
dc.date.accessioned | 2021-03-03T12:30:01Z | |
dc.date.available | 2021-03-03T12:30:01Z | |
dc.date.issued | 2021-03 | |
dc.identifier | 272576839 | |
dc.identifier | 6a803ac2-7c51-41e0-9b90-f1051adeb150 | |
dc.identifier | 85103399895 | |
dc.identifier | 000636318200004 | |
dc.identifier.citation | Conway, A, Durbach, IN, McInnes, A & Harris, R 2021, 'Frame-by-frame annotation of video recordings using deep neural networks', Ecosphere, vol. 12, no. 3, e03384. https://doi.org/10.1002/ecs2.3384 | en |
dc.identifier.issn | 2150-8925 | |
dc.identifier.other | ORCID: /0000-0003-0769-2153/work/90112771 | |
dc.identifier.uri | https://hdl.handle.net/10023/21545 | |
dc.description | Funding: Scottish Government (Grant Number(s): Marine Mammal Scientific Support Research Program); Homebrew Films; National Research Foundation of South Africa (Grant Number(s): 105782, 90782). | en |
dc.description.abstract | Video data are widely collected in ecological studies, but manual annotation is a challenging and time-consuming task, and has become a bottleneck for scientific research. Classification models based on convolutional neural networks (CNNs) have proved successful in annotating images, but few applications have extended these to video classification. We demonstrate an approach that combines a standard CNN summarizing each video frame with a recurrent neural network (RNN) that models the temporal component of video. The approach is illustrated using two datasets: one collected by static video cameras detecting seal activity inside coastal salmon nets and another collected by animal-borne cameras deployed on African penguins, used to classify behavior. The combined RNN-CNN led to a 25% relative reduction in test set classification error over an image-only model for penguins (80% to 85% accuracy), and substantially improved classification precision or recall for four of six behavior classes (12–17%). Image-only and video models classified seal activity with very similar accuracy (88 and 89%), and no seal visits were missed entirely by either model. Temporal patterns related to movement provide valuable information about animal behavior, and classifiers benefit from including these explicitly. We recommend the inclusion of temporal information whenever manual inspection suggests that movement is predictive of class membership. | |
dc.format.extent | 11 | |
dc.format.extent | 3523474 | |
dc.language.iso | eng | |
dc.relation.ispartof | Ecosphere | en |
dc.subject | Animal-borne video | en |
dc.subject | Automated detection | en |
dc.subject | Deep learning | en |
dc.subject | Image classification | en |
dc.subject | Neural networks | en |
dc.subject | Video classification | en |
dc.subject | GC Oceanography | en |
dc.subject | QA75 Electronic computers. Computer science | en |
dc.subject | QH301 Biology | en |
dc.subject | DAS | en |
dc.subject.lcc | GC | en |
dc.subject.lcc | QA75 | en |
dc.subject.lcc | QH301 | en |
dc.title | Frame-by-frame annotation of video recordings using deep neural networks | en |
dc.type | Journal article | en |
dc.contributor.institution | University of St Andrews. Sea Mammal Research Unit | en |
dc.contributor.institution | University of St Andrews. Scottish Oceans Institute | en |
dc.contributor.institution | University of St Andrews. School of Biology | en |
dc.contributor.institution | University of St Andrews. School of Mathematics and Statistics | en |
dc.contributor.institution | University of St Andrews. Centre for Research into Ecological & Environmental Modelling | en |
dc.identifier.doi | 10.1002/ecs2.3384 | |
dc.description.status | Peer reviewed | en |
dc.identifier.url | https://www.biorxiv.org/content/10.1101/2020.06.29.177261v1.full | en |
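The abstract describes combining a CNN that summarizes each video frame with an RNN that models the temporal sequence of those frame summaries. Below is a minimal sketch of that data flow, not the authors' implementation: the CNN is stood in for by random per-frame feature vectors, a vanilla RNN cell (untrained, random weights) accumulates temporal context, and a softmax head classifies the clip into one of six hypothetical behavior classes, matching the count mentioned in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimensions (illustrative): frames per clip, CNN feature size,
# RNN hidden size, number of behavior classes.
T, D, H, C = 8, 64, 32, 6

# Stand-in for per-frame CNN output: one feature row per frame.
frame_features = rng.standard_normal((T, D))

# Vanilla RNN cell parameters (hypothetical, untrained).
Wx = rng.standard_normal((D, H)) * 0.1   # input-to-hidden weights
Wh = rng.standard_normal((H, H)) * 0.1   # hidden-to-hidden weights
b = np.zeros(H)
Wo = rng.standard_normal((H, C)) * 0.1   # hidden-to-class weights

# Step through the frames in order; the hidden state carries
# the temporal context the abstract says image-only models lack.
h = np.zeros(H)
for x in frame_features:
    h = np.tanh(x @ Wx + h @ Wh + b)

# Classify the clip from the final hidden state via softmax.
logits = h @ Wo
probs = np.exp(logits - logits.max())
probs /= probs.sum()
```

In a real pipeline the `frame_features` matrix would come from a trained CNN applied to each frame, and the RNN weights would be learned jointly on labeled clips; this sketch only illustrates why the recurrence can exploit movement between frames.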
Items in the St Andrews Research Repository are protected by copyright, with all rights reserved, unless otherwise indicated.