Item metadata

dc.contributor.author: Mansouri Benssassi, Esma
dc.contributor.author: Ye, Juan
dc.date.accessioned: 2021-02-11T10:30:11Z
dc.date.available: 2021-02-11T10:30:11Z
dc.date.issued: 2021-01-16
dc.identifier: 271472115
dc.identifier: e9a21239-e0f8-4e9f-ae18-67f9b6b1063a
dc.identifier: 85100168720
dc.identifier: 000608096600002
dc.identifier.citation: Mansouri Benssassi, E & Ye, J 2021, 'Generalisation and robustness investigation for facial and speech emotion recognition using bio-inspired spiking neural networks', Soft Computing, vol. First Online. https://doi.org/10.1007/s00500-020-05501-7
dc.identifier.issn: 1432-7643
dc.identifier.other: ORCID: /0000-0002-2838-6836/work/88731535
dc.identifier.uri: https://hdl.handle.net/10023/21409
dc.description.abstract: Emotion recognition through facial expression and non-verbal speech represents an important area in affective computing. It has been extensively studied, from classical feature extraction techniques to more recent deep learning approaches. However, most of these approaches face two major challenges: (1) robustness: in the face of degradation such as noise, can a model still make correct predictions? and (2) cross-dataset generalisation: when a model is trained on one dataset, can it be used to make inferences on another dataset? To directly address these challenges, we first propose the application of a Spiking Neural Network (SNN) in predicting emotional states based on facial expression and speech data, then investigate and compare its accuracy when facing data degradation or unseen new input. We evaluate our approach on third-party, publicly available datasets and compare it to state-of-the-art techniques. Our approach demonstrates robustness to noise: it achieves an accuracy of 56.2% for facial expression recognition (FER), compared to 22.64% for CNN and 14.10% for SVM, when input images are degraded with a noise intensity of 0.5, and the highest accuracy of 74.3% for speech emotion recognition (SER), compared to 21.95% for CNN and 14.75% for SVM, when audio white noise is applied. For generalisation, our approach achieves a consistently high accuracy of 89% for FER and 70% for SER in cross-dataset evaluation, suggesting that it can learn more effective feature representations, which lead to good generalisation of facial features and vocal characteristics across subjects.
dc.format.extent: 14
dc.format.extent: 2096514
dc.language.iso: eng
dc.relation.ispartof: Soft Computing
dc.subject: Spiking neural network
dc.subject: Facial emotion recognition
dc.subject: Speech emotion recognition
dc.subject: Unsupervised learning
dc.subject: QA75 Electronic computers. Computer science
dc.subject: T Technology
dc.subject: NDAS
dc.subject.lcc: QA75
dc.subject.lcc: T
dc.title: Generalisation and robustness investigation for facial and speech emotion recognition using bio-inspired spiking neural networks
dc.type: Journal article
dc.contributor.institution: University of St Andrews. School of Computer Science
dc.identifier.doi: 10.1007/s00500-020-05501-7
dc.description.status: Peer reviewed
dc.date.embargoedUntil: 2021-01-16

