Item metadata

dc.contributor.author: Mansouri-Benssassi, Esma
dc.contributor.author: Ye, Juan
dc.date.accessioned: 2019-11-25T10:30:01Z
dc.date.available: 2019-11-25T10:30:01Z
dc.date.issued: 2019-09-30
dc.identifier.citation: Mansouri-Benssassi, E & Ye, J 2019, Speech emotion recognition with early visual cross-modal enhancement using spiking neural networks. in 2019 International Joint Conference on Neural Networks, IJCNN 2019., 8852473, Proceedings of the International Joint Conference on Neural Networks, vol. 2019-July, Institute of Electrical and Electronics Engineers Inc., pp. 1-8, 2019 International Joint Conference on Neural Networks, IJCNN 2019, Budapest, Hungary, 14/07/19. https://doi.org/10.1109/IJCNN.2019.8852473 [en]
dc.identifier.citation: conference [en]
dc.identifier.isbn: 9781728119854
dc.identifier.issn: 2161-4393
dc.identifier.other: PURE: 262322181
dc.identifier.other: PURE UUID: b2ef2d01-4026-403d-9efe-376ec43cd491
dc.identifier.other: Scopus: 85073258399
dc.identifier.other: ORCID: /0000-0002-2838-6836/work/68280979
dc.identifier.other: WOS: 000530893806020
dc.identifier.uri: http://hdl.handle.net/10023/18994
dc.description.abstract: Speech emotion recognition (SER) is an important part of the affective computing and signal processing research areas. A number of approaches, especially deep learning techniques, have achieved promising results on SER. However, there are still challenges in translating temporal and dynamic changes in emotions through speech. Spiking Neural Networks (SNNs) have been demonstrated as a promising approach in machine learning and pattern recognition tasks such as handwriting and facial expression recognition. In this paper, we investigate the use of SNNs for SER tasks and, more importantly, we propose a new cross-modal enhancement approach. This method is inspired by auditory information processing in the brain, where auditory information is preceded, enhanced and predicted by visual processing in multisensory audio-visual processing. We have conducted experiments on two datasets to compare our approach with state-of-the-art SER techniques in both uni-modal and multi-modal aspects. The results have demonstrated that SNNs can be an ideal candidate for modeling temporal relationships in speech features and that our cross-modal approach can significantly improve the accuracy of SER.
dc.format.extent: 8
dc.language.iso: eng
dc.publisher: Institute of Electrical and Electronics Engineers Inc.
dc.relation.ispartof: 2019 International Joint Conference on Neural Networks, IJCNN 2019 [en]
dc.relation.ispartofseries: Proceedings of the International Joint Conference on Neural Networks [en]
dc.rights: Copyright © 2019 IEEE. This work has been made available online in accordance with publisher policies or with permission. Permission for further reuse of this content should be sought from the publisher or the rights holder. This is the author created accepted manuscript following peer review and may differ slightly from the final published version. The final published version of this work is available at https://doi.org/10.1109/IJCNN.2019.8852473 [en]
dc.subject: Multisensory integration [en]
dc.subject: Speech Emotion Recognition [en]
dc.subject: Spiking Neural Networks [en]
dc.subject: Unsupervised learning [en]
dc.subject: QA75 Electronic computers. Computer science [en]
dc.subject: T Technology [en]
dc.subject: Artificial Intelligence [en]
dc.subject: Software [en]
dc.subject: 3rd-DAS [en]
dc.subject.lcc: QA75 [en]
dc.subject.lcc: T [en]
dc.title: Speech emotion recognition with early visual cross-modal enhancement using spiking neural networks [en]
dc.type: Conference item [en]
dc.description.version: Postprint [en]
dc.contributor.institution: University of St Andrews. School of Computer Science [en]
dc.identifier.doi: https://doi.org/10.1109/IJCNN.2019.8852473

