Speech emotion recognition with early visual cross-modal enhancement using spiking neural networks
Item metadata
dc.contributor.author | Mansouri-Benssassi, Esma | |
dc.contributor.author | Ye, Juan | |
dc.date.accessioned | 2019-11-25T10:30:01Z | |
dc.date.available | 2019-11-25T10:30:01Z | |
dc.date.issued | 2019-09-30 | |
dc.identifier | 262322181 | |
dc.identifier | b2ef2d01-4026-403d-9efe-376ec43cd491 | |
dc.identifier | 85073258399 | |
dc.identifier | 000530893806020 | |
dc.identifier.citation | Mansouri-Benssassi, E & Ye, J 2019, Speech emotion recognition with early visual cross-modal enhancement using spiking neural networks. in 2019 International Joint Conference on Neural Networks, IJCNN 2019, 8852473, Proceedings of the International Joint Conference on Neural Networks, vol. 2019-July, Institute of Electrical and Electronics Engineers Inc., pp. 1-8, 2019 International Joint Conference on Neural Networks, IJCNN 2019, Budapest, Hungary, 14/07/19. https://doi.org/10.1109/IJCNN.2019.8852473 | en |
dc.identifier.citation | conference | en |
dc.identifier.isbn | 9781728119854 | |
dc.identifier.issn | 2161-4393 | |
dc.identifier.other | ORCID: /0000-0002-2838-6836/work/68280979 | |
dc.identifier.uri | https://hdl.handle.net/10023/18994 | |
dc.description.abstract | Speech emotion recognition (SER) is an important part of the affective computing and signal processing research areas. A number of approaches, especially deep learning techniques, have achieved promising results on SER. However, there are still challenges in translating temporal and dynamic changes in emotions through speech. Spiking Neural Networks (SNNs) have been demonstrated as a promising approach in machine learning and pattern recognition tasks such as handwriting and facial expression recognition. In this paper, we investigate the use of SNNs for SER tasks and, more importantly, we propose a new cross-modal enhancement approach. This method is inspired by auditory information processing in the brain, where auditory information is preceded, enhanced and predicted by visual processing in multisensory audio-visual processing. We have conducted experiments on two datasets to compare our approach with state-of-the-art SER techniques in both uni-modal and multi-modal aspects. The results demonstrate that SNNs can be an ideal candidate for modeling temporal relationships in speech features and that our cross-modal approach can significantly improve the accuracy of SER. | |
dc.format.extent | 8 | |
dc.format.extent | 502359 | |
dc.language.iso | eng | |
dc.publisher | Institute of Electrical and Electronics Engineers Inc. | |
dc.relation.ispartof | 2019 International Joint Conference on Neural Networks, IJCNN 2019 | en |
dc.relation.ispartofseries | Proceedings of the International Joint Conference on Neural Networks | en |
dc.subject | Multisensory integration | en |
dc.subject | Speech Emotion Recognition | en |
dc.subject | Spiking Neural Networks | en |
dc.subject | Unsupervised learning | en |
dc.subject | QA75 Electronic computers. Computer science | en |
dc.subject | T Technology | en |
dc.subject | Artificial Intelligence | en |
dc.subject | Software | en |
dc.subject | 3rd-DAS | en |
dc.subject.lcc | QA75 | en |
dc.subject.lcc | T | en |
dc.title | Speech emotion recognition with early visual cross-modal enhancement using spiking neural networks | en |
dc.type | Conference item | en |
dc.contributor.institution | University of St Andrews. School of Computer Science | en |
dc.identifier.doi | 10.1109/IJCNN.2019.8852473 |
Items in the St Andrews Research Repository are protected by copyright, with all rights reserved, unless otherwise indicated.