Biologically-Inspired Speech Emotion Recognition Using Rate Map Representations: An Application to the ShEMO Persian Speech Database
DOI: https://doi.org/10.5281/zenodo.10396163

Keywords: Speech Emotion Recognition, Rate Map, Machine Learning

Abstract
This paper presents an innovative Speech Emotion Recognition (SER) model, inspired by the human auditory system, for analyzing and interpreting emotions in speech. The proposed model uses a rate map representation to encode the spectro-temporal characteristics of auditory nerve activity, closely mimicking the processes of human auditory perception. The model comprises several stages: pre-emphasis of the audio signal, cochlear filtering with a Gammatone filterbank (GTF), neuromechanical transduction modeled by the Dau inner hair cell model, and assembly of a rate map by temporally integrating the response of each frequency channel. We apply this model to the ShEMO database, an extensive collection of Persian emotional speech, to detect and classify a spectrum of emotions. Experimental results obtained with deep learning architectures demonstrate the effectiveness of the proposed model: the best classification metrics were achieved with the MobileNet architecture, with a performance of 71.57% and an F1 score of 51.52%. Overall, this study contributes to the field of speech emotion detection by offering a biologically inspired model, validated on a substantial dataset, that yields promising emotion classification results with advanced machine learning techniques.
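The four processing stages named in the abstract can be sketched as a minimal pipeline. The sketch below is an illustrative approximation, not the authors' implementation: it uses a truncated FIR gammatone impulse response, and it substitutes half-wave rectification plus a leaky integrator for the full Dau inner-hair-cell model. All function names and parameter values (0.97 pre-emphasis coefficient, 32 channels, 10 ms frames, 8 ms integration time constant) are assumptions.

```python
import numpy as np
from scipy.signal import lfilter


def erb(f):
    """Equivalent rectangular bandwidth (Glasberg & Moore) in Hz."""
    return 24.7 * (4.37 * f / 1000.0 + 1.0)


def gammatone_ir(fc, fs, dur=0.05):
    """Truncated 4th-order gammatone impulse response centred at fc."""
    t = np.arange(int(dur * fs)) / fs
    b = 1.019 * erb(fc)
    g = t ** 3 * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.max(np.abs(g))


def rate_map(x, fs, n_channels=32, fmin=80.0, frame_ms=10.0):
    """Compute a (channels x frames) rate map from a mono signal."""
    # Stage 1: pre-emphasis (0.97 is a conventional, assumed coefficient).
    x = np.append(x[0], x[1:] - 0.97 * x[:-1])

    # Stage 2: gammatone channels with centre frequencies on the ERB scale.
    to_erb = lambda f: 21.4 * np.log10(1.0 + 4.37 * f / 1000.0)
    from_erb = lambda e: (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37
    fmax = 0.9 * fs / 2
    cfs = from_erb(np.linspace(to_erb(fmin), to_erb(fmax), n_channels))

    frame = int(fs * frame_ms / 1000.0)
    n_frames = len(x) // frame
    rm = np.zeros((n_channels, n_frames))
    a = np.exp(-1.0 / (fs * 0.008))  # ~8 ms leaky-integrator time constant
    for i, fc in enumerate(cfs):
        y = np.convolve(x, gammatone_ir(fc, fs), mode="same")
        # Stage 3: crude hair-cell stand-in -- half-wave rectification
        # followed by a leaky integrator (NOT the full Dau model).
        env = lfilter([1.0 - a], [1.0, -a], np.maximum(y, 0.0))
        # Stage 4: integrate each channel over short frames.
        rm[i] = env[: n_frames * frame].reshape(n_frames, frame).mean(axis=1)
    return rm, cfs
```

The resulting channels-by-frames matrix can be treated as an image and fed to a CNN such as MobileNet, which is how the abstract's classification stage would consume it.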
License
Copyright (c) 2023 AINTELIA Science Notes Journal

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
COPYRIGHT NOTICE
By submitting a manuscript, authors agree that, if the manuscript is accepted for publication, its copyright is transferred to Aintelia® Science Notes Journal (ASNJ).
By submitting their work, authors accept the following terms:
- Copyright Transfer: Copyright in the published article is transferred to Aintelia® Science Notes Journal. The journal reserves the right to publish, reproduce, distribute, and archive the work.
- Licensing: While the journal retains the copyright, the article is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0), which permits third parties to share and adapt the work for non-commercial purposes, provided the original work and the journal are properly cited.
- Author Rights: Authors retain the right to use their article for their own academic needs, such as including it in a thesis or dissertation, presenting it at conferences, or distributing it to students for educational purposes, provided the journal is acknowledged as the original publisher.