Biologically-Inspired Speech Emotion Recognition Using Rate Map Representations: An Application to the ShEMO Persian Speech Database

Authors

DOI:

https://doi.org/10.5281/zenodo.10396163

Keywords:

Speech Emotion Recognition, Rate Map, Machine Learning

Abstract

This paper presents an innovative Speech Emotion Recognition (SER) model, inspired by the human auditory system, for analyzing and interpreting emotions in speech. The proposed model uses a rate map representation to encode the spectro-temporal characteristics of auditory nerve activity, closely mimicking the processes of human auditory perception. The model comprises several stages: pre-emphasis of the audio signal, cochlear filtering using a gammatone filterbank (GTF), neuromechanical transduction modeled by the Dau inner hair cell model, and assembly of a rate map representation through temporal integration of the response in each frequency channel. We apply this model to the ShEMO database, an extensive collection of Persian emotional speech, to detect and classify a spectrum of emotions. Our experimental results, obtained using deep learning architectures, demonstrate the effectiveness of the proposed model. The best results are obtained with the MobileNet architecture, which achieves a classification performance of 71.57% and an F1 score of 51.52%. Overall, this study contributes to the field of speech emotion detection by offering a biologically inspired model, validated on a substantial dataset, that yields promising results in emotion classification using advanced machine learning techniques.
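The pipeline stages described in the abstract can be sketched in a few dozen lines. This is a minimal illustrative implementation, not the authors' code: the half-wave rectification below is a crude stand-in for the full Dau inner-hair-cell model, and all parameters (channel count, frequency range, frame length, ERB constants) are assumptions chosen for clarity.

```python
import numpy as np

def pre_emphasis(x, alpha=0.97):
    # First-order high-pass: y[n] = x[n] - alpha * x[n-1]
    return np.append(x[0], x[1:] - alpha * x[:-1])

def erb(f):
    # Equivalent rectangular bandwidth (Glasberg & Moore approximation)
    return 24.7 * (4.37 * f / 1000.0 + 1.0)

def gammatone_ir(fc, fs, duration=0.05, order=4):
    # Finite-length gammatone impulse response at centre frequency fc
    t = np.arange(0.0, duration, 1.0 / fs)
    b = 1.019 * erb(fc)
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.max(np.abs(g))

def rate_map(x, fs, n_channels=32, fmin=100.0, fmax=4000.0, frame_ms=10.0):
    """Rate-map sketch: pre-emphasis, gammatone filtering, half-wave
    rectification (inner-hair-cell surrogate), and per-frame temporal
    integration of each channel's response."""
    x = pre_emphasis(x)
    fcs = np.geomspace(fmin, fmax, n_channels)   # log-spaced centre frequencies
    frame = int(fs * frame_ms / 1000.0)
    n_frames = len(x) // frame
    rmap = np.zeros((n_channels, n_frames))
    for i, fc in enumerate(fcs):
        y = np.convolve(x, gammatone_ir(fc, fs), mode="same")
        env = np.maximum(y, 0.0)                 # half-wave rectification
        for j in range(n_frames):
            rmap[i, j] = env[j * frame:(j + 1) * frame].mean()
    return rmap
```

Fed a pure tone, the sketch concentrates energy in the channels whose centre frequencies sit nearest the tone; the resulting channels-by-frames matrix is the 2-D "image" that a CNN such as MobileNet can then classify.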

Author Biography

  • İlyas Özer, Bandirma Onyedi Eylul University, Balikesir, Turkey

    AINTELIA Artificial Intelligence Technologies Company, Bursa, Turkey, iozer@aintelia.com

References

Ozer, Ilyas. "Pseudo-colored rate map representation for speech emotion recognition." Biomedical Signal Processing and Control 66 (2021): 102502.

Bhavan, Anjali, Pankaj Chauhan, and Rajiv Ratn Shah. "Bagged support vector machines for emotion recognition from speech." Knowledge-Based Systems 184 (2019): 104886.

Özseven, Turgut. "A novel feature selection method for speech emotion recognition." Applied Acoustics 146 (2019): 320-326.

Sun, Linhui, et al. "Speech emotion recognition based on DNN-decision tree SVM model." Speech Communication 115 (2019): 29-37.

Mustafa, Mumtaz Begum, et al. "Speech emotion recognition research: an analysis of research focus." International Journal of Speech Technology 21 (2018): 137-156.

Rázuri, Javier G., et al. "Speech emotion recognition in emotional feedback for human-robot interaction." International Journal of Advanced Research in Artificial Intelligence (IJARAI) 4.2 (2015): 20-27.

Sajjad, Muhammad, and Soonil Kwon. "Clustering-based speech emotion recognition by incorporating learned features and deep BiLSTM." IEEE Access 8 (2020): 79861-79875.

Lu, Guanming, et al. "Speech emotion recognition based on long short-term memory and convolutional neural networks." Journal of Nanjing University of Posts and Telecommunications 38.5 (2018): 63-69.

Ozer, Ilyas, Zeynep Ozer, and Oguz Findik. "Noise robust sound event classification with convolutional neural network." Neurocomputing 272 (2018): 505-512.

Cheikh Mohamed Fadel, Mariem Mine, and Zeynep Özer. "Trafikle İlgili Seslerin İşitsel Modeller ve Konvolüsyonel Sinir Ağları Kullanılarak Sınıflandırılması" [Classification of Traffic-Related Sounds Using Auditory Models and Convolutional Neural Networks]. Mühendislik Bilimleri ve Araştırmaları Dergisi 5.2 (2023): 233-242.

Ozer, Ilyas, Zeynep Ozer, and Oguz Findik. "Lanczos kernel based spectrogram image features for sound classification." Procedia computer science 111 (2017): 137-144.

Rao, K. Sreenivasa, Shashidhar G. Koolagudi, and Ramu Reddy Vempada. "Emotion recognition from speech using global and local prosodic features." International Journal of Speech Technology 16 (2013): 143-160.

Valstar, Michel, et al. "AVEC 2016: Depression, mood, and emotion recognition workshop and challenge." Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge. 2016.

Jiang, Pengxu, et al. "Parallelized convolutional recurrent neural network with spectral features for speech emotion recognition." IEEE Access 7 (2019): 90368-90377.

Zhao, Jianfeng, Xia Mao, and Lijiang Chen. "Speech emotion recognition using deep 1D & 2D CNN LSTM networks." Biomedical Signal Processing and Control 47 (2019): 312-323.

Martin, Olivier, et al. "The eNTERFACE'05 audio-visual emotion database." 22nd International Conference on Data Engineering Workshops (ICDEW'06). IEEE, 2006.

Mohamad Nezami, Omid, Paria Jamshid Lou, and Mansoureh Karami. "ShEMO: a large-scale validated database for Persian speech emotion detection." Language Resources and Evaluation 53 (2019): 1-16.

Published

01-06-2023

How to Cite

Biologically-Inspired Speech Emotion Recognition Using Rate Map Representations: An Application to the ShEMO Persian Speech Database. (2023). AINTELIA Science Notes Journal, 2(1), 24-31. https://doi.org/10.5281/zenodo.10396163