Abstract: Emotion recognition from speech has gained significant attention in the field of human-computer interaction, where it plays a crucial role in creating empathetic and responsive systems. Traditional speech recognition systems focus on transcribing words, whereas Speech Emotion Detection (SED) aims to identify the underlying emotional states conveyed by speech signals. In this research, we propose a machine learning-based SED system that employs both classical and deep learning approaches for emotion classification. The system processes audio samples from the RAVDESS dataset, extracting features such as Mel-Frequency Cepstral Coefficients (MFCCs), Chroma, and Spectral Contrast using the Librosa library. The classification task is performed using Support Vector Machine (SVM) and Convolutional Neural Network (CNN) models. Experimental results indicate that the CNN model outperforms the SVM model, achieving a classification accuracy of 91.45% compared to the SVM's 85.60%. The CNN's superior performance is attributed to its ability to learn high-level features from spectrogram representations. The system demonstrates its applicability in domains such as virtual assistants, educational tools, and adaptive entertainment platforms. This study underscores the potential of deep learning techniques for improving emotion detection accuracy and suggests future directions, including multilingual datasets and real-time applications on edge devices.
Keywords: Speech Emotion Detection, Machine Learning, Convolutional Neural Network (CNN), Support Vector Machine (SVM), Feature Extraction, RAVDESS Dataset, MFCC, Emotional Classification.
DOI: 10.17148/IJARCCE.2025.14502