Abstract: Emotions play a vital role in human communication, influencing decision-making, social interaction, and overall well-being. The ability to automatically recognize emotions across multiple modalities has become increasingly important in fields such as mental health monitoring, intelligent customer support, and affective computing. This work proposes a Multimodal Emotion Recognition System that integrates speech, text, and facial expression analysis to achieve a more reliable and comprehensive understanding of human emotions. For speech, features such as Mel-Frequency Cepstral Coefficients (MFCCs) and pitch are extracted and classified using Random Forest classifiers alongside Deep Neural Networks (DNNs) for improved performance. Text-based emotion recognition leverages the contextual learning capabilities of Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) models to capture linguistic nuances. Facial expression recognition is performed with Convolutional Neural Networks (CNNs), enhanced with wavelet transforms for richer feature representation. Fusing these modalities addresses the limitations of single-source emotion detection and yields more accurate, holistic recognition. The proposed system is deployed as a user-friendly web application built with Flask, HTML, and CSS, making it accessible for practical use. This research contributes to the advancement of multimodal affective computing and highlights the potential of integrated machine learning (ML) and deep learning (DL) approaches for real-world emotion-aware applications.
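To make the speech branch concrete, the following is a minimal sketch of MFCC-plus-pitch feature extraction feeding a Random Forest. It assumes librosa and scikit-learn (libraries the abstract does not name), and the 16 kHz sample rate, pitch range, and per-clip summary statistics are illustrative choices rather than the authors' exact pipeline.

import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier

def speech_features(path):
    # Load audio and extract frame-level MFCCs and pitch (YIN estimator)
    y, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    f0 = librosa.yin(y, fmin=65, fmax=400, sr=sr)
    # Summarize frames into one fixed-length vector per utterance
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1),
                           [f0.mean(), f0.std()]])

# Hypothetical usage: X stacks one vector per clip, y holds emotion labels
# clf = RandomForestClassifier(n_estimators=200).fit(X, y)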
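The text branch can be sketched as an embedding layer followed by an LSTM, as in the Keras model below; the vocabulary size, embedding width, and seven emotion classes are assumptions for illustration, not figures from the paper.

from tensorflow.keras import layers, models

text_model = models.Sequential([
    layers.Embedding(input_dim=20000, output_dim=128),  # token embeddings
    layers.LSTM(64),                                    # contextual sequence encoding
    layers.Dense(7, activation="softmax"),              # emotion probabilities
])
text_model.compile(optimizer="adam",
                   loss="sparse_categorical_crossentropy",
                   metrics=["accuracy"])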
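For the facial branch, one plausible reading of "CNNs enhanced with wavelet transforms" is a 2D discrete wavelet transform applied as a preprocessing step, with the sub-bands stacked as input channels to a small CNN. The sketch below uses PyWavelets with a Haar wavelet and a 48x48 grayscale input (FER-style), all of which are assumptions.

import numpy as np
import pywt
from tensorflow.keras import layers, models

def wavelet_channels(img48):
    # One-level 2D DWT: approximation plus three detail sub-bands
    cA, (cH, cV, cD) = pywt.dwt2(img48, "haar")
    return np.stack([cA, cH, cV, cD], axis=-1)  # shape (24, 24, 4)

face_model = models.Sequential([
    layers.Conv2D(32, 3, activation="relu", input_shape=(24, 24, 4)),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.Flatten(),
    layers.Dense(7, activation="softmax"),  # same assumed emotion classes
])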
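The abstract does not specify the fusion rule; a common baseline is decision-level (late) fusion that averages the per-modality class probabilities, sketched here with equal weights as a placeholder.

import numpy as np

def fuse(p_speech, p_text, p_face, w=(1/3, 1/3, 1/3)):
    # Each argument is a probability vector over the same emotion classes
    p = (w[0] * np.asarray(p_speech)
         + w[1] * np.asarray(p_text)
         + w[2] * np.asarray(p_face))
    return p / p.sum()  # renormalize; argmax gives the fused emotion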
DOI: 10.17148/IJARCCE.2025.14928
[1] Srushti S Rao, Dr. K Balaji, "Enhancing Multimodal Emotion Detection Using Deep Learning and Machine Learning," International Journal of Advanced Research in Computer and Communication Engineering (IJARCCE), 2025, DOI: 10.17148/IJARCCE.2025.14928.