Abstract: Sentiments are representations of human feelings and emotions. With the emergence of social media, people increasingly use images, videos and audio to express their opinions on social media platforms. Audio content represents a growing source of consumer information and has attracted increasing interest from researchers, companies and consumers. Compared to traditional text content, audio-visual content provides a more authentic experience, as it allows the viewer to better understand the reviewer's emotions, beliefs and intentions through richer channels such as intonation. This article therefore attempts to mine opinions and identify sentiments from these diverse modalities. A reference database consisting of a set of videos is used. The proposed system measures the speaker's current emotional state; it requires at least 10 seconds of authentic speech to render the initial emotional analysis, and all subsequent analyses are produced every 5 seconds. The aim of multimodal data fusion is to increase the accuracy and reliability of the estimates and, in turn, the usefulness of such systems in real-world applications.
Keywords: Sentiment analysis, Audio features, Text features, Multimodal classification.