Abstract: Content-based retrieval has emerged as a promising approach to
information access in the face of content explosion. In this paper, we focus on
the challenging issue of
recognizing the emotion content of music signals, or music emotion recognition
(MER). Specifically, we formulate MER as a regression problem to predict the
arousal and valence values (AV values) of each music sample directly.
With its AV values, each music sample becomes a point in the
arousal-valence plane, so users can efficiently retrieve music samples
by specifying a desired point in the emotion plane. Because no categorical
taxonomy is used, the regression approach is free of the ambiguity inherent to
conventional categorical approaches. To improve the performance, we apply
principal component analysis to reduce the correlation between arousal and
valence, and RReliefF to select important features.
An extensive performance study is conducted to evaluate the accuracy of the
regression approach for predicting AV values. The best performance, evaluated in
terms of the R² statistic, reaches 58.3% for arousal and 28.1% for
valence when a support vector machine is employed as the regressor.
We also apply the regression approach to detect the emotion variation within a
music selection and find its prediction accuracy superior to that of existing work. A
group-wise MER scheme is also developed to address the subjectivity issue of
emotion perception.
Keywords: Music emotion recognition (MER), arousal, valence, regression, support vector machine.
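
As an illustration of two ideas in the abstract, the sketch below shows how retrieval by a point in the arousal-valence plane and the R² statistic might be computed. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names `retrieve_nearest` and `r_squared`, the Euclidean distance, the [-1, 1] value range, and the sample AV values are all hypothetical, and the regressor that would produce the AV values is not shown.

```python
import math

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def retrieve_nearest(av_points, query, k=1):
    """Return the names of the k samples closest to `query` in the AV plane.

    Euclidean distance is an assumption made for this sketch.
    """
    ranked = sorted(av_points, key=lambda name: math.dist(av_points[name], query))
    return ranked[:k]

# Hypothetical (arousal, valence) values, assumed to lie in [-1, 1];
# in practice these would come from a trained regressor.
library = {
    "song_a": (0.8, 0.6),    # energetic, positive
    "song_b": (-0.7, -0.5),  # calm, negative
    "song_c": (0.1, -0.2),   # near neutral
}

# The user specifies a desired point in the emotion plane.
print(retrieve_nearest(library, (0.7, 0.7)))

# R² compares predicted values against annotated ones, per dimension.
print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
```

Because each sample is just a point in a two-dimensional plane, retrieval reduces to a nearest-neighbor query, and arousal and valence can each be scored with R² separately.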