Abstract: Harmony features have previously been studied for speech emotion recognition. The first- and second-order differences of harmony features also play an important role in speech emotion recognition. This paper therefore proposes a new Fourier parameter (FP) model that uses the perceptual content of voice quality together with the first- and second-order differences for speaker-independent speech emotion recognition. Experimental results show that the proposed FP features are effective in identifying various emotional states in speech signals. They improve the recognition rates over methods using Mel frequency cepstral coefficient (MFCC) features by 16.2, 6.8 and 16.6 points on the German database (EMODB), the Chinese language database (CASIA) and the Chinese elderly emotion database (EESDB), respectively. In particular, when FP features are combined with MFCC features, the recognition rates on the aforementioned databases can be further improved by 17.5, 10 and 10.5 points, respectively. A neural network classifier can be used to improve the classification of different emotions.

Keywords: MFCC, CASIA, EMODB, FP model, Speech emotion recognition.
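
As an illustration of the first- and second-order differences mentioned in the abstract, the following minimal sketch appends frame-to-frame differences to a matrix of frame-level features. It is not the authors' implementation: the function name `add_difference_features`, the (frames x coefficients) layout, and the zero-padding of the first frame are assumptions made for this example.

```python
import numpy as np


def add_difference_features(features):
    """Append first- and second-order differences along the frame axis.

    features: array-like of shape (n_frames, n_coeffs), one row per frame.
    Returns an array of shape (n_frames, 3 * n_coeffs).
    """
    features = np.asarray(features, dtype=float)
    # First-order difference between consecutive frames. Prepending the
    # first row keeps the frame count and makes the first difference zero
    # (an assumption of this sketch).
    delta = np.diff(features, n=1, axis=0, prepend=features[:1])
    # Second-order difference: the difference of the first-order difference.
    delta2 = np.diff(delta, n=1, axis=0, prepend=delta[:1])
    # Stack static, first-order and second-order features side by side.
    return np.hstack([features, delta, delta2])


if __name__ == "__main__":
    # Three hypothetical frames with two coefficients each.
    frames = [[1.0, 2.0], [3.0, 5.0], [6.0, 9.0]]
    print(add_difference_features(frames))
```

Each output row holds the static coefficients followed by their first- and second-order differences, so the feature dimension triples while the number of frames is unchanged.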