Abstract
This study investigates the acoustic characteristics of four different emotions expressed in speech. The aim is to obtain detailed acoustic knowledge on how a speech signal is modulated by changes from neutral to a certain emotional state. Such knowledge is necessary for automatic emotion recognition and classification and emotional speech synthesis. Speech data obtained from two semi-professional actresses are analyzed and compared. Each subject produces 211 sentences with four different emotions; neutral, sad, angry, happy. We analyze changes in temporal and acoustic parameters such as magnitude and variability of segmental duration, fundamental frequency and the first three formant frequencies as a function of emotion. Acoustic differences among the emotions are also explored with mutual information computation, multidimensional scaling and acoustic likelihood comparison with normal speech. Results indicate that speech associated with anger and happiness is characterized by longer duration, shorter interword silence, higher pitch and rms energy with wider ranges. Sadness is distinguished from other emotions by lower rms energy and longer interword silence. Interestingly, the difference in formant pattern between [happiness/anger] and [neutral/sadness] are better reflected in back vowels such as /a/(/father/) than in front vowels. Detailed results on intra- and interspeaker variability will be reported.