Extracting Acoustic Features of Japanese Speech to Classify Emotions

Takashi Yamazaki; Minoru Nakayama

DOI: http://dx.doi.org/10.15439/2017F533

Citation: Communication Papers of the 2017 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 13, pages 141–145 (2017)

Abstract. An emotion detection technique that extracts acoustic features from audio recordings of speech was developed. Although the formant frequencies of individual voices may contribute to emotional variation in speech, differences between vowels also influence feature extraction. To reduce this influence, a simple procedure was developed to extract relative vowel features for every mora. Using relative formant frequencies instead of absolute formant frequencies improved the estimation performance of the technique by 11%. The strength of some emotional expressions was also reflected in some of the features. The effectiveness of using acoustic features to estimate the emotional category of speech was confirmed.
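The abstract does not spell out how the relative features are computed. As a minimal sketch, assuming that a mora's "relative" formant frequency means its F1 and F2 divided by the mean F1 and F2 of the same vowel (computed over the speaker's own moras, or over an optional reference set such as neutral speech), the normalization could look like the following. The `Mora` type, the `relative_formants` function, and the ratio-to-vowel-mean definition are hypothetical illustrations, not the authors' procedure; formant values are assumed to have been measured beforehand with a separate analysis tool.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Mora:
    # Hypothetical container for one mora's measured vowel formants.
    vowel: str  # vowel of the mora, e.g. "a", "i", "u", "e", "o"
    f1: float   # first formant frequency (Hz)
    f2: float   # second formant frequency (Hz)


def relative_formants(moras, reference=None):
    """Return (F1, F2) of each mora divided by the mean F1/F2 of the same vowel.

    Assumption (not from the paper): "relative" means a ratio to the per-vowel
    mean. `reference` is an optional list of moras (e.g. neutral speech) used
    to compute the means; it defaults to `moras` itself. A vowel that never
    occurs in the reference set raises KeyError.
    """
    ref = moras if reference is None else reference

    # Group reference formants by vowel.
    by_vowel = defaultdict(list)
    for m in ref:
        by_vowel[m.vowel].append((m.f1, m.f2))

    # Mean F1 and F2 for each vowel.
    means = {
        v: (mean(f1 for f1, _ in fs), mean(f2 for _, f2 in fs))
        for v, fs in by_vowel.items()
    }

    # Divide each mora's formants by its own vowel's mean.
    result = []
    for m in moras:
        f1_mean, f2_mean = means[m.vowel]
        result.append((m.f1 / f1_mean, m.f2 / f2_mean))
    return result


if __name__ == "__main__":
    sample = [Mora("a", 800.0, 1200.0), Mora("a", 900.0, 1300.0), Mora("i", 300.0, 2300.0)]
    for m, (r1, r2) in zip(sample, relative_formants(sample)):
        print(m.vowel, round(r1, 3), round(r2, 3))
```

Dividing by the per-vowel mean removes most of the systematic offset between, for example, /a/ and /i/, so the remaining variation is less tied to vowel identity. Under the same assumption, a difference from the mean or a per-vowel z-score would be an equally plausible definition.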
