Feature Extraction of Binaural Recordings for Acoustic Scene Classification

Sławomir Zieliński; Hyunkook Lee

DOI: http://dx.doi.org/10.15439/2018F182

Citation: Proceedings of the 2018 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 15, pages 585–588 (2018)

Abstract. Binaural technology is becoming increasingly popular in multimedia systems. This paper identifies a set of features of binaural recordings suitable for the automatic classification of four basic spatial audio scenes, which represent the most typical patterns of audio content distribution around a listener. Moreover, it compares five artificial-intelligence-based methods applied to the classification of binaural recordings. The results show that both spatial and spectro-temporal features are essential to the accurate classification of binaurally rendered acoustic scenes. The spectro-temporal features appear to have a stronger influence on the classification results than the spatial metrics. According to the obtained results, the method based on the support vector machine, exploiting the features identified in the study, yields a classification accuracy approaching 83.89%.
