Sparse Coding Methods for Music Induced Emotion Recognition

Jan Jakubik, Halina Kwaśnicka

DOI: http://dx.doi.org/10.15439/2016F309

Citation: Proceedings of the 2016 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 8, pages 53–60 (2016)

Abstract. The paper concerns automatic recognition of emotion induced by music (Music Emotion Recognition, MER). Its main contribution is a comparison of different sparse coding schemes on the MER task. We consider a domain-specific categorization of emotions, the Geneva Emotional Music Scale (GEMS), which focuses on induced rather than expressed emotions. We were able to find only one dataset annotated with GEMS categories, namely Emotify, and used it in our experiments. Our main goal was to compare sparse coding approaches for learning features useful for predicting musically induced emotions, taking into account the categories present in GEMS. We compared five sparse coding methods and concluded that sparse autoencoders outperform the other approaches.
