Polish Information Processing Society

Annals of Computer Science and Information Systems, Volume 8

Proceedings of the 2016 Federated Conference on Computer Science and Information Systems

Analysis of time-frequency representations for musical onset detection with convolutional neural network.

DOI: http://dx.doi.org/10.15439/2016F558

Citation: Proceedings of the 2016 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 8, pages 147–152 (2016)

Abstract. In this paper, a convolutional neural network is applied to the problem of note onset detection in audio recordings. Two time-frequency representations are compared as input to the network: the standard spectrogram and enhanced autocorrelation (EAC). The spectrogram proves superior to EAC. The experimental evaluation is based on a dataset containing 10,939 annotated onsets, with a total audio duration of over 45 min.
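
To make the two input representations concrete, the following is a minimal NumPy sketch, not the authors' implementation. The frame length, hop size, and Hann window are assumptions chosen for illustration, and the function names are hypothetical. The EAC function follows the general pruning idea usually attributed to Tolonen and Karjalainen (clip the autocorrelation to positive values, stretch it by a factor of two along the lag axis, subtract, and clip again); the exact variant used in the paper may differ.

```python
import numpy as np

def frames(signal, frame_len, hop):
    """Slice a 1-D signal into overlapping frames (one frame per row)."""
    n = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n)[:, None]
    return signal[idx]

def spectrogram(signal, frame_len=1024, hop=256):
    """Magnitude spectrogram: |FFT| of Hann-windowed frames (assumed settings)."""
    windowed = frames(signal, frame_len, hop) * np.hanning(frame_len)
    return np.abs(np.fft.rfft(windowed, axis=1))

def enhanced_autocorrelation(signal, frame_len=1024, hop=256):
    """Per-frame EAC sketch: clipped autocorrelation minus its 2x-stretched copy."""
    windowed = frames(signal, frame_len, hop) * np.hanning(frame_len)
    # Autocorrelation via the power spectrum; zero-padding avoids circular wrap-around.
    power = np.abs(np.fft.rfft(windowed, n=2 * frame_len, axis=1)) ** 2
    acf = np.fft.irfft(power, axis=1)[:, : frame_len // 2]
    clipped = np.maximum(acf, 0.0)
    # Stretch the lag axis by 2 (value at lag l is taken from lag l/2), then subtract.
    # This suppresses peaks at double the fundamental period.
    lags = np.arange(clipped.shape[1])
    stretched = np.stack([np.interp(lags / 2.0, lags, row) for row in clipped])
    return np.maximum(clipped - stretched, 0.0)
```

Both functions return a 2-D array with time frames along the first axis, so either one could be cut into fixed-size excerpts and passed to a convolutional network as a single-channel image.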
