
Position Papers of the 17th Conference on Computer Science and Intelligence Systems

Annals of Computer Science and Information Systems, Volume 31

Searching For Loops And Sound Samples With Feature Learning

DOI: http://dx.doi.org/10.15439/2022F279

Citation: Position Papers of the 17th Conference on Computer Science and Intelligence Systems, M. Ganzha, L. Maciaszek, M. Paprzycki, D. Ślęzak (eds). ACSIS, Vol. 31, pages 13–18 (2022)


Abstract. In this paper, we evaluate feature learning in the problem of retrieving subjectively interesting sounds from electronic music tracks. We describe an active learning system designed to find sounds categorized as samples or loops. These retrieval tasks originate from a broader R&D project, which concerns the use of machine learning for streamlining the creation of videogame content synchronized with soundtracks. The method is expected to function in the context of limited data availability, and as such cannot rely on supervised learning of what constitutes an "interesting sound". We apply an active learning procedure that allows us to find sound samples without predefined classes through user interaction, and evaluate the use of neural network feature extraction in the problem.
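
To make the idea of retrieval through user interaction concrete, the following is a minimal sketch of a pool-based active learning loop. It is not the system described in the paper. The function name `active_retrieval`, the logistic regression classifier, the uncertainty-sampling query rule, and the random vectors standing in for neural network features are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def active_retrieval(features, oracle, seed_idx, n_rounds=10, batch_size=5):
    """Hypothetical pool-based active learning loop with uncertainty sampling.

    features : (n_segments, n_dims) array, one feature vector per audio segment
    oracle   : callable index -> 0/1, standing in for the user's judgement
    seed_idx : initially labelled indices; must cover both classes
    """
    labels = {int(i): oracle(int(i)) for i in seed_idx}
    model = LogisticRegression(max_iter=1000)
    for _ in range(n_rounds):
        idx = np.fromiter(labels.keys(), dtype=int)
        y = np.fromiter(labels.values(), dtype=int)
        model.fit(features[idx], y)
        # Unlabelled segments that can still be shown to the user.
        pool = np.setdiff1d(np.arange(len(features)), idx)
        if pool.size == 0:
            break
        proba = model.predict_proba(features[pool])[:, 1]
        # Query the segments whose predicted probability is closest to 0.5.
        query = pool[np.argsort(np.abs(proba - 0.5))[:batch_size]]
        for i in query:
            labels[int(i)] = oracle(int(i))
    # Final fit on everything the user has labelled.
    idx = np.fromiter(labels.keys(), dtype=int)
    y = np.fromiter(labels.values(), dtype=int)
    model.fit(features[idx], y)
    return model, labels


# Synthetic stand-in for learned features: two well-separated clusters.
rng = np.random.default_rng(0)
interesting = rng.normal(loc=2.0, size=(40, 8))
other = rng.normal(loc=-2.0, size=(160, 8))
features = np.vstack([interesting, other])
truth = np.array([1] * 40 + [0] * 160)

# The lambda simulates a user who answers "interesting or not" per segment.
model, labels = active_retrieval(
    features, lambda i: int(truth[i]), seed_idx=[0, 199]
)
# Rank all segments by predicted probability of being interesting.
ranking = np.argsort(-model.predict_proba(features)[:, 1])
```

In a real setting the simulated oracle would be replaced by a listener's answers, and the random vectors by features extracted from audio segments. The number of rounds and the batch size control how many segments the user is asked to judge.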

