
Proceedings of the 18th Conference on Computer Science and Intelligence Systems

Annals of Computer Science and Information Systems, Volume 35

BIGOS - Benchmark Intended Grouping of Open Speech Corpora for Polish Automatic Speech Recognition

DOI: http://dx.doi.org/10.15439/2023F1609

Citation: Proceedings of the 18th Conference on Computer Science and Intelligence Systems, M. Ganzha, L. Maciaszek, M. Paprzycki, D. Ślęzak (eds). ACSIS, Vol. 35, pages 585–590 (2023)


Abstract. This paper presents the Benchmark Intended Grouping of Open Speech corpora (BIGOS), a new corpus designed for evaluating Polish Automatic Speech Recognition (ASR) systems. This initial version of the benchmark comprises 1,900 audio recordings from 71 distinct speakers, sourced from 10 publicly available speech corpora. Three proprietary and five open-source ASR systems were evaluated on this diverse set of recordings and the corresponding original transcriptions. Interestingly, the latest open-source models were found to perform on par with more established commercial services. Model size had a significant influence on system accuracy, and accuracy decreased in scenarios involving highly specialized or spontaneous speech. The challenges of using public datasets for ASR evaluation and the limitations of this inaugural benchmark are critically discussed, along with recommendations for future research. The BIGOS corpus and associated tools that facilitate replication and customization of the benchmark are made publicly available.
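ASR benchmarks of this kind conventionally score system hypotheses against reference transcriptions with the word error rate (WER): the number of word substitutions, deletions, and insertions divided by the reference length. The paper's exact scoring pipeline is not reproduced here; the following is only a minimal sketch of the standard metric.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over word tokens, computed by dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all remaining reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost)  # match or substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution against a three-word reference.
print(wer("ala ma kota", "ala ma psa"))
```

In practice, benchmark tooling also normalizes text (lowercasing, punctuation removal) before scoring, since transcription conventions differ across the source corpora.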

References

  1. Alëna Aksënova et al. “How Might We Create Better Benchmarks for Speech Recognition?” In: Association for Computational Linguistics, 2021, pp. 22–34. http://dx.doi.org/10.18653/v1/2021.bppf-1.4.
  2. Piotr Szymański et al. “WER we are and WER we think we are”. In: Association for Computational Linguistics, 2020, pp. 3290–3295. http://dx.doi.org/10.18653/v1/2020.findings-emnlp.295.
  3. Johannes Wirth and Rene Peinl. “ASR in German: A Detailed Error Analysis”. In: (2022). http://dx.doi.org/10.48550/arXiv.2204.05617.
  4. Miguel Del Rio et al. “Earnings-21: A Practical Benchmark for ASR in the Wild”. In: (2021).
  5. Miguel Del Rio et al. “Earnings-22: A Practical Benchmark for Accents in the Wild”. In: (Mar. 2022). http://dx.doi.org/10.48550/arXiv.2203.15591.
  6. Sanchit Gandhi, Patrick von Platen, and Alexander M. Rush. “ESC: A Benchmark For Multi-Domain End-to-End Speech Recognition”. In: (Oct. 2022). http://dx.doi.org/10.48550/arXiv.2210.13352.
  7. Malgorzata Anna Ulasik et al. “CEASR: A corpus for evaluating automatic speech recognition”. In: 2020, pp. 6477–6485.
  8. Péter Mihajlik et al. “BEA-Base: A Benchmark for ASR of Spontaneous Hungarian”. In: 2022 Language Resources and Evaluation Conference, LREC 2022 (Feb. 2022), pp. 1970–1977. http://dx.doi.org/10.48550/arXiv.2202.00601.
  9. Vassil Panayotov et al. “LibriSpeech: An ASR Corpus Based on Public Domain Audio Books”.
  10. Vineel Pratap et al. “MLS: A Large-Scale Multilingual Dataset for Speech Research”. In: Proc. Interspeech 2020. 2020, pp. 2757–2761. http://dx.doi.org/10.21437/Interspeech.2020-2826.
  11. François Hernandez et al. “TED-LIUM 3: twice as much data and corpus repartition for experiments on speaker adaptation”. In: (2018). http://dx.doi.org/10.1007/978-3-319-99579-3_21.
  12. Heidi Christensen et al. “The CHiME corpus: a resource and a challenge for computational hearing in multi-source environments”. In: ISCA, 2010, pp. 1918–1921. http://dx.doi.org/10.21437/Interspeech.2010-552.
  13. Rosana Ardila et al. “Common Voice: A Massively-Multilingual Speech Corpus”. In: (2020). http://dx.doi.org/10.48550/arXiv.1912.06670.
  14. Christian Gaida et al. “Comparing Open-Source Speech Recognition Toolkits”. In: 2014.
  15. Meredith Moore et al. “Say What? A Dataset for Exploring the Error Patterns That Two ASR Engines Make”. In: 2019, pp. 2528–2532. http://dx.doi.org/10.21437/Interspeech.2019-3096.
  16. Ingo Siegert et al. Recognition Performance of Selected Speech Recognition APIs – A Longitudinal Study. 2020. http://dx.doi.org/10.1007/978-3-030-60276-5_50.
  17. Binbin Xu et al. “A Benchmarking on Cloud based Speech-To-Text Services for French Speech and Background Noise Effect”. In: (2021).
  18. Vered Silber Varod et al. “A cross-language study of speech recognition systems for English, German, and Hebrew”. In: Online Journal of Applied Knowledge Management (2021), pp. 1–15. http://dx.doi.org/10.36965/OJAKM.2021.9(1)1-15.
  19. Morgane Riviere, Jade Copet, and Gabriel Synnaeve. “ASR4REAL: An extended benchmark for speech models”. In: (2021).
  20. Martha Maria Papadopoulou, Anna Zaretskaya, and Ruslan Mitkov. “Benchmarking ASR Systems Based on Post-Editing Effort and Error Analysis”. In: INCOMA Ltd., 2021, pp. 199–207.
  21. Alëna Aksënova et al. “Accented Speech Recognition: Benchmarking, Pre-training, and Diverse Data”. In: (2022). http://dx.doi.org/10.48550/arXiv.2205.08014.
  22. Regis Pires Magalhães et al. “Evaluation of Automatic Speech Recognition Approaches”. In: Journal of Information and Data Management 13 (3 Sept. 2022). http://dx.doi.org/10.5753/jidm.2022.2514.
  23. Marcin Pacholczyk. Przegląd i porównanie rozwiązań rozpoznawania mowy pod kątem rozpoznawania zbioru komend głosowych [A review and comparison of speech recognition solutions for recognizing a set of voice commands]. 2018.
  24. Danijel Koržinek. “Task 5: Automatic speech recognition PolEval 2019 competition”. In: (2019). URL: http://2019.poleval.pl/files/2019/11.pdf.
  25. Nahuel Unai et al. “Development and evaluation of a Polish ASR system using the TLK toolkit”. 2019.
  26. Danijel Koržinek, Krzysztof Marasek, and Łukasz Brocki. Polish Read Speech Corpus for Speech Tools and Services. 2016.
  27. Piotr Pęzik. “Spokes – a search and exploration service for conversational corpus data”. In: 2015.
  28. Piotr Pęzik. “Increasing the Accessibility of Time-Aligned Speech Corpora with Spokes Mix”. In: European Language Resources Association (ELRA), 2018.
  29. Krzysztof Marasek, Danijel Korzinek, and Łukasz Brocki. “System for Automatic Transcription of Sessions of the Polish Senate”. In: (2014).
  30. Piotr Pęzik et al. DiaBiz - an Annotated Corpus of Polish Call Center Dialogs, pp. 20–25.
  31. Piotr Pęzik and Michał Adamczyk. Automatic Speech Recognition for Polish in 2022. University of Łódź, 2022. URL: https://clarin-pl.eu/dspace/bitstream/handle/11321/894/ASR_PL_report_2022.pdf.
  32. Alec Radford et al. “Robust Speech Recognition via Large-Scale Weak Supervision”. In: (2022). http://dx.doi.org/10.48550/arXiv.2212.04356.
  33. Piotr Kozierski et al. “Acoustic Model Training, using Kaldi, for Automatic Whispery Speech Recognition”. In: 2018. http://dx.doi.org/10.15439/2018F255.