Polish Information Processing Society

Annals of Computer Science and Information Systems, Volume 21

Proceedings of the 2020 Federated Conference on Computer Science and Information Systems

Czech parliament meeting recordings as ASR training data

DOI: http://dx.doi.org/10.15439/2020F119

Citation: Proceedings of the 2020 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 21, pages 185–188


Abstract. I present a way to use the stenographed recordings of Czech parliament meetings to train a speech-to-text system. The article describes a method for scraping the data, obtaining word-level alignment, and selecting the reliable parts of the imprecise transcripts. Finally, I present an ASR system trained on these and other data.
