
Proceedings of the 18th Conference on Computer Science and Intelligence Systems

Annals of Computer Science and Information Systems, Volume 35

Center for Artificial Intelligence Challenge on Conversational AI Correctness


DOI: http://dx.doi.org/10.15439/2023F6058

Citation: Proceedings of the 18th Conference on Computer Science and Intelligence Systems, M. Ganzha, L. Maciaszek, M. Paprzycki, D. Ślęzak (eds). ACSIS, Vol. 35, pages 1319–1324 (2023)


Abstract. This paper describes a challenge on Conversational AI correctness whose goal is to develop Natural Language Understanding models that are robust against speech recognition errors. The data for the competition consist of natural language utterances along with semantic frames that represent the commands targeted at a virtual assistant. The specification of the task is given along with the data preparation procedure and the evaluation rules. The baseline models for the task are discussed and the results of the competition are reported.
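As a rough illustration of the data described in the abstract, a competition instance pairs an utterance with a semantic frame for a virtual assistant command. The sketch below assumes a typical domain/intent/slot frame representation; the field names and values are hypothetical, not the competition's actual schema:

```python
# Hypothetical utterance/semantic-frame pair; field names are illustrative
# only and do not reflect the competition's actual data schema.
utterance = "set an alarm for seven thirty tomorrow"

semantic_frame = {
    "domain": "Alarm",        # assumed domain label
    "intent": "CreateAlarm",  # assumed intent label
    "slots": {                # assumed slot-value pairs
        "time": "seven thirty",
        "date": "tomorrow",
    },
}

# A variant of the utterance corrupted by speech recognition errors.
# Robustness in this setting means a model should still predict the
# same semantic frame for the corrupted input.
asr_utterance = "set an alarm four seven thirty tomorrow"
```

In this reading, the evaluation of robustness amounts to checking that the frame predicted from the ASR-corrupted utterance matches the frame annotated for the clean one.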
