Position and Communication Papers of the 16th Conference on Computer Science and Intelligence Systems

Annals of Computer Science and Information Systems, Volume 26

Training of neural machine translation model to apply terminology constraints for language with robust inflection

DOI: http://dx.doi.org/10.15439/2021F147

Citation: Position and Communication Papers of the 16th Conference on Computer Science and Intelligence Systems, M. Ganzha, L. Maciaszek, M. Paprzycki, D. Ślęzak (eds). ACSIS, Vol. 26, pages 233–234

Abstract. The goal of this study is to explore the transformer's capability to perform domain-specific translation into a morphologically rich language. Satisfactory translation into Polish requires inflection by tense, number, and person, taking into account six declension cases. The ideal outcome of this study would be to show that the method proposed by Dinu et al. can train the transformer to translate from English to Polish in domain-specific scenarios. Achieving metrics similar to those of Nowakowski would result in a "zero-shot" translator with a considerably higher translation speed.

References

  1. Georgiana Dinu, Prashant Mathur, Marcello Federico, and Yaser Al-Onaizan. Training neural machine translation to apply terminology constraints. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3063–3068, Florence, Italy, July 2019. Association for Computational Linguistics.
  2. Jassem Nowakowski. Neural machine translation with inflected lexicon. 2021.
  3. Peter Anderson, Basura Fernando, Mark Johnson, and Stephen Gould. Guided open vocabulary image captioning with constrained beam search. CoRR, abs/1612.00576, 2016.
  4. Chris Hokamp and Qun Liu. Lexically constrained decoding for sequence generation using grid beam search. ArXiv, abs/1704.07138, 2017.
  5. Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318, Philadelphia, Pennsylvania, USA, July 2002. Association for Computational Linguistics.
  6. Miriam Exel, Bianka Buschbeck, Lauritz Brandt, and Simona Doneva. Terminology-constrained neural machine translation at SAP. In Proceedings of the 22nd Annual Conference of the European Association for Machine Translation, pages 271–280, Lisboa, Portugal, November 2020. European Association for Machine Translation.
  7. Gema Ramírez-Sánchez, Jaume Zaragoza-Bernabeu, Marta Bañón, and Sergio Ortiz Rojas. Bifixer and bicleaner: two open-source tools to clean your parallel data. In Proceedings of the 22nd Annual Conference of the European Association for Machine Translation, pages 291–298, Lisboa, Portugal, November 2020. European Association for Machine Translation.
  8. Robert Patterson. Compendium of accounting in Polish English. 2015.
  9. Sho Takase and Shun Kiyono. Lessons on parameter sharing across layers in transformers. ArXiv, abs/2104.06022, 2021.