Reinforcement Learning for on-line Sequence Transformation
Grzegorz Rypeść, Łukasz Lepak, Paweł Wawrzyński
DOI: http://dx.doi.org/10.15439/2022F70
Citation: Proceedings of the 17th Conference on Computer Science and Intelligence Systems, M. Ganzha, L. Maciaszek, M. Paprzycki, D. Ślęzak (eds). ACSIS, Vol. 30, pages 133–139 (2022)
Abstract. In simultaneous machine translation (SMT), the output sequence should be produced as soon as possible, before the whole input sequence has been read. This requirement creates a trade-off between translation delay and quality, since less context is available when translation begins. In most SMT methods, this trade-off is controlled by parameters whose values must be tuned manually. In this paper, we introduce an SMT system that is trained with reinforcement learning and learns the optimal delay during training. We evaluate it on the Tatoeba and IWSLT2014 datasets against state-of-the-art translation architectures. On the former, our method achieves comparable results overall and better results on long sentences; on the latter, its results are worse but still comparable.
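The abstract does not spell out the agent's action space, but simultaneous translation is commonly framed as a sequential choice, at each step, between reading one more source token and writing one target token, with a terminal reward that penalizes translation quality for latency. The Python sketch below illustrates that generic framing only; `run_episode`, `delayed_reward`, the placeholder decoder output, and the latency weight `lam` are illustrative assumptions, not the authors' implementation.

```python
import random

READ, WRITE = 0, 1  # agent actions: consume a source token / emit a target token

def run_episode(source_tokens, policy, max_output_len=50):
    """Interleave READ and WRITE actions over one sentence.

    Returns the emitted tokens and, for each emitted token, how many
    source tokens had been read at emission time (a delay measure).
    """
    num_read, outputs, delays = 0, [], []
    while len(outputs) < max_output_len:
        if num_read < len(source_tokens) and policy(num_read, outputs) == READ:
            num_read += 1  # wait: read one more source token before emitting
        else:
            outputs.append(f"tgt_{len(outputs)}")  # placeholder for a decoder step
            delays.append(num_read)
    return outputs, delays

def delayed_reward(quality, delays, lam=0.1):
    """Terminal reward trading translation quality against latency.

    `quality` stands in for a sequence-level metric such as BLEU;
    `lam` is an assumed latency weight, not a value from the paper.
    """
    avg_lag = sum(delays) / max(len(delays), 1)
    return quality - lam * avg_lag

# Example: a random policy on a 5-token source sentence.
random_policy = lambda num_read, outputs: random.choice([READ, WRITE])
out, lags = run_episode(["src"] * 5, random_policy, max_output_len=6)
print(out, lags, delayed_reward(quality=0.8, delays=lags))
```

Under this framing, a policy trained to maximize `delayed_reward` (e.g., with a policy-gradient method) adjusts its READ/WRITE behavior so that the delay itself is learned rather than fixed by hand-tuned parameters, which matches the trade-off described in the abstract.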