Real-time Direct Translation System for Tamil and Sinhala Languages

Rajpirathap Sakthithasan, Shellvacumar Sheeyam, Umasuthan Kanthasamy, Amalraj Chelvarajah

DOI: http://dx.doi.org/10.15439/2015F113

Citation: Proceedings of the 2015 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 5, pages 1437–1443 (2015)

Abstract. Language barriers in day-to-day communication are common in all countries. In Sri Lanka there is a rising need for translation between Sinhala and Tamil to reduce these barriers, and the statistical machine translation approach is well suited to these languages. Statistical machine translation is one of the most promising and efficient methods for translating Sri Lankan languages such as Sinhala and Tamil. The statistical approach is suitable for structurally dissimilar language pairs and is an efficient solution for translating large texts. Sinhala and Tamil are similar in grammar, which helps the statistical approach produce more accurate results. For this research we have developed a real-time bi-directional translation system covering both Tamil to Sinhala and Sinhala to Tamil. We used the Sri Lankan parliament corpus to train the language model. We critically evaluated both systems with parameter optimizations to obtain the most accurate and efficient system. We also used scoring techniques such as BLEU [2, 8] and NIST for system evaluation, and we integrated the MERT technique to tune the decoder.
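
To make the BLEU score mentioned in the abstract concrete, the following is a minimal sketch of a single-reference, sentence-level BLEU computation in Python. It is an illustration under simplifying assumptions and not the evaluation code used in this work: it applies no smoothing, compares against one reference only, and scores one sentence at a time, whereas BLEU as usually reported aggregates n-gram counts over a whole test corpus. The function names `ngrams` and `bleu` and the English example sentences are placeholders introduced here.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU against a single reference, without smoothing.

    Returns 0.0 if any n-gram order has no match (or the candidate is
    too short to contain an n-gram of that order).
    """
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))

    # Brevity penalty: penalise candidates shorter than the reference.
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)

    # Geometric mean of the modified n-gram precisions, uniform weights.
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    hypothesis = "the cat sat on the mat".split()
    reference = "the cat is on the mat".split()
    print(round(bleu(hypothesis, reference, max_n=2), 4))
```

In practice a tuned system would be scored with an established tool rather than a hand-written function, since tokenisation and smoothing choices change the reported number.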

References

J. U. Liyanapathirana, A Statistical Approach to English and Sinhala Translation, BSc. Thesis, University of Colombo School of Computing, Sri Lanka, July.
R. Weerasinghe, A Statistical Machine Translation Approach to Sinhala-Tamil Language Translation.
C. Callison-Burch, C. Fordyce, P. Koehn, C. Monz and J. Schroeder, Meta-Evaluation of Machine Translation, in Proc. 2nd Workshop on Statistical Machine Translation, 2007, pp. 136–158.
Franz Josef Och, Minimum Error Rate Training in Statistical Machine Translation, in Association for Computational Linguistics.
G. Doddington, Automatic evaluation of machine translation quality using n-gram co-occurrence statistics, in Proc. Human Language Technology Conference (HLT), 2002, pp. 128–132.
A. R. Weerasinghe, Bootstrapping the lexicon building process for machine translation between 'new' languages, in Proc. Association for Machine Translation in the Americas Conference (AMTA), 2002.
F. J. Och, C. Tillmann and H. Ney, Improved alignment models for statistical machine translation, in Proc. 4th Conference on Empirical Methods in Natural Language Processing (EMNLP), Maryland, 1999.
K. Papineni, S. Roukos, T. Ward and W. Zhu, Bleu: a method for automatic evaluation of machine translation, in Proc. 40th Annual Meeting of the Association for Computational Linguistics, 2002, pp. 311–318.
P. Koehn, F. J. Och and D. Marcu, Statistical Phrase-Based Translation, in Proc. Joint Conference on Human Language Technologies and the Annual Meeting of the North American Chapter of the Association of Computational Linguistics, 2002, pp. 1–7.
Bernadette Varga, Alina Dia Trambitas-Miron, Andrei Roth, Anca Marginean, Radu Razvan Slavescu, Adrian Groza, LELA—A natural language processing system for Romanian tourism, in Proc. 4th International Workshop on Advances in Semantic Information Retrieval, 2014, pp. 281–288.
Franz Josef Och, Minimum Error Rate Training in Statistical Machine Translation, in Association for Computational Linguistics, 2003, pp. 160–167.
A. Birch, B. Cowan, C. Callison-Burch, M. Federico, N. Bertoldi, P. Koehn and H. Hoang, Moses: Open Source Toolkit for Statistical Machine Translation, in Proc. ACL 2007 Demo and Poster Sessions, 2007, pp. 177–180.