Abstract: Neural Machine Translation (NMT) has overcome many limitations of rule-based and statistical machine translation systems and is the current state of the art. Although its success is indisputable, considerable improvement is still needed before translation to and from low-resource languages reaches the same level of quality. In this work, we developed a One-To-Many Multilingual Neural Machine Translation (MNMT) system capable of translating text from English into two low-resource Indic languages, viz., Assamese and Bengali. We used a publicly available parallel corpus and, in addition, synthetic data with Assamese on the target side. In the multilingual setting, we obtained better results in terms of BLEU, chrF, and TER for both the English-to-Bengali and English-to-Assamese translation directions than with their bilingual NMT counterparts. We show that both multilingualism and the use of synthetic data can enhance translation quality for languages where gold-standard parallel data is scarce.

Keywords: Low resource language MNMT, Multilingual Neural Machine Translation, Indian languages MT, Indic NLP, Assamese NMT, Bengali NMT.

Cite:
Kishore Kashyap, Shikhar Kumar Sarma, "Multilingual NMT system for English to Low Resource Indic Languages - Assamese and Bengali", IJARCCE International Journal of Advanced Research in Computer and Communication Engineering, vol. 13, no. 3, 2024, Crossref https://doi.org/10.17148/IJARCCE.2024.13398.


DOI: 10.17148/IJARCCE.2024.13398
