Neural machine translation of dialectal-dialectal Arabic

Date

2023

Abstract

This thesis addresses neural machine translation (NMT) between Arabic dialects, a task that has received limited attention in natural language processing. Its primary aim is to explore and compare three approaches to dialect-to-dialect translation: training models from scratch, fine-tuning pre-trained monolingual models, and fine-tuning pre-trained multilingual models. A single "everything-to-everything" model is compared against models trained separately for each translation direction. The thesis also examines the effect of systematically adding data during training, such as other dialects and Modern Standard Arabic (MSA). Models are evaluated with automated metrics (such as BLEU and chrF++) and with human evaluation of translation quality for a single target dialect. The study further investigates how machine translation performance correlates with mutual intelligibility among Arabic dialects, as estimated by a range of linguistic distance measures. The results show that fine-tuning a pre-trained monolingual model, AraT5, outperforms the other approaches, challenging common beliefs about multilingual models in low-resource scenarios. Single-direction models outperform both the everything-to-everything model and the models trained with additional data. Among the distance measures, type-level lexical overlap correlates more strongly with translation quality scores than the others. Human evaluation supports the effectiveness of the developed models. The findings offer insight into the intricacies of NMT between Arabic dialects and provide a foundation for future research in this field.
