Multi-score neural machine translation evaluation: beyond single score metrics
Abstract
Machine translation (MT) quality has traditionally been evaluated with single-score metrics. This paper introduces a novel multi-score evaluation approach that produces a three-dimensional vector, measuring quality separately along three dimensions: accuracy, fluency, and style. Through extensive empirical investigation, we demonstrate the benefits of this approach, showing both its interpretability and its advantages over traditional single-score methods. Among the architectures tested, RemBERT proved the most promising, consistently delivering robust results in both reference-based MT evaluation and reference-free quality estimation. Our approach offers a versatile solution, giving stakeholders, from MT developers to end users, nuanced insights that go beyond a single overall assessment of translation quality.
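To make the interpretability argument concrete, the following is a minimal sketch of how a three-dimensional score can separate translations that a single score cannot. It is not the paper's implementation: the `MultiScore` class, its method names, the unweighted mean used to collapse the vector, and the numeric values are all assumptions chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MultiScore:
    """Hypothetical container for a three-dimensional quality vector."""

    accuracy: float
    fluency: float
    style: float

    def as_vector(self) -> Tuple[float, float, float]:
        return (self.accuracy, self.fluency, self.style)

    def collapse(self) -> float:
        """Unweighted mean of the three scores.

        This aggregation is an assumption for illustration only; it stands
        in for any metric that reports one overall number.
        """
        return sum(self.as_vector()) / 3

    def weakest_dimension(self) -> str:
        """Return the name of the lowest-scoring dimension."""
        names = ("accuracy", "fluency", "style")
        return min(zip(self.as_vector(), names))[1]


# Illustrative values only; these are not results from the paper.
translation_a = MultiScore(accuracy=0.875, fluency=0.5, style=0.375)
translation_b = MultiScore(accuracy=0.25, fluency=0.75, style=0.75)

# Both translations collapse to the same single score...
same_single_score = translation_a.collapse() == translation_b.collapse()

# ...but the vector shows that they fail in different ways.
weakest_a = translation_a.weakest_dimension()
weakest_b = translation_b.weakest_dimension()
```

In this toy case the two translations are indistinguishable once collapsed to one number, while the vector shows that the first is weakest in style and the second in accuracy, which is the kind of distinction a single-score metric hides.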