Lessons Learned from Evaluating MT Engines at eBay


Track: Automation | AU5 | Intermediate
Wednesday, July 29, 2020, 2:15pm – 2:45pm
Held in: Stream 2
Presenters:
Luke Niederer - eBay 
Angelique Tesar - eBay

In this session we will focus on the following points:

DO collect feedback from evaluators, but DON’T base your decisions solely on it: subjective feedback needs to be verified against empirical results.
DO evaluate quality and benchmark engines, but DON’T mix the two in a single task.
DO choose evaluators who work with your content, but DON’T limit yourself to two evaluators to speed things up: three is the minimum to avoid bias.
DO look for patterns of overediting, but DON’T pressure evaluators to accept low-quality machine translation (MT) when the content is stylistically demanding (UI).
DO improve your engine’s vocabulary/terminology by adding new data, but DON’T waste time isolating terminology and building glossaries: retraining with new data in context will fix this.

Takeaways: Attendees will learn best practices for MT evaluation and helpful ways to improve MT quality.