Lessons Learned from Evaluating MT Engines at eBay
Track: Automation | AU5 | Intermediate
Wednesday, July 29, 2020, 5:15am – 5:45am
Held in: Stream 2
Presenters:
Luke Niederer - eBay
Angelique Tesar - eBay
In this session we will focus on the following points:
DO collect feedback from evaluators — DON’T base your decisions solely on subjective feedback; it needs to be verified against empirical results (see the sketch after this list).
DO evaluate quality and benchmark engines — DON’T mix the two in a single task.
DO choose evaluators who work with your content — DON’T limit yourself to two evaluators to speed things up; three is the minimum needed to avoid bias.
DO look for patterns of overediting — DON’T pressure evaluators into accepting low-quality machine translation (MT) when the content is stylistically demanding (e.g., UI strings).
DO improve your engine’s vocabulary and terminology by adding new data — DON’T waste time isolating terminology and building glossaries; retraining with new data that shows the terms in context will address this.
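As a minimal sketch of verifying subjective feedback against empirical results, the snippet below scores two candidate engines against the same reference translations with sacreBLEU. The file names and engine labels are hypothetical placeholders, not eBay's actual setup; the point is that comparing automatic scores with evaluators' ratings helps flag cases where the two disagree.

```python
# Minimal sketch: benchmark two MT engines against the same references
# with sacreBLEU (pip install sacrebleu). File names and engine labels
# are hypothetical, one segment per line in each file.
import sacrebleu

def read_lines(path):
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]

references = read_lines("references.txt")           # human reference translations
engines = {
    "engine_a": read_lines("engine_a_output.txt"),  # candidate engine outputs
    "engine_b": read_lines("engine_b_output.txt"),
}

for name, hypotheses in engines.items():
    bleu = sacrebleu.corpus_bleu(hypotheses, [references])
    chrf = sacrebleu.corpus_chrf(hypotheses, [references])
    print(f"{name}: BLEU={bleu.score:.1f}  chrF={chrf.score:.1f}")
```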
Takeaways: Attendees will learn best practices for MT evaluation and practical ways to improve MT quality.