1991-2021: Thirty Years of Linguistic Casualties in Former Yugoslavia
Track: Multilingual AI | TA4
Wednesday, October 20, 2021, 1:30pm – 2:15pm
Held in: Jujama
Presenters:
Dimitra Kalantzi - lexiQA
Jakov Miličević - Verbosari
The outbreak of the Yugoslav wars in 1991 led to the official separation of Serbo-Croatian into four variants (Serbian, Croatian, Bosnian, Montenegrin), written in two scripts (Latin and Cyrillic). Since then, these new official languages, which often share a common vocabulary, have not evolved in parallel and follow different official standards. Thirty years later, controlling translation quality in these locales remains a challenge for both machine and human translation. What is the best approach to selecting locale-specific training data, evaluating human resources, and delivering translations without mixing variants?
Takeaways: Attendees will get an overview of the differences between Serbian, Croatian, Bosnian, and Montenegrin in grammar, vocabulary, and writing standards; learn how these differences affect human and machine translation; and hear about methods for ensuring consistent quality in these locales while avoiding overlapping or misleading data.