1991-2021: Thirty Years of Linguistic Casualties in Former Yugoslavia

Track: Multilingual AI | TA4
Wednesday, October 20, 2021, 1:30pm – 2:15pm
Held in: Jujama
Dimitra Kalantzi - lexiQA 
Jakov Miličević - Verbosari

The outbreak of the Yugoslav wars in 1991 led to the official linguistic separation of Serbo-Croatian into four variants (Serbian, Croatian, Bosnian, and Montenegrin) written in two different scripts (Latin and Cyrillic). Since then, these new official languages, which share much of their vocabulary, have not evolved in parallel and adhere to different official standards. Thirty years later, controlling translation quality in these locales remains a challenge for both machine and human translation. What is the best approach for selecting locale-specific training data, evaluating human resources, and delivering translations without mixing variants?

Takeaways: Attendees will get an overview of the differences between Serbian, Croatian, Bosnian, and Montenegrin in grammar, vocabulary, and writing standards; learn how those differences affect human and machine translation; and hear about methods for ensuring consistent quality in these locales without overlapping or misleading use of data.