The Role of Locally Relevant Content in Evaluating Multilingual LLMs


Track: Multilingual AI | AI2 | Everyone
Wednesday, October 15, 2025, 10:00am – 10:30am
Held in: Steinbeck 1
Presenters:
Nikolay Bogoychev - Meta 
Kriz Chan - Meta
Host: Adam Bittlingmayer

Evaluating the multilingual capabilities of LLMs is a hard problem that receives surprisingly little attention in many prominent LLM releases. Even when LLMs are evaluated across multiple languages, the evaluation data is often translated from English, meaning that the knowledge and skills being evaluated are English/Western-centric.

In this presentation, the presenters will discuss the challenges of evaluating LLMs multilingually across cultures and present MultiLoKo, an evaluation benchmark that addresses these challenges by evaluating multilinguality across 31 languages from different families. MultiLoKo has distinct questions for each language, locally sourced to ensure cultural relevance. It also contains translations from non-English data to English and the other way around, which makes it possible to study a wide range of questions related to multilinguality and localization in LLMs. The benchmark helps answer how multilingual widely used LLMs really are, and how LLM behavior varies with the user's language.

Key Takeaways:

How can we study culture-specific multilingual evaluation of LLMs, and what are the main challenges? How multilingual are the LLMs widely used in industry, really? Why is adopting locally sourced data important for pinpointing the weaknesses and strengths of LLMs across languages and cultures?