Speech Recognition, Globalized! Improving In-app Speech Recognition Systems for Nonnative Speakers

Track: Technical | T5 | Intermediate
Thursday, June 13, 2019, 9:00am – 9:45am
Held in: Room C1-4
Nishant Rai - Adobe
Host: Gary Lefman

Speech processing algorithms perform well for native English speakers because relatively large datasets of native speech samples are available. Accents remain a challenge when building speech recognition systems, which struggle with the lack of data for nonnative accents. Globally available automatic speech recognition (ASR) systems also lack contextual training on terms specific to an organization’s domain. By building our own speech recognition system, we can train it in context, handling domain-specific terms and adapting to nonnative accents. This improves the overall accuracy of the ASR system on the organization’s specific terms and commands. Attendees will see pronunciation variations from different speakers (native and nonnative) and how other ASR systems recognize them incorrectly because they lack contextual training. We will present our research on statistically analyzing various accents, automatically extracting phonological generalizations, and using the trained model to generate accented versions of words to improve ASR for nonnative accents.

Takeaways: Attendees will understand how voice recognition works, especially in a multilingual context. Organizations that are building an ASR system will want to recognize nonnative English speech and handle the most probable mispronunciations (such as words with silent letters). Our solution will help them train their ASR model in the organization’s context to improve accuracy.
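
As a rough illustration of what "generating accented versions of words" could look like, the sketch below expands a word's phoneme sequence into pronunciation variants using substitution rules. It is not the presenters' method: the rule table `ACCENT_RULES` and the function `accented_variants` are hypothetical names, the ARPAbet-style symbols are an assumed notation, and the rules are hand-written here, whereas the abstract describes extracting such generalizations automatically from accent data.

```python
from itertools import product

# Hypothetical substitution rules (assumed ARPAbet-style symbols): each key is a
# native phoneme, each value lists alternatives a nonnative speaker might produce.
# In the research described, such rules would be learned, not hand-written.
ACCENT_RULES = {
    "TH": ["T", "S"],
    "V": ["W"],
}

def accented_variants(phonemes, rules=ACCENT_RULES):
    """Return all pronunciations formed by optionally substituting each phoneme."""
    # For every position, allow the original phoneme plus any rule alternatives.
    options = [[p] + rules.get(p, []) for p in phonemes]
    # The Cartesian product enumerates every combination of choices.
    return [list(combo) for combo in product(*options)]

# Example: the word "think" with an assumed phoneme sequence.
variants = accented_variants(["TH", "IH", "NG", "K"])
for v in variants:
    print(" ".join(v))
```

Variants produced this way could be added to a pronunciation lexicon as alternative entries for the same word, so that the recognizer accepts both the canonical and the accented forms.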