Inclusive ASR Models: Using High-Quality, Ethical Data for Global Speech Recognition
Defined.ai's speech data reduces client's Automatic Speech Recognition Word Error Rate across 5 locales by up to 100%.
Defined.ai teamed up with a leading consumer electronics company to take their speech recognition models in five locales to the next level. This client’s existing Automatic Speech Recognition (ASR) solutions in these locales were not performing to their standards on spontaneous speech input, hitting a Word Error Rate (WER) of 5–8%. They also needed improvements when it came to the models’ bias for specific demographic groups, which can cause problems identifying different accents and regional dialects or lead to inaccurate transcriptions.
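For readers less familiar with the metric, WER is conventionally the number of word-level substitutions, deletions and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. The sketch below is a minimal illustration of that standard definition, not the client's or Defined.ai's evaluation pipeline. The function name and the sample sentences are hypothetical, and it assumes simple whitespace tokenization with no text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()   # assumption: whitespace tokenization only
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one deleted word ("the") and one substitution
# ("lights" -> "light") against a five-word reference.
wer = word_error_rate("turn on the kitchen lights", "turn on kitchen light")
print(f"WER: {wer:.0%}")
```

On this definition, a WER of 5–8% corresponds to roughly one word error in every 12 to 20 reference words.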
ASR is an integral part of this client’s product portfolio, so a sub-par speech recognition solution was not an option. They quickly realized that the only sure-fire way to significantly improve their ASR models' WER was to re-train with data representative of their users. They also understood that unregulated ways of obtaining this data, like scraping the web, are a non-starter for both ethical and business reasons. Bad publicity and legal action over illegally obtained data are rife these days, and our client wisely avoided this issue altogether by sourcing their data from the world's largest ethical AI data marketplace: Defined.ai.
Discover how our customer tackled these business challenges through the use of responsible, high-quality and diverse speech datasets to train and optimize their existing ASR models for impressive performance results.
Our Customer: Innovative consumer electronics worldwide
We worked with a department focusing on Natural Language Processing for an international consumer electronics conglomerate.
The Context: Spontaneous speech in AI ASR
After internal testing, our client realized that the performance of their speech recognition solutions in five key locales was no longer up to par and was trailing their competitors' offerings. More specifically, recognition of spontaneous speech was lagging, and the models were biased, performing worse for some population groups than for others.
The need was clear: our client needed ethically collected spontaneous speech, transcribed with high fidelity, and with proper distributions for participant age, gender and accent, and they needed it as soon as possible.
The Solution: Defined.ai's AI speech data services
After comparing their options, our client saw only one clear choice: Defined.ai could immediately provide the large volumes of spontaneous speech data they required in all requested locales. The data ticked all the right boxes (high quality, ethically collected, diverse and bias-free), and all at very competitive prices.
The Results: Lower word error rates without the risks of open-source data
By training on Defined.ai’s speech datasets, our customer cut their ASR models' WER by three-quarters or more, dropping from 5–8% down to 0–2% in all five locales. And because all of the data was collected ethically, cleared for copyright and specifically consented for AI training, their AI tools are future-proofed by full GDPR and ISO compliance.
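The relative reduction behind figures like these is simple arithmetic: the drop in WER divided by the starting WER. The sketch below shows that calculation. The function name is hypothetical, and because the per-locale before-and-after pairs are not published here, the two pairings are illustrative endpoints taken from the reported ranges, not actual locale results.

```python
def relative_wer_reduction(before: float, after: float) -> float:
    """Relative WER reduction in percent, given before/after WER in percent."""
    if before <= 0:
        raise ValueError("starting WER must be positive")
    return (before - after) / before * 100.0


# Illustrative pairings only (assumed, not per-locale results).
for before, after in [(8.0, 2.0), (5.0, 0.0)]:
    reduction = relative_wer_reduction(before, after)
    print(f"{before}% -> {after}%: {reduction:.0f}% relative reduction")
```

A drop from 8% to 2% is a three-quarters reduction, and a drop to 0% is a 100% reduction.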
Learn how Defined.ai can help you lower your AI model's WER: speak to an expert!