Using High-Quality, Ethical Speech Data to Train ASR Models
Training Automatic Speech Recognition (ASR) Systems Using Ethically Collected, Bias-Free Speech Data
Defined.ai teamed up with a leading consumer electronics company to substantially improve its speech recognition models in five locales. The client's existing ASR solutions in these locales fell short of its own standards on spontaneous speech input, and the models also performed worse for certain demographic groups than for others.
Because ASR is an integral part of the client's product portfolio, a subpar speech recognition solution was not an option. The client quickly concluded that the most reliable way to significantly reduce the Word Error Rate (WER) of its ASR models was to retrain them on data representative of the speech its users actually produce. It also recognized that unethical ways of obtaining this data, such as scraping the web, were a non-starter for both ethical and business reasons. Bad publicity and legal action over illegally obtained data are common these days, and the client wisely avoided the issue altogether by sourcing its data from Defined.ai, the largest ethical AI data marketplace available.
Discover how our customer tackled these business challenges by using responsible, high-quality and diverse speech datasets to retrain and optimize its existing ASR models, with impressive performance results.
Our Customer
Our customer is a department of an international conglomerate that focuses on Natural Language Processing (NLP) for consumer electronics.
The Context
After internal testing, our client realized that the performance of its speech recognition solutions in five key locales was no longer up to par and was falling behind competitors' offerings. Specifically, recognition of spontaneous speech was lagging, and the models were biased, performing worse for some population groups than for others. The need was clear: the client required ethically collected spontaneous speech, transcribed with high fidelity and with appropriate distributions of participant age, gender and accent, and it needed this data as soon as possible.
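One common way to quantify the kind of demographic bias described above is to compute WER separately for each speaker group and compare the results. The following is a minimal Python sketch under stated assumptions, not the client's or Defined.ai's evaluation pipeline: it computes corpus-level WER per group as the total word-level edit distance (substitutions, deletions and insertions) divided by the total number of reference words. The function names, group labels and transcripts are hypothetical and made up purely for illustration.

```python
from collections import defaultdict


def word_edit_distance(reference: list[str], hypothesis: list[str]) -> int:
    """Minimum number of word substitutions, deletions and insertions
    needed to turn the reference into the hypothesis (Levenshtein distance)."""
    # prev[j] holds the distance between the reference words processed so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1]


def wer_by_group(samples):
    """Corpus-level WER per group: total word errors / total reference words.
    `samples` is an iterable of (group, reference, hypothesis) tuples."""
    errors = defaultdict(int)
    words = defaultdict(int)
    for group, reference, hypothesis in samples:
        ref, hyp = reference.split(), hypothesis.split()
        errors[group] += word_edit_distance(ref, hyp)
        words[group] += len(ref)
    # Skip groups with no reference words to avoid dividing by zero.
    return {g: errors[g] / words[g] for g in errors if words[g] > 0}


# Hypothetical test utterances; group labels are placeholders, not real data.
samples = [
    ("group_a", "turn on the kitchen lights", "turn on the kitchen lights"),
    ("group_a", "set a timer for ten minutes", "set a timer for ten minutes"),
    ("group_b", "turn on the kitchen lights", "turn on the chicken lights"),
    ("group_b", "set a timer for ten minutes", "set timer for two minutes"),
]

print({g: round(w, 3) for g, w in wer_by_group(samples).items()})
# prints {'group_a': 0.0, 'group_b': 0.273}
```

A persistent WER gap between groups on comparable test material, as in this toy example, is the kind of signal that points to under-representation of those speakers in the training data.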
The Solution
After comparing the alternatives, the client concluded that only one provider fit its needs: Defined.ai could immediately supply large amounts of spontaneous speech data in all requested locales while ticking all the right boxes (high quality, ethically collected, diverse and bias-free), all at a very competitive price.
By retraining its ASR models on Defined.ai's off-the-shelf (OTS) speech datasets, our customer significantly reduced the Word Error Rate in all five locales.