Creating Various Test Corpora for ASR Bias Detection



Learn how a leading speech recognition technology company improved its ASR bias detection with customized test corpora.

ASR bias detection is crucial to making speech recognition technology accessible to all users. In this article, we explore how Defined.ai helped a client create test corpora for ASR bias detection, built to specific demographic distributions of age, gender, and accent for each of the locales their models served.

The customer

Our customer was a leading speech recognition technology company that develops Automatic Speech Recognition (ASR) models serving speakers of multiple languages across international locales.

The context

Our client, having created ASR models for a number of languages, wanted to measure whether or not their models were biased. Specifically, they were interested in checking if their models showed signs of bias towards certain ages, genders, accents, or combinations thereof.
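One common way to make this kind of check concrete is to compare word error rate (WER) across demographic groups. The sketch below is illustrative only and does not describe the client's actual evaluation method. The field names (`reference`, `hypothesis`, `accent`, `gender`) and the function names are assumptions chosen for the example.

```python
from collections import defaultdict


def word_errors(reference, hypothesis):
    """Return (word-level edit distance, number of reference words)."""
    ref, hyp = reference.split(), hypothesis.split()
    # Row-by-row Levenshtein distance computed over words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (or match)
            ))
        prev = curr
    return prev[-1], len(ref)


def wer_by_group(utterances, keys):
    """Aggregate WER per combination of the given metadata keys.

    `utterances` is a list of dicts holding a reference transcript,
    an ASR hypothesis, and speaker metadata (hypothetical schema).
    """
    errors = defaultdict(int)
    words = defaultdict(int)
    for u in utterances:
        group = tuple(u[k] for k in keys)
        e, n = word_errors(u["reference"], u["hypothesis"])
        errors[group] += e
        words[group] += n
    return {g: errors[g] / words[g] for g in words if words[g] > 0}
```

Calling `wer_by_group(utterances, ["accent"])` or `wer_by_group(utterances, ["age_band", "gender"])` would give one WER per group or per combination of attributes. A large gap between groups is a signal worth investigating, not proof of bias on its own, since group sizes and recording conditions also matter.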


Being able to assess bias is the first step in fixing any issues that might prevent speech recognition technologies from being accessible to all users, a need our client felt strongly about. Their motivation arose from an intent to “do the right thing,” as well as a wish to make their product as accessible as possible in a world where ethical AI has become a top priority. Publicized reports of bias can lead to distrust of the technology and reputational harm, and in turn to a loss of clients and revenue.

The solution

Defined.ai worked with the client on the parameters of the datasets required for their ASR bias detection. Their requirements cited very specific demographic distributions of age, gender, and accent for each of the relevant locales their models served.
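To illustrate what checking a corpus against a required demographic distribution can look like, here is a minimal sketch. It is not Defined.ai's tooling. The function name, the speaker record layout, and the target shares are all hypothetical.

```python
from collections import Counter


def distribution_gaps(speakers, field, targets):
    """Compare actual shares of `field` values against target shares.

    Returns {value: actual_share - target_share}. A positive gap means
    the value is over-represented, a negative gap under-represented.
    """
    counts = Counter(s[field] for s in speakers)
    total = sum(counts.values())
    gaps = {}
    # Include values that appear only in the targets or only in the data.
    for value in set(targets) | set(counts):
        actual = counts.get(value, 0) / total if total else 0.0
        gaps[value] = actual - targets.get(value, 0.0)
    return gaps
```

Running this once per attribute and locale (for example `distribution_gaps(speakers, "accent", accent_targets)`) shows where existing data already meets the specification and where additional collection would be needed.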

We then set to work verifying which of the required data was already available in our large repository of off-the-shelf datasets and which data had to be collected separately. Working in cooperation with the client, we produced a highly curated dataset that passed additional quality checks beyond our already stringent quality standards, including checks on the correctness of the speaker metadata, to leave zero room for error.

Our client quickly identified ASR bias in their models with the use of the custom dataset. This confirmed their need for specific training data to retrain their models, leading to a more accessible service for customers in the many locales their models served. The client was able to ensure that their models were free from ASR bias and promote fairness and inclusivity in their product.


Visit our Marketplace today to access our large repository of off-the-shelf datasets with all metadata details required either for ASR bias detection or to better accommodate your needs.


© 2023 DefinedCrowd. All rights reserved.