Creating Various Test Corpora for ASR Bias Detection

Learn how a leading speech recognition technology company improved its ASR bias detection with customized test corpora.

ASR bias detection is crucial to making speech recognition technology accessible to all users. In this article, we explore how Defined.ai helped our client create a range of test corpora for ASR bias detection, using specific demographic distributions of age, gender, and accent for each of the locales their models served.

The customer

Our customer was a leading speech recognition technology company that develops Automatic Speech Recognition (ASR) models serving speakers of many languages across multiple international locales.

The context

Our client, having created ASR models for a number of languages, wanted to measure whether their models were biased. Specifically, they wanted to check whether the models showed signs of bias with respect to certain ages, genders, accents, or combinations thereof.

Being able to assess bias is the first step in fixing any issues that might prevent speech recognition technologies from being accessible to all users, a need our client felt strongly about. Their motivation arose from an intent to “do the right thing,” as well as from a wish to make their product as accessible as possible in a world where ethical AI has become a top priority. Publicized reports of bias can lead to distrust of the technology and to reputational harm, which in turn can result in a loss of clients and revenue.

The solution

Defined.ai worked with the client to define the parameters of the datasets required for their ASR bias detection. Their requirements specified precise demographic distributions of age, gender, and accent for each of the locales their models served.

We then set to work determining which of the required data was already available in our large repository of off-the-shelf datasets and which data had to be collected separately. The end result, produced in cooperation with the client, was a highly curated dataset that passed additional quality checks beyond our already stringent quality standards, including checks on the correctness of the speaker metadata, to leave zero room for error.
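As an illustration of the step above, the comparison between a required demographic distribution and the data already on hand can be sketched in a few lines of Python. This is a minimal sketch, not the process Defined.ai actually used: the function name `collection_shortfall`, the age bands, the accent labels, and the speaker counts are all hypothetical.

```python
from collections import Counter

def collection_shortfall(target, available):
    """Return how many more speakers are needed per demographic cell.

    target:    dict mapping (age_band, gender, accent) -> required speaker count
    available: iterable of (age_band, gender, accent) tuples, one per speaker
               already present in the existing datasets
    """
    have = Counter(available)
    # Keep only the cells where existing data falls short of the requirement.
    return {
        cell: need - have[cell]
        for cell, need in target.items()
        if have[cell] < need
    }

# Hypothetical requirement for one locale (all values are illustrative).
target = {
    ("18-30", "female", "accent_a"): 3,
    ("18-30", "male", "accent_a"): 3,
    ("60+", "female", "accent_b"): 2,
}

# Hypothetical speakers already available off the shelf.
available = [
    ("18-30", "female", "accent_a"),
    ("18-30", "female", "accent_a"),
    ("18-30", "female", "accent_a"),
    ("18-30", "male", "accent_a"),
]

shortfall = collection_shortfall(target, available)
for cell, missing in shortfall.items():
    print(cell, "needs", missing, "more speakers")
```

Cells that appear in the result are the ones that would need separate collection; cells that are already covered are omitted.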

With the custom dataset, our client quickly identified ASR bias in their models. This confirmed their need for specific training data to retrain those models, leading to a more accessible service for customers in the many locales they served. The client was able to ensure that their models were free from ASR bias and to promote fairness and inclusivity in their product.
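The article does not say which metric the client used to identify bias. One common approach, shown here purely as an assumption, is to compute the word error rate (WER) separately for each demographic group in the test corpus and compare the results. The sketch below uses a plain word-level edit distance; the function names, group labels, and transcripts are hypothetical.

```python
from collections import defaultdict

def word_errors(reference, hypothesis):
    """Return (edit distance in words, number of reference words)."""
    ref, hyp = reference.split(), hypothesis.split()
    # Row-by-row Levenshtein distance over words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = curr
    return prev[-1], len(ref)

def wer_by_group(samples):
    """Aggregate WER per group from (group, reference, hypothesis) triples."""
    errors, words = defaultdict(int), defaultdict(int)
    for group, reference, hypothesis in samples:
        e, n = word_errors(reference, hypothesis)
        errors[group] += e
        words[group] += n
    return {g: errors[g] / words[g] for g in words if words[g]}

# Hypothetical test utterances tagged with a demographic group label.
samples = [
    ("group_a", "turn on the light", "turn on the light"),
    ("group_b", "turn on the light", "turn on a light"),
    ("group_b", "call mom", "call"),
]

wer = wer_by_group(samples)
for group, rate in sorted(wer.items()):
    print(f"{group}: WER = {rate:.2f}")
```

A consistently higher WER for one group than for others, on a corpus with verified speaker metadata, is the kind of signal that would point to the need for additional training data for that group.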

Visit our Marketplace today to access our large repository of off-the-shelf datasets with all metadata details required either for ASR bias detection or to better accommodate your needs.

© 2025 DefinedCrowd. All rights reserved.