How Defined.ai and Theseus AI are Bridging the Startup Data Gap with Ethical AI

23 May 2024

Speech Recognition

In the world of AI development, the quality and specificity of training data can make or break a model's performance. While large tech giants have deep pockets to acquire premium datasets, startups and smaller companies often face significant financial barriers to AI data access. This "data wealth inequality" can hinder innovation and limit the potential impact of groundbreaking AI solutions.

At Defined.ai, we believe that ethical AI extends beyond fair wages and consent — it's also about democratizing AI data. Our recent collaboration with Theseus AI, facilitated by RunPod, exemplifies how bridging the data gap can unlock new realms of possibility. To understand our commitment to ethical AI, read our Ethical AI Manifesto.

The Challenge: Achieving Precision in Finance

Theseus AI specializes in advanced Automatic Speech Recognition (ASR) tailored specifically for the high-stakes world of finance. Their technology is designed to accurately convert spoken language into text, handling complex financial terminology across multiple languages with high precision. This capability is crucial for the financial sector, where accuracy in interpreting and transcribing conversations can directly influence business outcomes and compliance standards. By partnering with Defined.ai, Theseus AI was able to access specialized datasets that significantly enhanced the performance of their speech recognition models, ensuring that they remain at the forefront of AI-driven financial services.

As Theseus AI's team explained, "83% of people who have been using note-taking software in our sector have a problem: they have to go over the (mis-)transcription and spend a lot of time correcting the source. Financial Services is about numbers, therefore perfection."

The Solution: Ethically Sourced, Sector-Specific Data

Recognizing the limitations of generalist models trained on broad data, Theseus AI turned to Defined.ai for a solution. RunPod played a pivotal role in facilitating this collaboration by providing a cost-effective and scalable cloud platform equipped with high-performance GPUs. This setup enabled Theseus AI to access Defined.ai's specialized datasets, hosted on RunPod's servers, for a limited period. The arrangement was a departure from Defined.ai's typical practice of permanently delivering data, and it offered cost-effective access to premium training data that would otherwise have been out of reach for Theseus AI at this stage. RunPod's infrastructure not only supported the heavy computational demands of training sophisticated AI models but also reduced the technical complexity, allowing Theseus AI to concentrate on refining and advancing their ASR solutions.

As Theseus AI shared,

We partnered with Defined.ai, which helped us get access to financial-specific audio database, in French and English, with more than 400 hours of audio annotated data.

The Results: Unprecedented Accuracy and Efficiency

By fine-tuning OpenAI's Whisper model on Defined.ai's high-quality financial data, Theseus AI achieved remarkable results. Their Word Error Rate (WER) fell from 18% on the validation dataset to roughly 1.7% on financial-specific data. In their own words, "Our results are really promising, Whisper-large-v3 starts with a WER (Word Error Rate) of 18% on the validation dataset. huge compared to regular dataset WER with generalists open source French datasets (close to 4.5% WER). We fine-tuned it and later reached circa 1.7% WER on financial-specific data."
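For readers less familiar with the metric, WER is the number of word-level substitutions, deletions, and insertions needed to turn the transcript into the reference, divided by the number of words in the reference. The Python sketch below is a minimal illustration of that definition, not Theseus AI's evaluation code: the function names and the sample sentences are hypothetical, and real evaluations usually normalize casing, punctuation, and number formats before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from transcript
            insertion = curr[j - 1] + 1  # extra word in transcript
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(before: float, after: float) -> float:
    """Fraction of the original error rate that was eliminated."""
    return (before - after) / before


if __name__ == "__main__":
    # Hypothetical example: one misheard word in a seven-word sentence.
    reference = "the fund returned four point five percent"
    hypothesis = "the fund returned for point five percent"
    print(f"WER: {word_error_rate(reference, hypothesis):.1%}")

    # The figures quoted above: 18% before fine-tuning, about 1.7% after.
    print(f"Relative reduction: {relative_reduction(0.18, 0.017):.1%}")
```

In the sample sentence a single substitution ("four" heard as "for") already produces a WER of about 14.3%, which shows how quickly errors on numbers add up in financial transcripts. Applied to the quoted figures, going from 18% to about 1.7% corresponds to a relative error reduction of roughly 90.6%.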

The Catalyst: Seamless Collaboration with RunPod

Theseus AI's success wouldn't have been possible without the seamless collaboration facilitated by RunPod. Their user-friendly platform and top-tier GPU resources streamlined the development process, allowing Theseus AI to focus on innovation.

As they expressed, "RunPod was a no-brainer for us. Their cloud setup, equipped with top-tier GPUs, was easy to use and scale. It felt like an extension of our own tech, removing complexities and letting us focus on innovation."

The Impact: Democratizing AI Innovation

This collaboration showcases the transformative power of partnership and the importance of democratizing AI data. By making our premium datasets affordable through our collaboration with RunPod, we empowered Theseus AI to compete in the AI arena and push the boundaries of what's possible.

Theseus AI's gratitude was evident:

I want to thank Defined.ai, through their collaboration with RunPod.io, for helping us improve our model accuracy. Off-the-shelf data like Defined.ai's is often not within our budget, so it was a game changer for them to make it affordable via their collaboration with RunPod.

The Future: Continuous Innovation and Accessibility

Initiatives like this not only accelerate AI development but also pave the way for a more inclusive and equitable AI ecosystem. By breaking down financial barriers and fostering collaboration, we can unleash the full potential of innovative startups and drive transformative advancements across industries.

As Theseus AI aptly stated, "This journey proves when startups get access to top-notch tools and data, innovation doesn't just speed up; it leaps."

At Defined.ai, we remain committed to our mission of providing ethically sourced, high-quality datasets while actively addressing the data wealth gap. By partnering with platforms like RunPod, we aim to empower developers, researchers, and entrepreneurs worldwide, fostering a future where AI innovation knows no boundaries.

Ready to transform your AI capabilities with ethical, high-quality data? Contact Defined.ai today to discover how our data solutions can empower your startup to lead in innovation and efficiency.

