Audio datasets for AI training, evaluation and scale
Explore speech, voice and music datasets with the licensing, structure and quality signals your team needs to evaluate fit quickly.


2M+
500+
175+



Data Collection
Sourcing
Choose audio datasets from multiple sourcing modes to reduce the mismatch between your training data and the conditions your AI solution runs in: on-device scripted speech, call-center-style dialogue, live podcasts, music tracks and licensed SFX.
On-device scripted monologues for consistent, clean ASR and TTS training
Spontaneous dialogue/simulated call center audio for conversational AI and telephony-style speech recognition
Live, non-simulated podcasts recorded by real podcasters for long-form, real-world speech
Licensed music tracks for training music generation and classification models
Licensed sound effects for SFX generation and sound classification tasks

Featured audio datasets
Browse featured audio datasets ready to power speech, voice, and sound AI applications.
Music - Instrumental
NO, EL, NE, la, si-lk, brx-in, nl-NL, tl-PH, or-in, et-ee, haz-af, ca-es, gjr-in, ro-ro, tcy-in, bto-ph,
qaz-ir, eu-es, haw-us, he-IL, nb-NO, ml-in, ZH, zh-CN, en-AU, dhd-in, wuu-cn, mni-in, gl-es, ahr-in, pt-PT,
af-za, hu-hu, fi-fi, gon-in, bn-IN, LV, ar-LAV, ar-SD, KO, MS, AR, pa-IN, ta-IN, te-IN, es-ES, it-IT, en-GB,
fr-MX, de-DE, en-US, en-IN, fr-FR, TH, HE, SO, ZU, TL, SR, EN, DA, VI, mr-IN, hi-IN, ID, pl-PL, kn-IN, FA,
UR, fr-CA, TR, YUE, es-MX, CZ, es-AR, JA, sv-SE, DE, RU, FR, pt-BR, es-VE, ar-MA, ar-LB, hi-US, fr-MA,
ar-TN, es-US, ar-JO, ar-JS, ar-IQ, ar-YE, ar-DZ, ar-AR, de-US, ar-EG, ja-JP, ar-SA, ar-AE, es-BO, ar-TR,
ja-US, es-PE, ar-Kw, es-EC, es-LA, es-CO, es-CL, fr-US, en-CA, ko-KR, da-DK, ru-RU, nl-BE, cs-CZ, vi-VN,
gu-IN, ar-MSA, fa-IR, en-IE, is-is, sk-sk, lt-lt, uk-ua, rmn-ro, cy-gb, kxu-in, sgs-lt, la-latn

What you can build with audio datasets


Automatic Speech Recognition
Speech datasets and conversational datasets for ASR model training, fine-tuning and evaluation.


Healthcare
Licensed audio datasets for medical speech workflows, domain adaptation and speech-enabled patient or clinician tools.


Automotive
Voice datasets for embedded interfaces, wake-word systems, command recognition and in-cabin conversational AI.


Introducing the new and improved Defined.ai Data Marketplace
The world’s largest marketplace of AI training data


Audio datasets FAQ
Audio datasets are curated collections of audio files plus metadata and often labels such as transcripts. They are used to train, fine-tune or evaluate AI models for speech, music and sound understanding.
A speech dataset typically targets ASR with transcripts and speech coverage, while a voice dataset is often used for identity-centric tasks like verification and speaker identification. Many audio dataset formats can support both depending on metadata and labels.
Defined.ai offers licensed datasets for AI training and evaluation, which helps teams check usage suitability early in the buying process.
If you are comparing multiple options, Defined.ai can help shortlist datasets based on use case, channel, locale, quality requirements and labeling needs.
Collection timelines depend on scope, languages, sourcing and annotation requirements. The team can help define a realistic collection path and shortlist existing options first.
Scripted speech datasets can support speech synthesis and TTS workflows, and some offerings include higher-fidelity audio specs.
Podcast datasets support long-form transcription, indexing and conversational modeling. Defined.ai offers live podcasts, with transcription available as a service.
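
As a rough illustration of the point above that one audio dataset can serve both speech and voice tasks depending on its metadata and labels, here is a minimal Python sketch. The `AudioRecord` fields, the helper functions and the sample file names are all hypothetical and do not describe the actual schema of any Defined.ai dataset.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class AudioRecord:
    """One hypothetical manifest entry: an audio file plus metadata and labels."""
    audio_path: str
    locale: str                       # assumed locale tag, e.g. "en-US"
    duration_s: float
    transcript: Optional[str] = None  # label used by speech (ASR-style) tasks
    speaker_id: Optional[str] = None  # label used by voice (identity-centric) tasks


def asr_pairs(records: List[AudioRecord]) -> List[Tuple[str, str]]:
    """Return (audio, transcript) pairs for records that carry a transcript."""
    return [(r.audio_path, r.transcript) for r in records if r.transcript]


def speaker_pairs(records: List[AudioRecord]) -> List[Tuple[str, str]]:
    """Return (audio, speaker) pairs for records that carry a speaker label."""
    return [(r.audio_path, r.speaker_id) for r in records if r.speaker_id]


# Made-up sample entries, for illustration only.
records = [
    AudioRecord("a.wav", "en-US", 3.2, transcript="turn on the lights", speaker_id="spk1"),
    AudioRecord("b.wav", "en-US", 2.1, transcript="call home"),
    AudioRecord("c.wav", "pt-PT", 4.0, speaker_id="spk2"),
]

print(asr_pairs(records))      # records usable for a speech task
print(speaker_pairs(records))  # records usable for a voice task
```

In this sketch the same record can appear in both views when it carries both labels, which is why checking the available metadata is a useful first step when evaluating fit.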