
Find the right datasets for you


Korean Question-Answer pairs

1,250,480 Question-Answer pairs in Korean.

Domain: Academic, General
Locale: KO, ko-KR
Amount: 1.3M

Irish English Spontaneous Dialogue, generic

7 hours of Irish English simulated call center conversations between an agent and a client, recorded over telephony in the generic domain.

Domain: General, Conversational
Locale: EN, en-IE
Amount: 7 hours

UK English Spontaneous Dialogue, generic

195 hours of UK English simulated call center conversations between an agent and a client, recorded over telephony in the generic domain.

Domain: General, Conversational
Locale: EN, en-GB
Amount: 195 hours

American Spanish Spontaneous Dialogue, generic

104 hours of American Spanish simulated call center conversations between an agent and a client, recorded over telephony in the generic domain.

Domain: General, Conversational
Locale: es-US
Amount: 104 hours

American English Spontaneous Dialogue, generic

199 hours of American English simulated call center conversations between an agent and a client, recorded over telephony in the generic domain.

Domain: General, Conversational
Locale: EN, en-US
Amount: 199 hours

Canadian French Scripted Monologue, generic

2 hours of Canadian French short read phrases, typically around 5 seconds in duration, recorded on mobile devices in the generic domain.

Domain: General
Locale: FR, fr-CA
Amount: 2 hours

American English Scripted Monologue, generic

40 hours of American English short read phrases, typically around 5 seconds in duration, recorded on mobile devices in the generic domain by native speakers of Spanish.

Domain: General
Locale: EN, en-US
Amount: 40 hours

European Portuguese Scripted Monologue, generic

1374 hours of European Portuguese short read phrases, typically around 5 seconds in duration, recorded on mobile devices in the generic domain.

Domain: General
Locale: pt-PT
Amount: 1.4K hours

Egyptian Arabic Scripted Monologue, generic

2274 hours of Egyptian Arabic short read phrases, typically around 5 seconds in duration, recorded on mobile devices in the generic domain.

Domain: General
Locale: AR, ar-EG
Amount: 2.3K hours

Polish Scripted Monologue, generic

55 hours of Polish short read phrases, typically around 5 seconds in duration, recorded on mobile devices in the generic domain.

Domain: General
Locale: pl-PL
Amount: 55 hours

Showing 10 of 42 datasets


New datasets

Medical Claims Data for AI Model Training

Healthcare

Longitudinal Data in Oncology for AI Model Development

Healthcare

Wearable Health Data for AI Model Training

Healthcare

Hot datasets

Live Spanish Call Center Audio Dataset

Call Center

DICOM Medical Imaging Dataset with Clinical Reports

Healthcare

Multimodal Dataset for Household Robotics

Robotics
3D and Lidar


© 2026 DefinedCrowd. All rights reserved.


