Find the right datasets for you

American English Scripted Monologue, technical

8 hours of American English short read phrases, typically around 5 seconds in duration, recorded on mobile devices in the technical domain.

Domain: Tech
Locale: EN, en-US
Amount: 8 hours

German Scripted Monologue, technical

8 hours of German short read phrases, typically around 5 seconds in duration, recorded on mobile devices in the technical domain.

Domain: Tech
Locale: DE, de-DE
Amount: 8 hours

European French Scripted Monologue, technical

9 hours of European French short read phrases, typically around 5 seconds in duration, recorded on mobile devices in the technical domain.

Domain: Tech
Locale: FR, fr-FR
Amount: 9 hours

Code Instruction Dataset — 17,000 Human-Reviewed Prompt & Response Pairs for LLM Fine-Tuning

Code instruction dataset with more than 17,000 prompt and response pairs in a variety of coding languages.

Domain: Tech
Amount: 17K pairs

Code Repository Dataset — 110 Real-World Codebases for LLM Fine-Tuning

A code repository dataset with 110 repos from commercial software companies in major programming languages.

Domain: Coding, Tech
Amount: 110 repos

Showing 5 of 5 datasets

New datasets

Medical Claims Data for AI Model Training
Healthcare

Longitudinal Data in Oncology for AI Model Development
Healthcare

Wearable Health Data for AI Model Training
Healthcare

Hot datasets

Live Spanish Call Center Audio Dataset
Call Center

DICOM Medical Imaging Dataset with Clinical Reports
Healthcare

Multimodal Dataset for Household Robotics
Robotics, 3D and Lidar

© 2026 DefinedCrowd. All rights reserved.
