English Spontaneous Dialogue

Tags: English, Spontaneous Dialogue, Banking, Healthcare, Insurance, Retail, Telecommunication

About the Dataset

3345 hours

This audio dataset contains 3345 hours of English speech data across several domains, recorded by native speakers from the UK, the US, Ireland, Australia, and India.

There are 1309 hours of English (US) human-to-human audio with the following domain distribution per dataset:

  • 321.37 hours of Banking
  • 192.03 hours of Insurance
  • 51.32 hours of Retail
  • 45.68 hours of Telecommunication
  • 198.62 hours of Generic domain
  • 500.22 hours of Healthcare / Retail

There are 996 hours of English (UK) human-to-human audio with the following domain distribution per dataset:

  • 225.53 hours of Banking
  • 196.05 hours of Insurance
  • 189.67 hours of Retail
  • 203.67 hours of Telecommunication
  • 181.38 hours of Generic domain

There are 702 hours of English (India) human-to-human audio with the following domain distribution per dataset:

  • 173.55 hours of Banking
  • 187.08 hours of Insurance
  • 168.62 hours of Retail
  • 173.4 hours of Telecommunication

There are 304 hours of English (AU) human-to-human audio with the following domain distribution per dataset:

  • 86.03 hours of Banking
  • 76.35 hours of Insurance
  • 79.37 hours of Retail
  • 62.73 hours of Telecommunication

There are 34 hours of English (Ireland) human-to-human audio with the following domain distribution per dataset:

  • 28.3 hours of Insurance
  • 6.58 hours of Generic domain
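
The per-domain figures above can be cross-checked against the stated totals. The following is a minimal sketch in Python; the dictionary layout and names such as `HOURS_BY_LOCALE` and `locale_totals` are illustrative and not part of the dataset's metadata. It assumes, based only on the numbers listed here, that each stated per-locale total is the per-domain sum rounded down to whole hours.

```python
import math

# Hours per domain, copied from the lists above. Keys are illustrative labels.
HOURS_BY_LOCALE = {
    "US": {
        "Banking": 321.37,
        "Insurance": 192.03,
        "Retail": 51.32,
        "Telecommunication": 45.68,
        "Generic": 198.62,
        "Healthcare / Retail": 500.22,
    },
    "UK": {
        "Banking": 225.53,
        "Insurance": 196.05,
        "Retail": 189.67,
        "Telecommunication": 203.67,
        "Generic": 181.38,
    },
    "India": {
        "Banking": 173.55,
        "Insurance": 187.08,
        "Retail": 168.62,
        "Telecommunication": 173.4,
    },
    "AU": {
        "Banking": 86.03,
        "Insurance": 76.35,
        "Retail": 79.37,
        "Telecommunication": 62.73,
    },
    "Ireland": {
        "Insurance": 28.3,
        "Generic": 6.58,
    },
}


def locale_totals(hours_by_locale):
    """Sum the per-domain hours for each locale, rounded to two decimals."""
    return {
        locale: round(sum(domains.values()), 2)
        for locale, domains in hours_by_locale.items()
    }


totals = locale_totals(HOURS_BY_LOCALE)
# Assumption: the stated totals are the sums rounded down to whole hours.
whole_hours = {locale: math.floor(total) for locale, total in totals.items()}

for locale, total in totals.items():
    print(f"{locale}: {total:.2f} h summed, {whole_hours[locale]} whole hours")
print("Sum of whole-hour totals:", sum(whole_hours.values()))
```

Under that rounding assumption, the whole-hour totals add up to the 3345 hours given at the top of the page.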

Defined.ai creates scenarios that our crowd members study beforehand. They then record a conversation in which one speaker plays the agent and the other plays out the scenario with spontaneous content. Recordings are made via telephony and saved at 8 kHz, 16 bits per channel. The recordings are then transcribed.

The dataset is covered by Defined.ai's standard license agreement. The license agreement is perpetual and allows for the commercialization of all models built on the data.

Other characteristics:

  • Audio format: WAV
  • Recording environment: noisy, silent
  • Bits per sample: 16
  • Communication band: broadband
  • Sample rate: 8 kHz
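
To confirm that a downloaded file matches the characteristics listed above (WAV, 16 bits per sample, and the 8 kHz telephony rate mentioned in the description), the header can be inspected with Python's standard `wave` module. This is a minimal sketch: the function names `describe_wav` and `matches_spec` are hypothetical helpers, and the channel count is reported but not checked because the page does not state it.

```python
import wave

# Expected values taken from the characteristics listed above.
EXPECTED_SAMPLE_RATE_HZ = 8000
EXPECTED_BITS_PER_SAMPLE = 16


def describe_wav(path):
    """Read basic format information from a WAV file header."""
    with wave.open(path, "rb") as wav:
        frame_rate = wav.getframerate()
        return {
            "sample_rate_hz": frame_rate,
            "bits_per_sample": wav.getsampwidth() * 8,  # sample width is in bytes
            "channels": wav.getnchannels(),
            "duration_s": wav.getnframes() / frame_rate,
        }


def matches_spec(info):
    """Return True if sample rate and bit depth match the listed values."""
    return (
        info["sample_rate_hz"] == EXPECTED_SAMPLE_RATE_HZ
        and info["bits_per_sample"] == EXPECTED_BITS_PER_SAMPLE
    )
```

A call such as `matches_spec(describe_wav("sample.wav"))`, where `sample.wav` is a placeholder path, returns `True` only when both values agree with the listing.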

Metadata Distribution

Australia

[Figures: gender, age, and accent distribution of the English (AU) speakers]


© 2023 DefinedCrowd. All rights reserved.