English Spontaneous Dialogue
About the Dataset
3345 hours
This audio dataset contains 3345 hours of English speech data across various domains, recorded by native speakers from the UK, the US, Ireland, Australia, and India.
There are 1309 hours of English (US) human-to-human audio with the following domain distribution per dataset:
- 321.37 hours of Banking
- 192.03 hours of Insurance
- 51.32 hours of Retail
- 45.68 hours of Telecommunication
- 198.62 hours of Generic domain
- 500.22 hours of Healthcare / Retail
There are 996 hours of English (UK) human-to-human audio with the following domain distribution per dataset:
- 225.53 hours of Banking
- 196.05 hours of Insurance
- 189.67 hours of Retail
- 203.67 hours of Telecommunication
- 181.38 hours of Generic domain
There are 702 hours of English (India) human-to-human audio with the following domain distribution per dataset:
- 173.55 hours of Banking
- 187.08 hours of Insurance
- 168.62 hours of Retail
- 173.4 hours of Telecommunication
There are 304 hours of English (AU) human-to-human audio with the following domain distribution per dataset:
- 86.03 hours of Banking
- 76.35 hours of Insurance
- 79.37 hours of Retail
- 62.73 hours of Telecommunication
There are 34 hours of English (Ireland) human-to-human audio with the following domain distribution per dataset:
- 28.3 hours of Insurance
- 6.58 hours of Generic domain
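The per-locale figures above can be cross-checked with a few lines of Python. This is only a minimal sketch: the dictionary layout and the names `HOURS` and `locale_totals` are illustrative choices, not part of the dataset's delivery format, and the numbers are simply copied from the lists above.

```python
# Hours per domain for each locale, copied from the lists above.
# The structure and names here are illustrative, not part of the dataset.
HOURS = {
    "US": {
        "Banking": 321.37,
        "Insurance": 192.03,
        "Retail": 51.32,
        "Telecommunication": 45.68,
        "Generic": 198.62,
        "Healthcare / Retail": 500.22,
    },
    "UK": {
        "Banking": 225.53,
        "Insurance": 196.05,
        "Retail": 189.67,
        "Telecommunication": 203.67,
        "Generic": 181.38,
    },
    "India": {
        "Banking": 173.55,
        "Insurance": 187.08,
        "Retail": 168.62,
        "Telecommunication": 173.4,
    },
    "AU": {
        "Banking": 86.03,
        "Insurance": 76.35,
        "Retail": 79.37,
        "Telecommunication": 62.73,
    },
    "Ireland": {
        "Insurance": 28.3,
        "Generic": 6.58,
    },
}


def locale_totals(hours):
    """Return the summed hours per locale, rounded to two decimals."""
    return {
        locale: round(sum(domains.values()), 2)
        for locale, domains in hours.items()
    }


totals = locale_totals(HOURS)
grand_total = round(sum(totals.values()), 2)

for locale, total in totals.items():
    print(f"{locale}: {total:.2f} h")
print(f"All locales: {grand_total:.2f} h")
```

The decimal figures sum to slightly more than the whole-hour totals quoted in the text (1309.24 for the US, for example), which suggests the quoted totals are rounded down. The 3345-hour headline equals the sum of those whole-hour figures.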
Defined.ai creates scenarios for our crowd members to follow, which they study beforehand. They then record a conversation, with one speaker playing the agent and the other playing out the scenario with spontaneous content. The recording is made over telephony and saved at 8 kHz, 16 bits per channel. The content is then transcribed.
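The recording format described above fixes the raw data rate, so the uncompressed size of the audio can be estimated from its duration. The sketch below assumes plain PCM audio and ignores WAV header overhead. The function name `pcm_bytes` is hypothetical, and the number of channels per file is not stated here, so it is left as a parameter.

```python
def pcm_bytes(hours, sample_rate_hz=8000, bits_per_sample=16, channels=1):
    """Estimate uncompressed PCM size in bytes for a given duration.

    Defaults follow the 8 kHz, 16-bit format described above.
    The channel count is an assumption; pass the real value if known.
    """
    seconds = hours * 3600
    bytes_per_sample = bits_per_sample // 8
    return int(seconds * sample_rate_hz * bytes_per_sample * channels)


mono = pcm_bytes(3345)
stereo = pcm_bytes(3345, channels=2)
print(f"1 channel:  {mono / 1e9:.1f} GB")
print(f"2 channels: {stereo / 1e9:.1f} GB")
```

For a single channel this comes to 192,672,000,000 bytes, roughly 193 GB, and two channels would double that.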
The dataset is covered by Defined.ai's standard license agreement. The license agreement is perpetual and allows for the commercialization of all models built on the data.
Other characteristics:
- Audio format: WAV
- Recording environment: noisy, silent
- Bits per sample: 16
- Communication band: broadband
- Sample rate: 8 kHz
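A delivered file can be checked against these characteristics with Python's standard `wave` module. This is a sketch rather than an official validation tool: the helper names `describe_wav` and `matches_listed_format` are hypothetical, the expected values are taken from the 8 kHz, 16-bit format described on this page, and `wave` only reads uncompressed PCM WAV files.

```python
import wave

# Expected values taken from the format described on this page.
EXPECTED_SAMPLE_RATE_HZ = 8000
EXPECTED_BITS_PER_SAMPLE = 16


def describe_wav(path):
    """Read a PCM WAV header and return its basic format parameters."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "bits_per_sample": wav.getsampwidth() * 8,
            "duration_s": frames / rate if rate else 0.0,
        }


def matches_listed_format(info):
    """Return True if sample rate and bit depth match the listed format."""
    return (
        info["sample_rate_hz"] == EXPECTED_SAMPLE_RATE_HZ
        and info["bits_per_sample"] == EXPECTED_BITS_PER_SAMPLE
    )
```

Calling `matches_listed_format(describe_wav("sample.wav"))` on a local file (the file name is a placeholder) returns `True` when the header reports 8000 Hz and 16 bits per sample.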
Short Audio Samples
- English (UK). A transcription of the sample is also available.
- English (India). A transcription of the sample is also available.
- English (AU). A transcription of the sample is also available.
- English (Ireland). A transcription of the sample is also available.
- English (US). A transcription of the sample is also available.