FAQ

How and from where were the participants in these datasets recruited?

We source contributors through several models, ranging from organic to paid acquisition, depending on segmentation needs and target markets. Both models use a multi-channel approach that combines self-owned channels, third-party advertising platforms, and partnerships with local entities (e.g. audio, video, image and search ads, non-profit organizations, student networks, blogging, and organic posts). This approach allows us to target different audiences based on demographics, skill sets, experience, audience size, language, device, interests, and real-time context.

How do we inform the dataset participants about how the data collected will be used?

All contributors who join our platform are shown our Terms of Use (ToU), Privacy Policy (PP), and Cookies Policy (CP), and must consent to them before they start using the platform. The PP has a dedicated section that explains in detail which information we collect, how we collect it, and how it is used. At any time and without limitation, any contributor can delete their account, and all personal information collected from them is then automatically anonymised.

We are GDPR compliant and ISO 27001 certified, which gives all our contributors a layer of data security when using our platform.

More specifically, when a member enters a job for the first time, they are also shown a second, job-specific NDA and MSA. We explicitly ask all participants not to share any PII during data collection. If the scenario requires talking about PII-type data such as names, account details, or phone numbers, we instruct participants not to use actual data but to make up values that only follow the structure of that piece of information (the right number of digits in a social security number, for instance).
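To illustrate what "made-up values that follow the structure" can look like, here is a minimal Python sketch. The pattern table and the `make_placeholder` helper are hypothetical names chosen for this example, and the formats are illustrative assumptions rather than the instructions actually given to participants.

```python
import random

# Hypothetical structure templates: '#' stands for one random digit.
# These formats are illustrative assumptions, not official guidance.
PLACEHOLDER_PATTERNS = {
    "us_ssn": "###-##-####",
    "phone": "(###) ###-####",
    "account_number": "##########",
}


def make_placeholder(kind, rng=None):
    """Return a made-up value that follows the structure registered for `kind`."""
    rng = rng or random.Random()
    pattern = PLACEHOLDER_PATTERNS[kind]
    # Replace every '#' with a random digit and keep separators as they are.
    return "".join(str(rng.randint(0, 9)) if ch == "#" else ch for ch in pattern)


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only so the demo is repeatable
    for kind in PLACEHOLDER_PATTERNS:
        print(kind, make_placeholder(kind, rng))
```

Random digits can by chance coincide with a real number, so a sketch like this only shows the idea of preserving structure while avoiding a participant's own data.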

How do you determine pay rates for your participants in various locales?

We have a fair pay policy that ensures at least minimum wage, and in some countries and locales we are required to pay “living wages”. Beyond that, the pay rate is determined by what it takes to attract contributors to complete the tasks. If the job requires a higher skill set (e.g. a medical collection), we may have to pay more for a specialised group of contributors.

What are the terms of the Data License?

All Defined.ai datasets are covered by our standard license agreement, which you can find here. The license agreement is perpetual and allows for the commercialization of all models built on the data.

What is Spontaneous IVR data and how is it gathered?

IVR stands for Interactive Voice Response. Spontaneous IVR data is created by having a human respond to an IVR system. The human (playing a customer) follows simplified real-life scenarios and makes a spontaneous query on the given topic. The IVR system then asks the human to repeat the query in two more ways. The speech is then transcribed. In rare cases, the IVR system is mimicked by another human. The recording is done via telephony and is saved at 8 kHz, 16-bit per channel.

What is Spontaneous Dialog data and how is it gathered?

We create scenarios for our crowd members to follow, which they study beforehand. They then record a conversation in which one crowd member plays the agent and the other plays out the scenario with spontaneous content. The recording is done via telephony and is saved at 8 kHz, 16-bit per channel. That content is then transcribed.

What is Scripted Monologue data and how is it gathered?

The speakers are presented with a prompt (script) and asked to read it out loud and record it. Our clients receive the audio recording, the prompt, and information about the speaker. The audio is recorded on-device, typically at 16 kHz, 16-bit. We also provide information on which device each recording was made on.
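If you want to confirm that delivered audio matches the sample rates and bit depth described above, a check along the following lines may help. This is a minimal sketch using Python's standard `wave` module, and it assumes the recordings are delivered as PCM WAV files, which this FAQ does not state. The helper names are hypothetical, and `make_silent_wav` exists only to produce demo input.

```python
import io
import wave


def describe_wav(source):
    """Read sample rate, bit depth, channel count and duration from a WAV header."""
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        return {
            "sample_rate_hz": rate,
            "bits_per_sample": wav.getsampwidth() * 8,
            "channels": wav.getnchannels(),
            "duration_s": wav.getnframes() / rate,
        }


def matches_format(info, sample_rate_hz, bits_per_sample=16):
    """True if the header reports the expected sample rate and bit depth."""
    return (info["sample_rate_hz"] == sample_rate_hz
            and info["bits_per_sample"] == bits_per_sample)


def make_silent_wav(sample_rate_hz, seconds, channels=1):
    """Build an in-memory 16-bit PCM WAV of silence, for demonstration only."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate_hz)
        wav.writeframes(b"\x00\x00" * channels * sample_rate_hz * seconds)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    # Stand-in for a two-channel telephony recording (8 kHz, 16-bit per channel).
    info = describe_wav(make_silent_wav(8000, seconds=2, channels=2))
    print(info, matches_format(info, 8000))
```

For real files, pass a file path to `describe_wav` instead of the in-memory buffer, and use `8000` for telephony recordings or `16000` for on-device scripted recordings.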

If I buy 200h of data, does it mean I will get 200h of pure speech?

The deliverables are measured in audio duration, not in pure speech. For scripted speech, there is some silence before and after the prompt is read so that the speech is not cut off; its length also depends on how soon the participant hits the 'stop recording' button. For dialogue speech, the conversations are natural and generally have little silence, except for natural breaks and pauses in speech. For IVR, as only one channel contains human speech, the speech segments make up about 50% of the audio duration.
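As a rough worked example of the difference between audio duration and speech, the arithmetic can be sketched as follows. Only the IVR figure of about 50% comes from this FAQ; the function name is hypothetical, and no fractions are given here for scripted or dialogue data, so you would need to measure those on a sample.

```python
# Approximate share of IVR audio that is human speech, as stated in this FAQ.
IVR_SPEECH_FRACTION = 0.5


def estimated_speech_hours(audio_hours, speech_fraction):
    """Estimate hours of speech contained in a given audio duration."""
    if not 0.0 <= speech_fraction <= 1.0:
        raise ValueError("speech_fraction must be between 0 and 1")
    return audio_hours * speech_fraction


if __name__ == "__main__":
    # 200 h of IVR audio at roughly 50% speech is about 100 h of speech.
    print(estimated_speech_hours(200, IVR_SPEECH_FRACTION))  # 100.0
```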

Can I get a sample of a dataset?

Sure, you can instantly download free samples from the website.

Can you package subsets of data for me according to specific requirements of age, gender and accent?

Yes, we can do that. Simply tell us your specific requirements and we will package your custom dataset for you.

I need data that is not listed on the marketplace. Can you help me with my request?

We would be happy to help. You can either order a custom collection, or you can wait until this data is available on the marketplace. Contact us to learn about what is planned for the quarter – it just may be what you are looking for.

What are the payment options?

We accept USD via ACH bank transfer. If you need a purchase order, SOW, or other documentation, you can always talk to our team.

When will my purchased assets be delivered?

For standard buying options, we deliver the datasets as soon as we receive your payment. ACH bank transfer orders are delivered once the funds have cleared, which generally takes 2-3 business days. If you request a custom buying option, it may take longer for us to package the data for you.

Are there specific terms for Academia?

Yes, we offer datasets for Academia at significant discounts or even for free! After some due diligence, we will give you a promo code that you can apply on the website. Contact us for more information.

Do you offer discounts?

Yes, we do offer discounts depending on the volume of data that you purchase. Please contact us to get a quotation.

Can I get a refund?

Unfortunately, we do not offer refunds. To help you decide before the purchase, we provide samples of the data. The structure of the sample is identical to that of the whole dataset, apart from the metadata, whose completeness can differ between parts of the dataset. If you find issues with the dataset, please let us know. We will assess the issue and work on a solution.

What are the process and terms for selling my data on the Marketplace?

If you own quality AI training data, we would be happy to help you sell it on our marketplace. Please contact our partnership team to discuss this. Please be prepared for us to ask about and test the quality of your data.

Still have questions?

Let us know, and we will get back to you shortly.

Defined.ai hosts the leading online marketplace for buying and selling AI data, tools and models, and offers professional services to help deliver success in complex machine learning projects. Defined.ai is a community of AI professionals building fair, accessible and ethical AI of the future.
© 2023 DefinedCrowd. All rights reserved.