Code Instruction Dataset — 17,000 Human-Reviewed Prompt & Response Pairs for LLM Fine-Tuning

This code instruction dataset is a collection of human-generated, expertly reviewed coding prompts and response pairs. This intermediate to advanced code generation training data for LLM fine-tuning includes a variety of programming languages, including Python, JavaScript, Java, C#, Swift, Go, C, C++ and TypeScript, covering tourism, finance, gaming, health and sports.

Tech

Dataset specs

Type

Text

File format

xls

Amount

17K pairs

Dataset SubtypeTech, codingDomainTechFile formatxls

Leverage

This coding QA dataset can be used as a reference base to train models that can automatically generate new, diverse code prompts in various programming languages.

Use cases

Fine-tune LLMs to perform structured problem-solving and logical reasoning in complex coding domains with this LLM code training data.
Use this code generation training data to build and fine-tune AI moderation models that assess the accuracy, clarity and appropriateness of code prompts and programming languages.

Do you need a specific dataset?

We understand the uniqueness of every project. That's why we offer customizable dataset solutions to match your specific requirements.