
Code Repositories for Agentic Code: Building Better Models
A leading AI model and software developer partnered with Defined.ai to transform production-grade code repositories into high-value training data for agentic code systems.
TL;DR
- Delivered 1,000+ production-grade, closed-source code repositories for LLM training and evaluation, complete with commit histories and related engineering context.
- Supplied the dataset via API in 5 days, beating the customer’s expected two-week turnaround for a model release cycle.
- Provided more than raw lines of code: the dataset included commit history and supporting engineering artifacts that help models learn not just what was built, but how and why it evolved.
- Covered multiple major programming languages and platforms, including C#, Java and Python, as well as mobile and desktop development environments.
- Supported measurable gains in agentic code performance: across SWE-bench Verified, SWE-bench Pro, Terminal-Bench 2.0 and SkillsBench, the newer model version improved from an average of 48.8 to 59.6—a gain of 10.8 points or 22.2%.
Our Customer: Training a Next-generation Code LLM for Developers
Our customer is a large AI model builder focused on improving coding performance for developers using AI-assisted agentic development tools, terminal-based coding assistants and repository-aware workflows. The team needed authentic, high-quality code data reflecting how software is actually built in developer workflows—not amateur examples, classroom exercises or isolated code snippets.
Specifically, the customer was preparing a new model release and needed repository-scale training material that could help a code LLM reason over software structure, file changes, developer intent and iterative engineering decisions.
The Challenge: Building Agentic Code Systems Requires More Than Raw Source Files
Unlike conventional code-completion LLMs, modern coding agents do more than complete isolated functions. They navigate repositories, interpret change history, reason across files and act on developer intent. That means high-performing agentic code models need richer training inputs than flat code dumps.
The customer needed production-grade code repositories quickly to meet the timeline for a new model release. Speed mattered, but so did realism; the repositories had to represent deployed software used by real users. They also needed to include contextual signals such as commits and supporting development traces, which help a model understand the “why” behind code changes.
In practice, the challenge came down to three requirements:
- Production realism: the customer needed real, deployed software rather than synthetic or student-built examples.
- Repository context: the dataset had to include commit histories and surrounding engineering artifacts, not just the final code state.
- Release speed: the team needed the data in less than two weeks to support a fast-moving model launch.
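To make the "repository context" requirement concrete, the sketch below shows one way a repository-scale sample could be represented. It is illustrative only: the class names, field names and example values are hypothetical and do not describe the delivered dataset's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Commit:
    """One step in a repository's history (hypothetical schema)."""
    sha: str
    message: str              # developer intent, in the author's own words
    files_changed: List[str]
    diff: str                 # the change itself


@dataclass
class RepositorySample:
    """A repository plus the context around it (hypothetical schema)."""
    name: str
    languages: List[str]
    commits: List[Commit] = field(default_factory=list)
    # Assumed examples of "engineering artifacts": design notes, issue text.
    artifacts: List[str] = field(default_factory=list)

    def has_context(self) -> bool:
        # A flat code dump would carry neither history nor artifacts.
        return bool(self.commits) and bool(self.artifacts)

    def intent_pairs(self) -> List[Tuple[str, str]]:
        # Pair each commit message (the "why") with its diff (the "what").
        return [(c.message, c.diff) for c in self.commits]


# Made-up example values, for illustration only.
sample = RepositorySample(
    name="example-app",
    languages=["Python"],
    commits=[
        Commit(
            sha="a1b2c3",
            message="Fix null check in login flow",
            files_changed=["auth/login.py"],
            diff="- if user:\n+ if user is not None:",
        )
    ],
    artifacts=["Design note: login flow"],
)
```

Pairing each commit message with its diff is one simple way to expose the "why" next to the "what"; a final code snapshot alone cannot provide that pairing.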
Our Capabilities: Marketplace-ready Code Datasets from Authentic Engineering Contexts
We helped the customer move quickly by providing off-the-shelf marketplace access to high-quality software engineering code datasets, avoiding the need for a lengthy custom data collection or annotation cycle. The value was not just availability, but the type of data: repository-scale assets with structural and historical context that better reflect how developers actually work.
Key capabilities applied in this engagement included:
- Rapid marketplace procurement: Defined.ai was selected because the customer needed real-world data fast, without waiting for a custom build.
- Repository-level depth: each asset included not only coding tasks but also commit timelines and related engineering artifacts that show how software changed over time.
- Production-grade source material: the repositories were drawn from real production software with real users, improving the relevance of the code data for model training.
- Language and platform breadth: the dataset spanned multiple ecosystems, including C#, Java, Python, and cross-platform environments such as mobile and desktop applications.
Together, these capabilities made the dataset useful for training developer-facing systems that need to reason over real repositories rather than isolated files.
The Solution: Delivering Repository-scale Code Data Through the Defined.ai AI Data Marketplace
We used our AI Data Marketplace to source and deliver production-grade code repositories that matched the customer’s training needs for agentic code systems.
1. Fast discovery of marketplace-ready data
Defined.ai’s marketplace helps team members quickly find the right datasets using advanced filters based on technical requirements, domain needs and delivery preferences. Rather than commissioning a new vibe coding dataset, the customer was able to define what it needed—repository-scale, production-grade code with engineering context—and Defined.ai matched that requirement to suitable marketplace assets.
2. More than raw source files
What made the delivered dataset valuable was not just the presence of code, but the richness of the repository assets themselves. The customer received 1,000+ closed-source, production-grade repositories that included commit histories and related engineering artifacts, giving its training team access to the context behind the code rather than just the final code state.
3. Flexible selection and tailoring
Our Marketplace is built to let customers browse and select data faster. On the platform, teams can review dataset options, request samples, obtain quotes and tailor datasets to project needs before procurement. For this engagement, that meant the customer could source repository-scale code data that matched its release timeline and technical use case without waiting for a bespoke data build.
4. Secure delivery for model training
Once the dataset was selected, we delivered it securely through API-based access for speed and scale. In practice, that meant the customer received the repository dataset in 5 days, well inside the two-week window it was targeting ahead of its model release.
5. Why the marketplace mattered
The value of the Defined.ai AI Data Marketplace in this project was speed, relevance and reduced operational friction. For a customer preparing a new coding model release, that combination made the marketplace a practical way to move from requirement to training-ready data in days, not weeks.
The Results: Measurable Gains in Agentic Code Benchmark Performance
The customer received a fast-turn, production-grade repository dataset that supported measurable improvements in coding-agent performance between one model version and the next. Defined.ai’s contribution centered on the speed, realism and contextual richness of the training data.
Measurable outcomes included:
- Repositories delivered: 1,000+ production-grade, closed-source repositories.
- Delivery speed: dataset fulfilled in 5 days via secure API.
- Data quality: included commit history and related engineering artifacts, not just final code states.
- Language breadth: covered major real-world coding ecosystems including C#, Java and Python, plus mobile and desktop development contexts.
- Benchmark improvement: across four widely recognized coding-agent benchmarks—SWE-bench Verified, SWE-bench Pro, Terminal-Bench 2.0, and SkillsBench—the newer model version improved from an average score of 48.8 to 59.6, a gain of 10.8 points or 22.2%.
That uplift is especially meaningful because these benchmarks test practical developer-facing capabilities. Repository-level problem solving, multi-step software fixes, terminal execution and applied coding skill are exactly the kinds of workflows that matter to teams building agentic developer tools.
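The relationship between the point gain and the percentage gain can be checked with a short calculation. This is a minimal sketch that uses only the two rounded averages quoted above; the function name is hypothetical, and the per-benchmark scores are not reproduced here.

```python
def relative_gain(before: float, after: float):
    """Return (absolute gain in points, relative gain in percent), each rounded to 1 decimal."""
    absolute = after - before
    percent = absolute / before * 100
    return round(absolute, 1), round(percent, 1)


# Rounded benchmark averages as reported in this case study.
absolute, percent = relative_gain(48.8, 59.6)
print(absolute, percent)  # prints: 10.8 22.1
```

Computed from the rounded averages, the relative gain comes out at about 22.1%. The reported 22.2% is consistent with a figure calculated from unrounded scores, which is an assumption here rather than something the published numbers confirm.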
With our help, the customer sped up model improvement with production-grade repository data that reflected how software is actually built, changed and maintained in the real world. If your team needs data to build agentic models, browse 700+ datasets on our AI Data Marketplace or speak to a data expert to discuss custom projects.