sdlcnext.com

Where AI Actually Delivers ROI: A Practical Guide

Not all AI investment is equal. The data on who benefits most, which use cases have the best returns, and what organisational foundations have to be in place before AI delivers at all.


Viewpoint

The aggregate productivity numbers from AI research are modest and plateauing. But aggregate numbers hide enormous variance. Some developers are saving 4+ hours a week. Some are going slower. Some teams have halved their onboarding time. Others are drowning in review queues.

The question is not “does AI improve productivity?” The honest answer to that is “sometimes, for some people, on some tasks.” The useful question is: where does AI reliably deliver, and how do you set up the conditions for it?

The clearest win: onboarding

Of all the productivity metrics in the DX Research dataset, the onboarding result stands out for its size and its clarity.

Between Q1 2024 and Q4 2025, the time for a new developer to reach their 10th pull request, a standard industry proxy for “productive contributor,” was cut in half. Fifty percent reduction. That is not the kind of gain you see in self-reported time-savings surveys. That is a measurable, objective outcome tracked against a consistent benchmark.
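The time-to-10th-PR metric is straightforward to compute from pull-request history. A minimal sketch, assuming you have each developer's start date and a list of merged-PR dates (function and variable names here are illustrative, not part of the DX framework):

```python
from collections import defaultdict
from datetime import date

def time_to_nth_pr(prs, start_dates, n=10):
    """Days from a developer's start date to their n-th merged PR.

    prs: iterable of (developer, merged_date) tuples.
    start_dates: dict mapping developer -> start date.
    Returns dict of developer -> days, only for developers
    who have merged at least n PRs.
    """
    merged = defaultdict(list)
    for dev, merged_date in prs:
        merged[dev].append(merged_date)
    result = {}
    for dev, dates in merged.items():
        if len(dates) >= n:
            nth = sorted(dates)[n - 1]  # date of the n-th merged PR
            result[dev] = (nth - start_dates[dev]).days
    return result

# Illustrative data: one developer merging a PR every other day.
prs = [("ana", date(2025, 1, d)) for d in range(5, 26, 2)]  # 11 PRs
start = {"ana": date(2025, 1, 1)}
print(time_to_nth_pr(prs, start))  # → {'ana': 22}
```

Tracked quarter over quarter for each hiring cohort, the median of this number is the benchmark the 50% reduction refers to.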

The gains compound. A developer who ramps up faster starts contributing to production earlier. The DX research found the productivity boost from faster onboarding persists for two or more years. When you run the maths, the ROI from AI-assisted onboarding is among the strongest in the entire AI productivity literature, and it scales directly with how often you hire.

If you are deploying AI tools to exactly one population, make it new developers and developers new to a codebase.

Junior developers vs senior engineers

The seniority question is where most organisations get their assumptions backwards.

Multiple independent studies now point in the same direction. A multi-company randomised controlled trial by Cui et al. (covering 4,867 developers across Microsoft, Accenture, and a Fortune 100 company) found 21-40% productivity gains for junior and mid-level developers. The METR randomised controlled trial found experienced developers were 19% slower with AI tools on tasks in codebases they knew well.

The intuition behind most enterprise AI rollouts is that senior engineers should be the priority, since they are more expensive per hour and time savings are worth more. The data says the opposite: junior developers show the largest and most consistent gains, and senior engineers who already know the answer derive less benefit from a tool that needs to guess it.

This does not mean senior engineers should not use AI. Staff+ engineers who do adopt daily save around 4.4 hours per week, the highest absolute time saving of any group. The point is that adoption should not be forced at the senior level. Remove barriers, measure outcomes, and let senior developers self-select into the use cases where they find it genuinely useful. For many, that turns out to be architectural exploration, code review assistance, and complex query writing, not inline completion.

The task-level ROI breakdown

The returns from AI are not uniform across task types. Based on the combined evidence:

The clearest returns come from tasks where the developer is navigating unfamiliar territory: onboarding to a new codebase, boilerplate and scaffolding, test generation, documentation. AI’s pattern-matching is strongest when the developer does not have strong existing intuitions to override.

Refactoring, stack trace analysis, code review assistance, and migrations produce useful but less consistent gains. The overhead of reviewing AI output becomes more significant as tasks get more nuanced.

Complex architecture decisions in familiar codebases, deep system design, tasks where the senior engineer already knows the answer: this is where the METR finding lives. The prompt-review-correct loop adds friction rather than removing it.

The common thread: AI’s value is inversely correlated with how well you already know the territory. When you are navigating something new, AI is a remarkable accelerant. When you are operating on deeply familiar ground, it is more likely to slow you down.

DevEx is the prerequisite

The DX data is unambiguous on one point: the organisations that see AI working well already had strong developer experience fundamentals in place before AI arrived. This is not a coincidence.

Fast CI/CD pipelines matter more than most teams expect. If your test suite takes 45 minutes to run, the prompt-code-verify loop is broken regardless of how good the AI is.

Clear, maintained documentation improves AI performance measurably. Coding assistants reason over your codebase and perform significantly better when it has clear naming conventions and enough context for the model to make accurate inferences. The cost of poor documentation was always there; AI makes it visible faster.

Well-defined service boundaries make AI-assisted changes safer. When services have clear interfaces and responsibilities, an AI-generated change to one service is less likely to break another. In tightly coupled systems, AI-generated code creates exactly the instability DORA documented.

None of these are new ideas. What AI has done is sharpen the cost of not having them.

How to measure whether it is working

The DX AI Measurement Framework separates AI impact into three dimensions, and tracking all three matters.

Utilisation is the easiest to track: how widely are tools adopted, what is the daily versus weekly split, which teams are engaged. Necessary context, but not sufficient evidence of value.

Impact is what most organisations undertrack: actual throughput, incident rates, onboarding time, developer satisfaction scores. Not self-reported time savings. These are the numbers that determine whether the investment is paying off.

Cost is the dimension that often gets skipped: total programme cost (licences, infrastructure, training time, oversight overhead) relative to measurable gains. Which use cases have the best returns? Where is money being spent on tools that are not delivering?
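The cost dimension reduces to simple arithmetic once the inputs are gathered. A back-of-envelope sketch, where every figure is an illustrative placeholder rather than DX data:

```python
def ai_programme_roi(licence_cost, infra_cost, training_hours,
                     oversight_hours, hours_saved, loaded_rate):
    """Net return of an AI tooling programme over one period.

    Costs: licences and infrastructure, plus training and oversight
    time priced at the loaded hourly rate. Gains: measured hours
    saved, priced at the same rate. Returns (net_gain, roi_ratio).
    """
    cost = (licence_cost + infra_cost
            + (training_hours + oversight_hours) * loaded_rate)
    gain = hours_saved * loaded_rate
    return gain - cost, gain / cost

# Illustrative quarter: 50 devs, $100/h loaded rate.
net, ratio = ai_programme_roi(
    licence_cost=50 * 30 * 3,   # $30/dev/month for a quarter
    infra_cost=2_000,
    training_hours=50 * 4,      # 4 h of onboarding per dev
    oversight_hours=120,        # extra review time, programme-wide
    hours_saved=50 * 2 * 13,    # 2 h/week per dev over 13 weeks
    loaded_rate=100,
)
print(net, round(ratio, 2))  # → 91500 3.38
```

The point of writing it down is not the arithmetic; it is that every input is a number someone has to actually measure, and the oversight and training terms are the ones most programmes quietly omit.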

Most organisations tracking AI adoption are measuring utilisation and calling it impact. The companies seeing genuine returns are measuring all three.


Sources: Multi-company RCT, Cui et al., “Effects of Generative AI on High-Skilled Work” (4,867 developers, IT Revolution 2024). METR RCT (2025). DX Research Q4 2025, Feb 2026. Faros AI (2025). DX AI Measurement Framework.