
Same Tools, Wildly Different Outcomes

Data from 67,000 developers shows that AI acts as an amplifier — it makes good engineering organisations better and struggling ones worse. This is a management problem, not a tooling problem.


Viewpoint

Two companies buy the same AI licences, deploy them to comparable developer populations, and measure the results six months later. One sees a 50% reduction in customer-facing incidents. The other sees twice as many.

This is not a hypothetical. It is what DX Research found across 67,000 developers in their November 2025 to February 2026 dataset. And it is the most consequential finding in AI productivity research that almost nobody is talking about.

AI as amplifier

The instinct when productivity gains disappoint is to reach for the tool explanation. The AI isn’t good enough yet. We picked the wrong platform. We need a different model. These are natural places to look, and sometimes they are right.

But the divergence in outcomes across the 67,000-developer dataset is too large to be explained by tooling differences. The organisations in that dataset were largely using the same tools. The variable was not the product. It was the organisation.

Well-structured organisations saw AI act as a force multiplier: faster delivery, higher quality, fewer incidents. Struggling organisations saw the opposite. AI exposed existing weaknesses and accelerated them: more instability, more incidents, worse quality.

AI does not fix a bad development process. It accelerates whatever process you have.

The Faros AI evidence

Faros AI published data covering 10,000+ developers across 1,255 teams. High-AI teams merged 98% more PRs per day than low-AI teams. That sounds like a productivity breakthrough until you read the next number: PR review time increased 91%.

The bottleneck did not move. Code production sped up substantially. The capacity of the downstream pipeline (review queues, testing infrastructure, release processes) stayed the same. Code piled up. Review times stretched. The system as a whole did not get faster; it got more congested.
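
To see why congestion is the default outcome, consider a toy queue model. This is an illustration, not the Faros AI methodology: the rates below are invented, and the only claim is the shape of the curve, namely that a backlog grows without bound whenever PRs arrive faster than reviewers can clear them.

    # Toy model of a single review queue. All rates are invented for
    # illustration; nothing here comes from the Faros AI dataset.
    def review_backlog(arrival_rate: float, review_capacity: float, days: int) -> float:
        """Size of the review backlog after the given number of days."""
        backlog = 0.0
        for _ in range(days):
            # Each day, new PRs arrive and reviewers clear what they can.
            backlog = max(0.0, backlog + arrival_rate - review_capacity)
        return backlog

    # Balanced system: 10 PRs/day in, 10 PRs/day reviewed. Stable.
    print(review_backlog(arrival_rate=10, review_capacity=10, days=30))  # 0.0

    # AI roughly doubles PR output while review capacity stays flat:
    # the backlog grows by 10 PRs every day and never drains.
    print(review_backlog(arrival_rate=20, review_capacity=10, days=30))  # 300.0

The point of the sketch is that extra generation speed does not change the system's throughput, which is capped by the slowest stage. It only changes how fast the queue in front of that stage grows.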

This is a management problem. Not in the pejorative sense of blaming managers, but in the precise sense: it is a problem of system design, workflow, and organisational capacity. A better AI model will not solve it.

The adoption trap

Laura Tacho, CTO at DX, framed it directly: “The hype made it sound like just trying AI would automatically pay off. But so far, most tools have been used for individual coding tasks. To see real impact, we need to use AI at the organisational level, not just for single tasks.”

This is the adoption trap. Distributing tool licences is easy to measure. Adoption metrics look good. But individual-level productivity gains that hit a bottleneck at the team or pipeline level do not produce organisational outcomes. They produce congestion.

The organisations seeing positive results in the DX data share common characteristics: clear goals for AI use, measurement practices that track impact rather than just adoption, and engineering fundamentals that let them absorb increased throughput. Fast CI pipelines. Good documentation. Well-defined service boundaries. These are the enablers that determine whether AI-generated code flows through to production safely, or backs up in review queues and incident queues.

What this means for leadership

If you are in an engineering leadership role and your AI adoption programme has not moved the organisational metrics (delivery frequency, incident rate, developer satisfaction), the adoption numbers are not the problem to investigate.

Three questions worth asking:

First: is review capacity keeping up with generation capacity? If developers are producing code 30% faster but reviewers are not reviewing 30% faster, you have created a new bottleneck. Either invest in review tooling and process, or consciously decide that throughput is not the metric you are optimising for.

Second: what does AI do to your weakest links? The organisations seeing 2x more incidents in the DX data did not have good processes that AI disrupted. They had fragile processes that AI accelerated into failure. Where are your fragile processes? AI will find them before you do.

Third: are you measuring adoption or impact? Adoption tells you how many people are using the tool. Impact tells you whether outcomes are changing. The DX AI Measurement Framework recommends tracking utilisation, impact, and cost as three separate dimensions. Collapsing them into a single adoption percentage hides the information you need.
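
As a sketch of what keeping those dimensions separate might look like in practice (the field names here are hypothetical, not the framework's own schema):

    # Hypothetical record separating the three dimensions the DX AI
    # Measurement Framework recommends tracking. Field names are
    # illustrative; the framework defines its own specific metrics.
    from dataclasses import dataclass

    @dataclass
    class AIMeasurementSnapshot:
        # Utilisation: how much the tool is actually used.
        weekly_active_pct: float
        # Impact: whether organisational outcomes are moving.
        delivery_frequency_delta_pct: float
        incident_rate_delta_pct: float
        # Cost: what the capability costs to run.
        monthly_spend_per_developer: float

    q1 = AIMeasurementSnapshot(
        weekly_active_pct=85.0,            # adoption looks great...
        delivery_frequency_delta_pct=0.0,  # ...but delivery is flat
        incident_rate_delta_pct=15.0,      # ...and incidents are up
        monthly_spend_per_developer=40.0,
    )

A single "adoption: 85%" number would report this quarter as a success; the separated view shows an organisation paying for congestion.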

The tool is not the strategy. Deciding what problem you are actually trying to solve, measuring whether you are solving it, and building the organisational capacity to absorb the changes AI creates: that is the strategy. The tool is just the accelerant.


Sources: DX Research (67,000 developers, November 2025 to February 2026); Faros AI (10,000+ developers across 1,255 teams, 2026); Laura Tacho keynote, The Pragmatic Summit, February 2026.