Data-Driven Talk

AI & Developer Productivity:
The Real Numbers

93% of developers use AI coding tools — but productivity gains have plateaued at 10%. What does the data actually tell us?

Primary data: DX Research — "Measuring Developer Productivity & AI Impact" (Feb 2026)
121,000 developers · 450+ companies · Laura Tacho keynote at The Pragmatic Summit
Additional sources: Google DORA 2024/2025, METR RCT, Faros AI, GitClear, Brynjolfsson/Stanford, Thoughtworks Technology Radar
01 — Adoption

AI coding tools are now mainstream

This isn't early-adopter territory anymore. Based on data from 121,000 developers across 450+ companies, AI coding assistants have reached near-universal adoption.

92.6%
of developers use an AI coding assistant at least once a month
~75%
use one at least weekly
121K
developers surveyed across 450+ companies
02 — The Paradox

The 10% productivity plateau

Despite near-universal adoption, productivity gains have flatlined. The initial bump when AI tools arrived hasn't grown — it's held steady for over a year.

~10%
Productivity improvement — unchanged since AI adoption began
~4 hrs
Self-reported time saved per developer per week
3.6–3.7
Hours saved in Q4 2025 — flat vs Q2 2025
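These two figures tell the same story: roughly four hours saved against a standard work week lands right at the plateau. A minimal sanity check, assuming a 40-hour week (the survey itself doesn't state one):

```python
# Do ~3.6-3.7 hours/week saved and the ~10% plateau agree?
# The 40-hour week is an assumption, not a figure from the DX survey.
hours_saved_low, hours_saved_high = 3.6, 3.7  # Q4 2025 self-reports
work_week = 40.0

gain_low = hours_saved_low / work_week    # 0.09
gain_high = hours_saved_high / work_week  # 0.0925
print(f"Implied gain: {gain_low:.1%} to {gain_high:.1%}")
```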
03 — Where Time Is Saved

Top time-saving use cases for AI assistants

Self-reported by developers who save at least 1 hour/week using AI coding assistants. The biggest wins come from understanding and maintaining existing code — not writing new code from scratch.

Stack trace analysis
~30%
Refactoring existing code
~27%
Mid-loop generation (inline completions)
~25%
Test case generation
~24%
Learning new techniques
~19%
Complex query writing
~18%
Code documentation
~17%
Brainstorming & planning
~16%
Initial code scaffolding
~15%
Code explanation
~13%
04 — The Real Bottleneck

We're optimising 20% of the picture

According to AWS survey data, the average developer spends only 20% of their time actually writing code. AI coding assistants are laser-focused on that sliver — leaving 80% of the workday untouched.

20%
coding
20% — Writing code

This is where AI coding assistants operate today

80% — Everything else
Meetings · Discovery · Design · Compliance · Operations · Interrupts · Code review · Debugging · Planning · Context switching

How fast we can write code has never been the bottleneck. The real time sinks are everything around it — and that's where the next wave of AI impact needs to land.

05 — Flow State

Does AI keep developers in flow?

The theory: AI reduces the "context-switching tax" by bringing answers into the IDE instead of sending devs to docs, Stack Overflow, or Slack. The reality is more complicated.

23 min
Average time to regain deep focus after a single interruption
Source: Gloria Mark, UC Irvine
15–30 min
Productive coding time lost per context switch for developers
Source: Jellyfish Research
1,200
App/tool toggles per day for the average knowledge worker
Source: Harvard Business Review
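Combining these figures gives a rough sense of scale. A back-of-envelope sketch: the six interruptions per day is an assumed number purely for illustration; only the 23-minute recovery figure comes from the research above.

```python
# Rough weekly cost of refocusing after interruptions.
RECOVERY_MINUTES = 23      # time to regain deep focus (Gloria Mark, UC Irvine)
interruptions_per_day = 6  # assumed for illustration; varies widely by team
workdays = 5

weekly_loss_hours = RECOVERY_MINUTES * interruptions_per_day * workdays / 60
print(f"~{weekly_loss_hours:.1f} hours/week spent regaining focus")  # ~11.5
```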

The Promise

  • Answers arrive inline — no leaving the IDE for docs or search
  • Boilerplate and scaffolding handled instantly, keeping focus on logic
  • Unfamiliar APIs and libraries become approachable via autocomplete
  • Less time rebuilding mental models after interruptions

The Reality

  • Devs shift between "coding mode" and "prompting mode" dozens of times per hour
  • METR: in a randomized controlled trial, experienced open-source developers were 19% slower with AI, yet believed they were 20% faster
  • Faros AI: high-AI teams juggled 47% more PRs/day — more parallel work, more switching
  • 9% of task time spent reviewing and correcting AI output — a new type of interruption

The takeaway: AI has the potential to reduce context-switching — but only if the tool fits seamlessly into the workflow. Poorly integrated AI creates a new type of interruption: the prompt-review-fix loop.

06 — AI-Authored Code

What IS changing: AI-written production code

While productivity is flat, the share of AI-authored code that ships to production keeps climbing. Data from 4.2 million developers (Nov 2025 – Feb 2026).

Previous quarter
22%
Current (Q1 2026)
26.9%
Daily AI users
~33%

"AI-authored code" = code merged into main/production with little to no human modification. Daily users now have nearly one-third of their shipped code written by AI.

07 — The Gap

Adoption ≠ Impact

The data reveals a clear disconnect: adoption has reached saturation levels, but measurable productivity improvements haven't kept pace.

93%
AI Adoption
10%
Productivity Gain
27%
AI-Authored Code

More code is AI-generated than ever, but the overall productivity needle hasn't moved.

08 — Where AI Delivers

The biggest win: onboarding speed

AI's clearest measurable impact isn't raw coding speed — it's how fast new developers become productive. Time to 10th PR (a standard onboarding metric) has been halved.

50%
Reduction in onboarding time (Q1 2024 → Q4 2025)
Measured by "time to 10th Pull Request"
2+ yrs
How long the productivity boost from faster onboarding lasts
The faster someone ramps up, the longer the compounding effect
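The "time to 10th PR" metric is simple to compute from merge timestamps. A minimal sketch with made-up dates (the obvious reading of the metric, not the DX methodology):

```python
from datetime import date, timedelta

def days_to_nth_pr(hire_date, merged_pr_dates, n=10):
    """Days from hire until the developer's n-th merged PR, or None if fewer than n."""
    merged = sorted(d for d in merged_pr_dates if d >= hire_date)
    if len(merged) < n:
        return None
    return (merged[n - 1] - hire_date).days

# Hypothetical new hire merging a PR every 3 days
hire = date(2025, 9, 1)
prs = [hire + timedelta(days=3 * (i + 1)) for i in range(12)]
print(days_to_nth_pr(hire, prs))  # 30
```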

AI is particularly powerful for getting people up to speed — whether it's new hires, engineers switching projects, or non-engineers stepping into technical workflows.

— Laura Tacho, CTO at DX
09 — Organizational Impact

Same tools, wildly different outcomes

Data from 67,000 developers shows AI acts as an amplifier — it makes good orgs better and struggling orgs worse. There is no "typical" experience.

Well-structured orgs

  • 50% fewer customer-facing incidents
  • AI as "force multiplier" for speed & quality
  • Higher reliability at scale
  • Clear goals + measurement = compounding gains

Struggling orgs

  • 2× more customer-facing incidents
  • AI exposes existing flaws rather than fixing them
  • Quality and reliability suffer
  • Adoption without strategy = amplified dysfunction
10 — The Management Layer

This is a management problem,
not a tooling problem

The hype made it sound like just trying AI would automatically pay off. But so far, most tools have been used for individual coding tasks. To see real impact, we need to use AI at the organizational level, not just for single tasks.

— Laura Tacho

Transformation is uncomfortable. Organizations that were ready to quit their cloud or agile transformations are now giving up on AI transformation, too.

— Laura Tacho

The Faros AI evidence: Across 10,000+ developers and 1,255 teams, high-AI teams merged 98% more PRs — but PR review time increased 91%. The bottleneck isn't code generation. It's everything downstream: review queues, testing, and release pipelines that can't absorb the new velocity.

Adoption alone doesn't guarantee results. Just using the tools doesn't automatically improve an organization.

11 — The DORA Evidence

DORA confirms: AI ships faster, breaks more

Google's DORA research program — the industry standard for measuring software delivery — studied ~5,000 professionals across two years and found a consistent pattern: AI improves speed but increases instability.

DORA 2024 — Per 25% increase in AI adoption
Delivery stability: −7.2%
Delivery throughput: −1.5%
Valuable work time: −2.6%
Code quality: +3.4%
Documentation quality: +7.5%
Code review speed: +3.1%
DORA 2025 — What changed a year later
Delivery stability: still negative ⚠️
Delivery throughput: now positive ↑
AI adoption: 76% → 90%
Trust in AI output: 30% report low or no trust
Developer sentiment: 72% → 60% ↓
Key finding: AI acts as an amplifier

Why does stability drop?

AI enables developers to produce larger changesets faster. Bigger batches = higher risk of failure — something DORA has warned about for a decade. Teams also report over-relying on AI during code review, which speeds reviews but misses defects. The "fail fast, fix fast" hypothesis was tested — the data doesn't support it. Instability still harms product performance and increases burnout.
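DORA's batch-size warning translates directly into tooling: flag changesets that exceed a size budget before they hit review. A minimal sketch; the 400-line threshold and the PR field names are illustrative assumptions, not DORA guidance.

```python
def flag_large_changesets(prs, max_changed_lines=400):
    """Return IDs of PRs whose total diff exceeds the batch-size budget.

    The 400-line budget is an illustrative default; DORA's advice is
    "keep batches small", not a specific number.
    """
    return [pr["id"] for pr in prs
            if pr["additions"] + pr["deletions"] > max_changed_lines]

prs = [
    {"id": 101, "additions": 120, "deletions": 30},   # small batch, passes
    {"id": 102, "additions": 900, "deletions": 250},  # large batch, flagged
]
print(flag_large_changesets(prs))  # [102]
```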

12 — Code Quality

More code, but is it good code?

GitClear analyzed 211 million changed lines of code across five years (2020–2024). As AI adoption rose, several quality indicators moved in the wrong direction.

5.7%
Code churn rate in 2024 — code revised within 2 weeks of commit
Up from 3.1% in 2020
12.3%
Copy/paste (duplicated) lines in 2024, up from 8.3% in 2020
Duplicated code blocks grew sharply versus prior years
9.5%
Refactored (moved) lines in 2024 — down from 24.1% in 2020
Less restructuring = growing tech debt

The pattern: AI makes it easy to add code, but the harder work — refactoring, restructuring, paying down tech debt — is declining. More code is being written, revised quickly, and duplicated. This aligns with the DORA finding that AI increases throughput but decreases stability.
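GitClear's churn metric (lines revised or deleted within two weeks of being authored) is straightforward to approximate from line-level history. A sketch with invented data; GitClear's actual pipeline works on real repository diffs:

```python
from datetime import date, timedelta

def churn_rate(line_events, window_days=14):
    """Fraction of authored lines revised or deleted within `window_days`.

    line_events: (authored_on, revised_on) pairs, revised_on=None if untouched.
    """
    churned = sum(
        1 for authored, revised in line_events
        if revised is not None and revised - authored <= timedelta(days=window_days)
    )
    return churned / len(line_events)

events = [
    (date(2024, 3, 1), date(2024, 3, 5)),   # revised after 4 days -> churn
    (date(2024, 3, 1), None),               # never revised
    (date(2024, 3, 1), date(2024, 6, 1)),   # revised months later -> not churn
    (date(2024, 3, 2), date(2024, 3, 10)),  # revised after 8 days -> churn
]
print(f"Churn rate: {churn_rate(events):.0%}")  # Churn rate: 50%
```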

13 — Tool Spotlight

Codex: early signals from scale

OpenAI's Codex desktop app has emerged as one of the fastest-growing AI developer tools, with notable enterprise adoption.

1M+
Downloads since Feb 2 launch
Grew 60% in a single week
60%
More PRs per week from Codex users inside OpenAI
95% of OpenAI devs use Codex internally
18K
Engineers at Cisco using Codex daily
For complex migrations & code reviews
50%
Reduction in code review time at Cisco
A tangible enterprise outcome

A note on these numbers

These are vendor-reported metrics from OpenAI, not independently verified. The "60% more PRs" stat comes from internal usage — the same environment where 95% adoption may reflect organizational expectation rather than organic choice. Compare with the METR RCT (independent, randomized) which found experienced devs were 19% slower with AI tools.

14 — Tool Landscape

The AI coding market in 2026

Three players now hold 70%+ of the ~$4B market, with the leaders past $1B ARR. But the lines are blurring — GitHub now lets you run Claude and Codex inside Copilot.

GitHub Copilot
Microsoft / GitHub
~42%
Paid market share
20M+ users · 90% of Fortune 100 · IDE-first autocomplete · Now multi-model (Claude + Codex available)
Claude Code
Anthropic
$2.5B
ARR (Feb 2026, 2× since Jan 1)
29M daily installs · 4% of GitHub commits · Terminal-first agentic tool · Fastest growing · Revenue and install figures are vendor-reported
Cursor
Anysphere
$500M+
ARR (2025)
AI-native IDE · Multi-model · Default for many individual devs · Raised at 30× revenue
Codex (OpenAI)
1M+ downloads since Feb 2 · Autonomous agent (not autocomplete) · Used by 18K Cisco engineers daily · Revenue undisclosed but "potential to shake up the leaderboard"
What developers actually do
Survey of 99 devs: Claude Code (58), Copilot (53), Cursor (51) most adopted — 29 use multiple tools simultaneously. The era of picking one tool is over.
15 — Architectural Difference

Both are agentic now — so what's actually different?

Copilot, Cursor, and Claude Code all have agent modes. The model is often the same (you can run Claude inside all three). The real differences come down to where the agent lives, how deep it sees, and what it can touch.

GitHub Copilot (with Agent Mode)
Lives in: IDE (VS Code, JetBrains)
Agent mode: Yes — multi-step, runs terminal commands
Context: Workspace files + LSP intelligence
Models: GPT-4o, Claude, Codex — your choice
Strengths: GitHub-native (PRs, issues, actions), multi-model flexibility, lowest price entry
Trade-off: Agent features are newer; core DNA is still completion-first
vs
Claude Code (Terminal-first Agent)
Lives in: Terminal / CLI (editor-agnostic)
Agent mode: Agent-only — built agentic from day one
Context: Maps full repo via agentic search + sub-agents
Models: Claude only (Opus, Sonnet, Haiku)
Strengths: Deep repo reasoning, parallel sub-agents, self-correcting test loops
Trade-off: Steeper learning curve, higher price, locked to Claude models

So why does Claude Code still feel different?

Copilot added agentic features onto an autocomplete foundation. Claude Code was built as an agent — terminal access, filesystem control, and self-correcting test loops are the core, not a bolt-on. The difference is felt most on complex, multi-file tasks: refactors across 50 files, migrations, and architectural changes where full repo awareness matters. For inline completions and quick edits, Copilot is often faster and more seamless.

The practical answer: Most teams are converging on using both — Copilot for inline speed and GitHub integration, Claude Code for heavy-lift agentic tasks. The era of picking one tool is ending.

16 — Spending Allocation

Where should you allocate AI tokens?

The data challenges the assumption that senior engineers should get priority. Junior and newer developers consistently show the largest measurable productivity gains from AI tools.

Junior / New-to-codebase devs

  • 21–40% productivity boost (Cui et al. — randomized controlled trial, 4,867 developers across Microsoft, Accenture, and a Fortune 100 company)
  • Highest adoption rates and most consistent daily usage
  • Biggest gains from scaffolding, boilerplate, and doc lookups — filling knowledge gaps
  • Onboarding time to 10th PR cut in half — a compounding ROI over 2+ years

Senior / Staff+ engineers

  • 7–16% productivity gains — modest and inconsistent
  • Lowest adoption levels across all seniority bands
  • METR study: experienced devs were 19% slower with AI on familiar codebases
  • But: Staff+ who do adopt save the most time — 4.4 hrs/week for daily users

The nuance: it depends on the task, not just the title

High ROI

Onboarding new devs, unfamiliar codebases, boilerplate, scaffolding, test generation, docs

Medium ROI

Refactoring, stack trace analysis, code review assistance, migrations

Low / Negative ROI

Complex architecture in familiar codebases, deep system design, senior devs who already know the answer

Bottom line: Invest heavily in AI for onboarding and junior devs — the ROI is proven and compounds. For senior engineers, don't force adoption — instead, remove barriers and let them self-select into high-leverage use cases.

17 — The Foundation

DevEx is a prerequisite, not just an outcome

Organizations winning with AI already had strong developer experience fundamentals in place. AI amplifies what's already there — for good or ill.

Fast CI

Quick feedback loops let AI-generated code get validated faster

📖

Clear Documentation

AI tools perform significantly better with well-documented codebases

🧩

Well-Defined Services

Clear boundaries make AI-assisted changes safer and more predictable

Good documentation, robust observability, fast test execution, frictionless local dev experience — these aren't just nice-to-haves in the world of AI-assisted engineering. They're essential.

— Laura Tacho
18 — Measurement

How to actually measure AI impact

According to the DX AI Measurement Framework, you need to track three dimensions — not just adoption.

Utilization

How widely are AI tools adopted? Daily/weekly usage, tool penetration across teams.

Impact

How does AI change performance? Time savings, PR throughput, developer experience, quality.

Cost

What's the ROI? Tool spend vs measurable gains. Identify high-leverage use cases worth scaling.

Over-indexing on one dimension (e.g., adoption rates) gives a false picture. Combine direct metrics (time saved) with indirect metrics (throughput, quality, developer satisfaction) for the full story.
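In practice the three dimensions reduce to a handful of rollups over per-developer data. A minimal sketch with a hypothetical schema; the field names, seat price, and hourly rate are assumptions, not the DX framework's actual definitions:

```python
# Hypothetical per-developer survey rows (field names are illustrative).
devs = [
    {"uses_weekly": True,  "hours_saved": 4.0, "seat_cost": 19},
    {"uses_weekly": True,  "hours_saved": 2.5, "seat_cost": 19},
    {"uses_weekly": False, "hours_saved": 0.0, "seat_cost": 19},
]

# Utilization: how widely the tools are actually used
utilization = sum(d["uses_weekly"] for d in devs) / len(devs)

# Impact: a direct metric (self-reported hours saved per week)
impact_hours = sum(d["hours_saved"] for d in devs)

# Cost: tool spend vs an assumed value of the time saved
HOURLY_RATE = 75  # assumed loaded cost per engineering hour
monthly_spend = sum(d["seat_cost"] for d in devs)
monthly_value = impact_hours * 4 * HOURLY_RATE  # ~4 working weeks/month

print(f"utilization={utilization:.0%}, roi={monthly_value / monthly_spend:.1f}x")
```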

19 — The Emerging Practice

Spec-driven development:
structure for the age of AI agents

If AI coding assistants plateau at ~10% gains with unstructured prompting, the next question is obvious: what happens when you give them better instructions? Spec-driven development (SDD) treats the specification — not the code — as the primary artifact. The AI generates implementation from structured, version-controlled specs.

Spec-First

Specifications guide code generation. Developer writes a detailed spec, AI implements it. The most common level today — used by GitHub Spec Kit and Kiro.

Spec-Anchored

Specs both constrain and validate AI output. Automated checks verify generated code against the spec. Closer to contract-driven or BDD-style workflows.

Spec-as-Source

The specification replaces code as the maintained artifact. Code becomes disposable output. The radical end of the spectrum — explored by Tessl (private beta).
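The middle level is the easiest to make concrete: keep a machine-readable spec under version control and check generated code against it. A toy sketch of a spec-anchored check; the spec format and function names are invented, not Spec Kit's or any real tool's.

```python
import inspect

# A tiny version-controlled "spec": required functions and their parameters.
SPEC = {
    "slugify": ["text", "max_length"],
    "parse_tags": ["raw"],
}

# Pretend this is the AI-generated implementation:
def slugify(text, max_length=64):
    return "-".join(text.lower().split())[:max_length]

def parse_tags(raw):
    return [t.strip() for t in raw.split(",") if t.strip()]

def check_against_spec(namespace, spec):
    """Names in the spec that are missing or whose parameters don't match."""
    failures = []
    for name, params in spec.items():
        fn = namespace.get(name)
        if fn is None or list(inspect.signature(fn).parameters) != params:
            failures.append(name)
    return failures

print(check_against_spec(globals(), SPEC))  # [] means the code satisfies the spec
```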

72K+
GitHub stars on Spec Kit in ~6 months
Supports 22+ AI agent platforms
78%
Enterprise teams with integrated AI tools
Shift from "vibe coding" to structured agentic coding accelerating

Spec-driven development may not have the visibility of a term like vibe coding, but it's nevertheless one of the most important practices to emerge in 2025.

— Thoughtworks Technology Radar, 2025
20 — The Framework Landscape

Four frameworks, four philosophies

The SDD tooling ecosystem has consolidated around four open-source approaches, each optimized for a different context. AWS Kiro adds a commercial fifth option.

GitHub Spec Kit
The de facto standard · 72K+ stars
Workflow: Specify → Plan → Tasks → Implement
Best for: Agent-agnostic teams (22+ platforms)
Philosophy: Lightweight, markdown-based, iterative
BMAD Method
Multi-agent framework · Enterprise-scale
Workflow: 21 specialized AI agents, 50+ guided workflows
Best for: Large greenfield projects (50+ developers)
Philosophy: Simulates a full agile team with AI personas
OpenSpec
Brownfield-first · Minimal overhead
Workflow: Change-centric plain markdown specs
Best for: Legacy codebases and existing projects
Philosophy: Don't impose greenfield process on brownfield reality
AWS Kiro
Commercial IDE · Preview at $20/mo
Workflow: Natural language → stories, criteria, design docs, tasks
Best for: AWS-integrated enterprise teams
Philosophy: SDD baked into the IDE, not bolted on

Choosing a framework

Small teams (2–10): Spec Kit or OpenSpec for simplicity. Medium teams (10–50): Kiro or Spec Kit for collaboration. Large orgs (50+): BMAD or Kiro for governance. The era of picking one tool is ending — 29 of 99 surveyed developers already use multiple AI tools simultaneously.

21 — The Honest Assessment

The promise is real.
The evidence isn't — yet.

Spec-driven development addresses the right problem: unstructured prompting produces inconsistent results. But the space has a significant gap between practitioner enthusiasm and rigorous measurement.

What the logic says

  • Specifications reduce the ambiguity that causes AI to guess — directly addressing the code churn GitClear documented
  • Version-controlled specs create audit trails that DORA research identifies as a success factor
  • The Specify → Plan → Task → Implement workflow enforces the incremental delivery that prevents big-bang failures
  • 72K GitHub stars and enterprise adoption (AWS Kiro) suggest strong market signal

What the data doesn't say

  • No peer-reviewed study has quantified SDD's impact on productivity, quality, or delivery speed
  • Claims of "10× performance" are self-reported and vendor-adjacent — not independently verified
  • Thoughtworks warns of reverting to waterfall antipatterns: heavy up-front specification and big-bang releases
  • Most teams are still experimenting — production-scale SDD workflows are rare outside early adopters

The waterfall question

The most active debate in the SDD community is whether this is genuinely new or just waterfall with AI characteristics. The Spec Kit model (lightweight, iterative markdown) is designed to avoid this trap. BMAD's 21-agent approach risks it. The answer likely depends on team discipline, not the framework — the same lesson the agile movement learned two decades ago.

Watch this space: SDD is early-2026's most promising methodology shift. But apply the same skepticism we applied to AI productivity claims earlier in this deck — demand measured outcomes, not just adoption numbers.

22 — The Macro Picture

The J-curve: are we entering the harvest phase?

Economist Erik Brynjolfsson argues that AI is following the pattern of every general-purpose technology: an investment phase where productivity appears flat, followed by a harvest phase where gains accelerate.

2.7%
US productivity growth in 2025
Nearly double the 1.4% annual average of the past decade
3.7%
Q4 2025 GDP growth
Strong output despite slower job gains — classic productivity signal

We are transitioning from an era of AI experimentation to one of structural utility.

— Erik Brynjolfsson, Stanford Digital Economy Lab, Feb 2026

The counterpoint

The developer-level data (DX, DORA, METR) shows modest, plateauing gains — while macro data shows acceleration. This gap may reflect what Brynjolfsson calls "a small cohort of power users" driving outsized impact, or it may be that the macro gains are coming from non-coding AI use cases (customer service, content, operations). The 2026 data will be decisive.

23 — The Strategic Reframe

Stop optimising the old machine.
Start testing the next one.

Every data point in this deck tells the same story: AI's gains in existing codebases are real but modest (~10%), plateau fast, and come with stability risk. That's not where the business advantage lives.

Where most companies point AI
Making existing products slightly faster to build
Squeezing 10% out of mature codebases
Autocompleting the thing you already know how to build
Improving code quality on features customers already have
→ Optimising for efficiency
Where the actual advantage is
Making it radically cheap to test new ideas
Running 10 experiments in the time one used to take
Finding product-market fit before competitors even scope
Killing bad ideas in days instead of quarters
→ Optimising for discovery

AI doesn't make your existing product 10× better. It makes building the wrong thing 10× cheaper. That changes the economics of experimentation — and that's where established companies have been losing to startups for decades.

24 — New Ventures Playbook

How established companies should deploy AI

The DORA data is clear: AI destabilises mature systems. But instability is a feature, not a bug, when you're exploring new territory. Point AI where failure is cheap and learning is expensive.

🧪
Rapid Prototyping
Build MVPs in days, not quarters. Ship throwaway prototypes to real users. The prototype is the research. DORA's instability penalty doesn't matter — this code is disposable by design.
🎯
Customer Signal Hunting
Instead of researching what customers want, build it and watch what they do. AI collapses the cost of testing a hypothesis from "6-week sprint" to "Tuesday afternoon." Run 10 bets, kill 9, double down on 1.
Adjacent Ventures
You have domain knowledge, customer relationships, and data that startups don't. AI lets you explore new business models at startup speed with enterprise insight. That's the unfair advantage — if you use it.

Where NOT to point it

Don't pour AI into making your existing cash cow 10% faster. The DORA data shows that's where instability hurts most and gains plateau fastest. Protect your core with engineering discipline. Unleash AI on the edges — where you're exploring, not maintaining.

25 — The Disruption

Aggregators are the first casualty

If AI collapses the cost of building and testing software (Slides 23–24), it doesn't just affect how developers work — it reshapes which business models survive. The first casualties are businesses whose value proposition is connecting rather than creating.

The old model
Aggregator value: discovery, trust, convenience
Moat: expensive to build alternatives
Economics: extract margin by sitting in the middle
Customer loyalty: to the platform, not the seller
Innovation speed: slow — high cost of failure
The AI-native model
Direct value: solve the actual problem, not just list options
Moat: customer insight + speed of iteration
Economics: capture margin by delivering outcomes
Customer loyalty: to the experience, not the catalogue
Innovation speed: fast — failure is Tuesday afternoon

The companies that win won't be the ones who used AI to build a better marketplace. They'll be the ones who used AI to figure out the marketplace was unnecessary — and built the thing that replaces it.

26 — Summary

The numbers at a glance

92.6%
Developers using AI monthly
~10%
Productivity gain — plateaued
26.9%
Production code now AI-authored
50%
Faster onboarding (time to 10th PR)
~4 hrs
Time saved per dev per week
4.2M
Developers in the dataset
−7.2%
Delivery stability drop (DORA 2024)
2.7%
US productivity growth in 2025 — nearly 2× the decade average
27 — Closing Thought

AI won't save your existing product.
It'll help you find the next one.

The data is clear: applying AI to mature codebases yields modest, plateauing gains with real stability costs. The transformative play is using AI to collapse the cost of experimentation — to find what delights your customers before your competitors even finish scoping.

Don't use AI to build faster. Use it to learn faster.

Primary data: DX Research — "Measuring Developer Productivity & AI Impact" (Feb 2026)
Talk: Laura Tacho keynote at The Pragmatic Summit, Feb 11, 2026
Additional: Google DORA 2024/2025, METR RCT (2025), Faros AI, GitClear (2025), Brynjolfsson/Stanford (2026)
Spec-driven development: Thoughtworks Technology Radar, Piskala (arXiv 2026), GitHub Spec Kit, BMAD, OpenSpec, AWS Kiro
Multi-company RCT: Cui et al. — "Effects of Generative AI on High-Skilled Work" (4,867 devs)
Full video: youtube.com/watch?v=LOHgRw43fFk