sdlcnext.com

The Research Gap: Why AI Coding Fails Before a Single Line Is Written

The biggest source of AI-generated code failures isn't the model, the prompt, or the tool. It's what happens — or doesn't happen — before the implementation stage begins. Context preparation is the missing discipline.


Viewpoint

Most conversations about AI coding failures focus on the output. The generated code had a bug. The model hallucinated an API that doesn’t exist. The implementation missed an edge case. These are real problems. They are also symptoms.

The deeper failure, the one that determines whether AI-assisted development produces reliable results or expensive rework, almost always happens earlier. It happens in the research and context preparation stage, or more precisely, in the absence of one.

Error amplification is the mechanism

Think about the AI-assisted development workflow as a pipeline: Research, then Plan, then Implement. Each stage consumes the output of the previous one. And each stage amplifies the errors it inherits.

Error amplification across Research → Plan → Implement

If the research stage is thin, because the developer jumps straight to prompting without understanding the existing codebase's conventions, the API constraints, or the implicit business rules, the plan inherits those gaps. The implementation then builds confidently on top of a flawed plan, producing code that is internally consistent but externally wrong.

This is not a new idea in software engineering. Brooks wrote about it in 1975. The cost of fixing a requirements error found in production is orders of magnitude higher than fixing it during design. What AI has done is compress the timeline so dramatically that teams skip the research phase entirely, because the implementation phase feels so cheap.

That cheapness is an illusion. The METR randomised controlled trial found experienced developers were 19% slower with AI on familiar codebases, and a significant contributor was the prompt-review-correct loop that AI introduces. When context is missing, that loop doesn’t run once. It runs repeatedly, each iteration attempting to patch the consequences of the original gap.

The prompt-review-correct loop without context
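The compounding described above can be sketched as a toy model. The gap sizes and the 2x per-stage amplification factor below are illustrative assumptions for the sake of the sketch, not numbers from any of the studies cited in this post:

```python
# Toy model of a context gap compounding through a
# Research -> Plan -> Implement pipeline. Each stage inherits the
# previous stage's gap and amplifies it by a fixed factor.
# All values here are illustrative, not measured.

def propagate_gap(initial_gap: float, amplification: float, stages: int) -> list[float]:
    """Return the gap at each stage, starting from the research stage."""
    gaps = [initial_gap]
    for _ in range(stages):
        gaps.append(gaps[-1] * amplification)
    return gaps

# A small research gap (10% of needed context missing) vs a large one (50%),
# each doubled by the Plan and Implement stages.
small = propagate_gap(0.10, 2.0, 2)   # [0.1, 0.2, 0.4]
large = propagate_gap(0.50, 2.0, 2)   # [0.5, 1.0, 2.0]
```

The point of the toy model is only that the two curves diverge: the cheapest place to shrink the final gap is the first stage, because everything downstream multiplies it.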

What “research” actually means in an AI workflow

In a traditional development workflow, research is implicit. A senior developer working in a codebase they know carries the context in their head: the naming conventions, the architectural patterns, the business rules that never made it into documentation. They don’t think of it as research. It’s just knowing the territory.

AI doesn’t know the territory. And this is where the failure mode lives. The developer’s implicit knowledge doesn’t transfer to the model through a prompt. What transfers is whatever the developer explicitly provides, plus whatever the model can infer from the code it can see.

The gap between what the developer knows and what the model receives is the research gap. The wider it is, the worse the output.

The research gap: what the developer knows vs what the model receives

Concretely, research in an AI-assisted workflow means:

Codebase context: What patterns does this project use? What are the conventions for error handling, logging, testing? What does the dependency graph look like around the area you’re changing? AI tools reason over whatever context they’re given. Providing a single file when the change touches three services produces predictably poor results.

Constraint discovery: What are the non-obvious constraints? Rate limits, backward compatibility requirements, performance budgets, compliance rules? These rarely appear in the code itself. They live in wikis, Slack threads, incident postmortems, and the heads of senior engineers. If they don’t make it into the AI’s context window, they don’t exist for the purposes of generation.

Prior art review: Has this problem been solved before in the codebase? Is there an existing utility, pattern, or service that should be reused rather than duplicated? GitClear’s finding that duplicate code blocks grew at 4x the rate of prior years is, in part, a research failure: AI generating new code because it wasn’t given visibility into what already existed.
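The prior-art check, in particular, can be partly mechanised before a single prompt is written. Here is a minimal sketch, assuming a Python codebase; the keyword match on function names is a deliberately crude heuristic, and the example paths and keywords are hypothetical:

```python
# Sketch of a pre-prompt prior-art scan: find existing function
# definitions whose names mention the feature you are about to build,
# so they can be pointed out to the AI instead of duplicated.
# Illustrative only; real projects would want smarter matching.
import re
from pathlib import Path

def find_prior_art(repo_root: str, keywords: list[str]) -> dict[str, list[str]]:
    """Map each Python file under repo_root to function names matching a keyword."""
    pattern = re.compile(r"^\s*def\s+(\w+)", re.MULTILINE)
    hits: dict[str, list[str]] = {}
    for path in Path(repo_root).rglob("*.py"):
        names = pattern.findall(path.read_text(encoding="utf-8", errors="ignore"))
        matches = [n for n in names if any(k in n.lower() for k in keywords)]
        if matches:
            hits[str(path)] = matches
    return hits

# Hypothetical usage: before asking the AI for retry logic,
# check whether the repo already has some.
# find_prior_art("/path/to/repo", ["retry", "backoff"])
```

Thirty seconds of output from something like this, pasted into the context window, is exactly the visibility whose absence GitClear's duplication numbers measure.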

Why teams skip it

The economics of AI-assisted development create a perverse incentive to skip research. When implementation feels nearly free, when you can generate a working prototype in minutes, the upfront cost of research feels disproportionate. Why spend 30 minutes understanding the codebase when the AI can produce something in 30 seconds?

The answer is in the data. The DX Research finding that AI productivity has plateaued at 10% despite near-universal adoption is substantially a research problem. Developers are generating code faster and then spending the saved time on review, debugging, and rework, the downstream consequences of insufficient upfront context.

DORA’s finding that delivery stability drops 7.2% for every 25% increase in AI adoption tells the same story from a different angle. The instability is not coming from AI being bad at writing code. It’s coming from AI writing code without enough context to write the right code.

The Faros AI data on review times is the clearest signal: high-AI teams merged 98% more PRs but review time increased 91%. Reviewers are catching the problems that insufficient research introduced. The review stage is doing the work that the research stage should have done.

Where the missing research surfaces: generation vs review bottleneck

The spec-driven connection

This is why spec-driven development has gained traction so quickly. At its core, SDD is a formalised research and context preparation discipline. The specification isn’t just instructions for the AI. It’s evidence that the developer has done the upfront work of understanding what needs to be built and why.

GitHub Spec Kit’s Specify, Plan, Tasks, Implement workflow puts research first by design. The specification stage forces the developer to articulate constraints, conventions, and context before any code is generated. The 72,000 GitHub stars suggest this resonates.

But you don’t need a framework to do this. The principle is simpler than the tooling: the quality of AI-generated code is bounded by the quality of the context it receives. Invest in the input. The output follows.

The practical takeaway

Before your next AI-assisted implementation, ask three questions:

Does the AI have enough codebase context to match existing patterns, or is it guessing? If you’re providing a single file and expecting system-level consistency, the output will disappoint.

Have you surfaced the constraints that don’t live in the code? Business rules, performance requirements, compatibility guarantees. If they’re not in the prompt or the context window, they’re not in the output.

Does the AI know what already exists? Duplication is the default when the model can’t see prior art. The 30 seconds you spend pointing it at existing utilities saves the 30 minutes you’d spend in review finding the duplication.
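Those three questions can even be turned into a pre-flight check on whatever context package you are about to hand the model. A minimal sketch; the `ContextPackage` fields and the two-file threshold are illustrative assumptions, not a prescribed format:

```python
# Sketch of a pre-flight check mirroring the three questions above.
# Field names and thresholds are illustrative, not a standard.
from dataclasses import dataclass, field

@dataclass
class ContextPackage:
    files: list[str] = field(default_factory=list)        # codebase context provided
    constraints: list[str] = field(default_factory=list)  # rules that live outside the code
    prior_art: list[str] = field(default_factory=list)    # existing utilities pointed at

def preflight(ctx: ContextPackage) -> list[str]:
    """Return the research questions still unanswered before prompting."""
    gaps = []
    if len(ctx.files) < 2:
        gaps.append("codebase context: one file rarely shows system-level patterns")
    if not ctx.constraints:
        gaps.append("constraints: no business rules or budgets surfaced")
    if not ctx.prior_art:
        gaps.append("prior art: nothing pointed at existing utilities")
    return gaps
```

An empty package fails all three checks; a package with two or more files, at least one stated constraint, and at least one prior-art pointer passes. The check is trivial by design: the discipline is in filling the fields, not in the code.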

The research stage is where AI-assisted development is won or lost. Not in the model. Not in the prompt. In the preparation that happens before either one is invoked.


Sources: METR Randomised Controlled Trial (2025). DX Research (Feb 2026). Google DORA 2024/2025. GitClear AI Copilot Code Quality Research. Faros AI (2026). Fred Brooks, The Mythical Man-Month (1975). GitHub Spec Kit.