Why AI Agents Need Specifications
Ambiguity and amnesia are where AI-assisted development falls apart
AI coding agents are remarkably capable. Give one a well-defined task and it will produce working code faster than most developers can type. The problem is what happens before and after. Give it a vague prompt and it will confidently build the wrong thing. Come back tomorrow and it has no memory of what it built or why.
These two failure modes — ambiguity and amnesia — are where most AI-assisted development falls apart. The industry's answer has been "Spec-Driven Development" (SDD): structured frameworks that force you through requirements, design, and task planning before the agent writes a line of code. The idea is sound. The implementations are mostly overkill. But the core insight — that AI agents need persistent, structured intent to stay on track — is worth understanding.
The Ambiguity Problem
An LLM will always produce output. Ask it to "build a dashboard" and you'll get a dashboard. Whether it's the right dashboard is another matter.
The issue isn't capability — it's intent. A human developer asking "what exactly do you mean by dashboard?" is performing a critical function that AI agents skip entirely. They optimise for plausible output, not correct output. Without explicit constraints — who uses this, what data it shows, how it integrates with existing systems — the agent fills in the blanks with training data defaults. The result looks professional and works perfectly. It's just not what you needed.
This compounds with complexity. A single well-prompted request for a login page will usually produce something usable. A request to "add authentication to our existing application" requires understanding the current codebase, the team's conventions, the deployment environment, and a dozen business rules that live in someone's head. The gap between what the agent knows and what it needs to know grows with every layer of real-world context.
The fix: Writing down what you want — clearly, with constraints and acceptance criteria — before the agent starts working is the single highest-leverage thing you can do. We've known since the 1970s that specifying intent before implementation reduces defects. AI agents just make the cost of skipping this step much higher, because they'll happily build the wrong thing at speed.
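As a rough illustration of what "writing it down" can mean in practice, here is a minimal spec for the dashboard example above. The headings, endpoint, and numbers are hypothetical rather than a prescribed template; the point is that each line removes a decision the agent would otherwise guess at.

```markdown
# Spec: Support-team dashboard

## Context
- Users: support engineers triaging open tickets (internal tool)
- Data: the existing /api/tickets endpoint; no new backend work

## Constraints
- Reuse the existing component library and session handling
- No new dependencies without sign-off

## Acceptance criteria
- Open tickets grouped by priority, refreshed at least every 60 seconds
- Initial load under 2 seconds against the staging dataset
- Follows the repository's existing routing and naming conventions
```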
The Amnesia Problem
Context windows reset. Sessions end. Developers switch tasks, go home, come back on Monday. The agent that spent an hour reasoning through your architecture yesterday starts fresh today with zero knowledge of those decisions.
This is the problem that actually matters for production teams. Not "how do I get the AI to write code faster?" but "how do I stop the AI from contradicting decisions it made three sessions ago?"
Without persistent artifacts — written proposals, design documents, task lists — every new session is a cold start. The agent re-reads your codebase, re-infers your conventions, and re-makes decisions that may or may not align with what was already agreed. In practice, this means developers spend significant time re-establishing context, correcting drift, and reconciling inconsistencies between sessions.
The fix: External memory. Markdown files in your repository that capture what was decided, why, and what's left to do. When the agent starts a new session, it reads the proposal and task files and has immediate access to the project's intent without hallucinating history. The files are the source of truth, not the model's memory.
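A sketch of what that external memory can look like. The file path, names, and contents here are illustrative, not a required format; what matters is that decisions and open tasks live in the repository instead of a closed chat transcript.

```markdown
<!-- changes/add-rate-limiting/proposal.md (illustrative) -->
# Proposal: Rate limiting for the public API

## Why
Unthrottled /search traffic is degrading latency for paying customers.

## Decisions
- Token-bucket limiter at the gateway, not per service
- Limits are per API key and configurable via environment variables

## Tasks
- [x] Add limiter middleware to the gateway
- [ ] Return X-RateLimit-Remaining headers on throttled routes
- [ ] Document the limits in the public API reference
```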
The Overhead Trap
The whole point of AI-assisted development is to move faster. But if the process you wrap around the AI takes longer than the time the AI saves, you've lost productivity on net. That's the trap most SDD frameworks fall into.
The SDD landscape has grown quickly. AWS shipped Kiro, a full IDE that pushes you through requirements and design phases. GitHub released Spec Kit with a multi-step pipeline. BMAD-METHOD simulates an entire agile team with 19 specialised AI agents. Each framework adds ceremony around the same core idea: write down what you want before building it. The question is how much ceremony before the overhead cancels out the gains.
The problem is proportionality. Birgitta Böckeler documented Kiro generating 4 user stories with 16 acceptance criteria for a small bug fix. Marmelab found Spec Kit producing 8+ files and 1,300 lines of markdown for a feature that displayed a date. The overhead of the specification process exceeded the complexity of the actual work.
Multi-agent orchestration — routing a request through Analyst, then PM, then Architect, then Developer — looks sophisticated but misses a fundamental point. An LLM isn't a team. It already possesses knowledge across all those domains simultaneously. Decomposing a task into four sequential agent handoffs means four separate passes through the same model, with context lost at every handoff boundary. You're paying for coordination overhead to solve a problem that doesn't apply to a single intelligence.
The evidence: The only reliable productivity data we have — GitHub's Copilot study — showed developers completing tasks 55% faster with one assistant and clear intent. No multi-agent orchestration. No phase gates. No 19-agent simulation of a standup meeting.
What Works in Practice
After evaluating these frameworks on production systems, we adopted OpenSpec. It was the only one designed around how real work actually happens. Most professional development is brownfield — existing codebases, legacy systems, incremental features. Not greenfield apps built from scratch. The other frameworks either ignored this or treated it as an afterthought.
Two things set it apart. First, it's brownfield-native. Two directories in your repository: specs/ holds the current truth about what your system does, and changes/ holds proposals for what should change. Each change uses explicit ADDED/MODIFIED/REMOVED markers so the agent processes only the delta, not the entire system.
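In simplified form, the layout looks something like this. The feature names are made up, and the exact file conventions are defined by OpenSpec's own documentation:

```
specs/
  auth/spec.md            # current truth: how authentication works today
  billing/spec.md
changes/
  add-passkey-login/
    proposal.md           # why this change is being made
    tasks.md              # implementation checklist
    specs/auth/spec.md    # only the delta, marked ADDED / MODIFIED / REMOVED
```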
Second, it's designed for token efficiency. Context windows are finite and LLMs degrade on long inputs — OpenSpec feeds the agent the change scope, not your whole architecture. As your specs grow, this matters more, not less.
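A delta file, sketched below with hypothetical requirements, is all the agent needs to read for a given change: a handful of marked sections rather than the full spec of the system.

```markdown
## MODIFIED Requirements
### Requirement: Session expiry
Sessions SHALL expire after 12 hours of inactivity (previously 24 hours).

## ADDED Requirements
### Requirement: Passkey login
Users SHALL be able to register and sign in with a WebAuthn passkey.
```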
Where Specifications Pay Off
We run this approach across our entire portfolio — seven active codebases including mobile apps, authentication services, a knowledge management platform, and this website. Over the past year that's 242 shipped changes tracked through structured specifications, each one with a proposal, a defined scope, and a clear record of what changed and why.
The value becomes obvious at scale. When an AI agent picks up a task on a codebase it last touched weeks ago, the specs tell it what the system does, what was recently changed, and what constraints apply — without re-reading every file or re-inferring conventions from code. A change to the authentication service doesn't accidentally contradict a decision made during the invoice system work three months earlier, because both decisions are captured in the specs, not lost in a closed chat window.
For a simple app, a single prompt is fine. For a portfolio of production systems with years of accumulated decisions and constraints, persistent specifications are the difference between an AI that helps and one that creates expensive rework.
The Bottom Line
AI agents are a genuine force multiplier for software development. But the multiplier works in both directions — they build the right thing faster, or they build the wrong thing faster. The difference is the quality of intent you provide.
You don't need 19 agents simulating a sprint planning meeting. You need three things: a written description of what you're building and why, a set of constraints the agent can check its work against, and persistent files that survive between sessions so decisions don't evaporate overnight.
After filtering out the frameworks that added weight without providing real value, we landed on OpenSpec. Lightweight, open source, designed for existing codebases. It solves the actual problem — AI agents need external memory to stay coherent — without making a simple problem complicated. We also published a VS Code extension to make the workflow easier to adopt.