Vibe Coding | 9 min read | by agent-kay

Spec-driven coding with Claude Code: write the spec, ship the diff

Vibe-coding peaks when the spec is written first. The minimum viable spec, why Claude Code rewards it more than other agents, and where it stops paying off.

The bug-fix loop you keep ending up in

You ask the agent for a feature. It writes 300 lines. You read 80, get tired, accept. The next morning a test fails in a way that hints the agent guessed the contract wrong somewhere near line 140.

You ask it to fix the test. It does — by changing the test. You ask it to fix the code. It changes the test back and edits a different bit of code. By lunch you've spent more time refereeing the agent than you would have spent writing the feature by hand. You blame the model. You think about switching tools. You don't switch.

This loop has one real fix and a hundred surface fixes. The real fix is to write the spec before the code. Every time. Even for a one-hour change. Better prompts, more examples, longer context — those help, but they cap out fast. Specs are how you stop needing better prompts.

What "spec-driven" means in 2026

The phrase has a stale waterfall feel that is not what people mean now. The 2026 version is closer to test-driven development than to a 50-page Word doc:

  • The spec is short — 200 to 400 words for a feature, 50 for a bug fix.
  • The spec is written just before the code, not weeks ahead.
  • The spec gets thrown away after merge. Anything worth keeping has already moved into code and permanent tests, the way TDD's "red" tests either become permanent tests or get deleted.
  • The spec is for the agent, not for stakeholders. Stakeholder docs are a separate thing.

The point isn't predictability across a quarter. The point is a clean two-hour loop where you and the agent agree, in writing, what success looks like before either of you opens an editor.

The minimum viable spec — four sections, 200 words

The smallest spec that actually changes outcomes has four sections:

## Problem
One paragraph. What is broken or missing. Concrete user-visible behavior.

## Constraint
Bullet list. What this change must NOT touch. What invariants must hold.

## Interface
5–15 lines of code or pseudocode showing the new function/component shape.
Type signature, not implementation.

## Done
Bullet list. How we know it works. Tests, screenshots, log lines.

Two hundred words. Five minutes to write. The agent reads it once and the rest of the chat stays anchored to it.
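One way to keep the four sections honest is to treat the spec as a small data structure and check it before it goes to the agent. The sketch below is illustrative only: the `Spec` class, its `issues` method, the 400-word budget default, and the password-reset example are assumptions made up for this post, not part of any tool.

```python
from dataclasses import dataclass


@dataclass
class Spec:
    """A four-section spec: Problem, Constraint, Interface, Done."""

    problem: str
    constraints: list[str]
    interface: str
    done: list[str]

    def render(self) -> str:
        """Render the spec in the four-section layout."""
        def bullets(items: list[str]) -> str:
            return "\n".join(f"- {item}" for item in items)

        return "\n\n".join([
            f"## Problem\n{self.problem}",
            f"## Constraint\n{bullets(self.constraints)}",
            f"## Interface\n{self.interface}",
            f"## Done\n{bullets(self.done)}",
        ])

    def issues(self, max_words: int = 400) -> list[str]:
        """List reasons the spec is not ready to hand to the agent."""
        found = []
        if not self.problem.strip():
            found.append("Problem is empty")
        if not self.constraints:
            found.append("Constraint list is empty")
        if not self.interface.strip():
            found.append("Interface is empty")
        if not self.done:
            found.append("Done list is empty")
        words = len(self.render().split())
        if words > max_words:
            found.append(f"{words} words, over the {max_words}-word budget")
        return found


# Hypothetical feature, for illustration only.
spec = Spec(
    problem="Password reset emails are sent to disabled accounts.",
    constraints=["Do not touch the auth middleware.",
                 "Do not change the email template."],
    interface="def request_password_reset(email: str) -> bool: ...",
    done=["A disabled account receives no email.",
          "Existing reset tests still pass."],
)
print(spec.render())
print(spec.issues())
```

An empty list from `issues()` only means the spec has the right shape. Whether the constraints are the right ones is still a human judgment.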

The whole trick lives in the constraints section. "Don't touch the auth middleware" is a five-word sentence that can prevent 800 tokens of stray refactors. Every senior engineer has a list of these in their head. Spec-driven coding makes you write three of them down before the agent gets a chance to ignore them.
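A written constraint can also be checked mechanically once the agent produces a diff. The following is a minimal sketch, assuming constraints are expressed as glob patterns over file paths and that you already have the list of changed files (for example from your version control tool). The function name and all paths are hypothetical.

```python
from fnmatch import fnmatchcase


def constraint_violations(changed_files, protected_globs):
    """Return the changed files that match any protected glob pattern."""
    return [
        path
        for path in changed_files
        if any(fnmatchcase(path, pattern) for pattern in protected_globs)
    ]


# Hypothetical paths, for illustration only.
protected = ["src/auth/middleware/*", "migrations/*"]
changed = ["src/billing/invoice.py", "src/auth/middleware/session.py"]

violations = constraint_violations(changed, protected)
print(violations)
```

Not every constraint reduces to a path. "This invariant must hold" still needs a test or a reviewer, so a check like this covers only the "must NOT touch" half of the section.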

Why Claude Code rewards specs more than other agents

Claude Code has three traits that make it pay back specs more than other tools:

  1. Long-running session with project state. The spec stays in context across many turns. You can say "see the constraint section" six prompts later and the model still has it. With turn-based UIs, the spec ages out and you have to repaste.
  2. Multi-file edit semantics. When the agent decides to change five files, the constraint list applies to all five. Ad-hoc prompts only constrain the file the agent is editing right now.
  3. Tool use with file reads. The agent will go read real code to check the interface section. If your type signature doesn't match the codebase, Claude Code notices and asks before guessing. Other agents tend to make up the signature.

Together, those three traits mean a 200-word spec saves more tokens with Claude Code than the same spec saves with a turn-based assistant. The longer the session, the bigger the win.

The PDCA loop wrapped around the spec

Plan-Do-Check-Act maps cleanly onto a spec-driven session. That's why the bkit PDCA harness treats the spec as the central artifact:

  • Plan — write the four-section spec. Get sign-off if the change is more than two hours of work.
  • Do — feed the spec to Claude Code. Let it work against the spec, not against your verbal prompt.
  • Check — run the Done section's checks. Pass means you're done. Fail means the spec was incomplete or the code drifted.
  • Act — if the spec was wrong, edit it in place and re-Do. If the code drifted, point Claude back at the spec and ask for the diff against it.

Most teams skip the Act phase. Most agents need it. "Re-anchor to the spec" is a one-prompt move that recovers most drift without a manual code review.
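The Check and Act steps can be sketched as two small functions. This is an illustration under assumptions, not how the bkit harness works: the function names are invented here, each Done item is modeled as a callable that returns True on pass, and the judgment "was the spec wrong?" is passed in by the human.

```python
from typing import Callable


def check(done_checks: dict[str, Callable[[], bool]]) -> list[str]:
    """Run every Done-section check and return the names that failed."""
    return [name for name, run in done_checks.items() if not run()]


def act(failed: list[str], spec_was_wrong: bool) -> str:
    """Pick the next move from the Check result."""
    if not failed:
        return "done"
    if spec_was_wrong:
        return "edit the spec in place, then re-Do"
    # The code drifted: point the agent back at the spec.
    return "re-anchor: re-read the constraint section, then fix the diff"


# Hypothetical checks; real ones would run tests or inspect logs.
done_checks = {
    "disabled account receives no email": lambda: True,
    "existing reset tests still pass": lambda: False,
}

failed = check(done_checks)
next_move = act(failed, spec_was_wrong=False)
print(failed, "->", next_move)
```

The useful part is the branch in `act`: a failed check leads either to an edited spec or to a re-anchor prompt, never to an open-ended "fix it" request.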

When not to spec-drive

The pattern has a cost: 5 to 10 minutes of writing before the agent runs. For changes that take less than 15 minutes of agent time, that's a real tax. Skip the spec when:

  • The change is a typo, a comment, a single-line fix, or a config tweak.
  • You're exploring — "show me three ways to model X" — and the goal is to learn the shape of the problem, not to ship code.
  • The codebase has a strong existing pattern and you want the agent to copy it. "Make this look like users.ts" is a one-line spec.
  • The agent is debugging with you in real time. Specs slow live conversation.

Use specs for everything that has a constraint section worth writing. If you can't think of three things the change must NOT touch, the spec isn't earning its keep.
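The skip rules above fit in a few lines of code. The sketch below is only a restatement of the heuristic, with a hypothetical function name and parameters. The 15-minute threshold and the three-constraint test come from this section, and everything else is an assumption.

```python
def should_write_spec(
    agent_minutes: float,
    constraints: list[str],
    exploring: bool = False,
    copying_pattern: bool = False,
    live_debugging: bool = False,
) -> bool:
    """Apply the skip rules: return True when a spec is worth writing."""
    if exploring or copying_pattern or live_debugging:
        return False
    if agent_minutes < 15:
        # Typos, comments, one-line fixes, config tweaks.
        return False
    # The spec earns its keep only with three real constraints.
    return len(constraints) >= 3


# Hypothetical constraints, for illustration only.
constraints = [
    "Do not touch the auth middleware.",
    "Do not change the public API.",
    "Do not add dependencies.",
]
print(should_write_spec(60, constraints))
print(should_write_spec(5, constraints))
```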

The compounding effect

The hidden win isn't the immediate output quality. It's what happens at the team level over weeks. When every change has a written spec — even a 200-word one — code review shrinks fast. The reviewer reads the spec, skims the diff for spec-compliance, and approves. Disagreements move from the diff to the spec, which is shorter, easier to argue about, and pre-implementation.

The other compounding win is on the human. Writing four-section specs for two months trains a kind of architectural muscle. The constraint section forces you to spell out invariants you've been carrying around in your head. Engineers who spec-drive for a quarter often report that their non-AI work also gets better. The agent is a forcing function for the writing habit. The writing habit is the actual gain. Related: the AX redesign argument that AI value comes from organizational change, not model choice.

Summary

  • The unit of spec-driven coding is a single change, not a release. The spec is short, short-lived, and tossed after merge.
  • The minimum viable spec has four sections — Problem, Constraint, Interface, Done — and fits in 200 words.
  • Claude Code pays back specs more than turn-based tools because of long sessions, multi-file edits, and file-read tools. The same pattern works with Cursor and Cline at lower amplitude.
  • Wrap PDCA around the spec. The Act phase ("re-anchor to the spec") is the one most teams skip and the one that fixes most agent drift.
  • Skip specs for sub-15-minute changes, exploration sessions, pattern-copying, and live debugging. Use them when you can write three real constraints. The compounding wins show up at review time and in your own skill, not just the immediate output.

Terms used in this post

Spec-driven coding — Writing a short spec before the agent writes any code, so both sides agree on what success looks like.

Minimum viable spec — The shortest spec that still changes outcomes — about 200 words across four sections (Problem, Constraint, Interface, Done).

Constraint section — The list of things the change must NOT touch. The most valuable part of the spec, because it stops stray refactors.

Interface section — A 5 to 15 line sketch of the new function or component shape — type signature, not implementation.

PDCA — Plan-Do-Check-Act. A short repeating loop: plan the change, do the work, check the result, act on what you learned.

Drift — When the implementation slowly stops matching the spec. Usually fixed by re-anchoring the agent to the spec.

Re-anchor — A one-line prompt that points the agent back at the spec ("re-read the constraint section, then fix the diff").

Forcing function — A tool or rule that makes you do something useful as a side effect of using it. The agent is one for the writing habit.

FAQ

Is this just waterfall renamed?

No. Waterfall freezes the spec for a quarter and ships once. Spec-driven coding freezes the spec for a single change and re-runs the loop hourly. The unit of planning shrunk from a release to a feature, then to a single PR. The spec is short-lived, written for the next two hours of work, and discarded after the merge.

Does this approach work with Cursor or Cline too?

It works, but not as crisply. Claude Code's long-running session, persistent project state, and multi-file edit semantics line up with spec-driven loops better than turn-based editors. With Cursor you can replicate much of the win (roughly 70% by the author's estimate) by pasting the spec at the top of the chat; with Cline the agentic loop is more compatible but the project context surface is smaller. The pattern is portable, the gain is not uniform.

How long should a spec be?

If you cannot read the spec aloud in under 90 seconds, it is too long for a single change. The minimum viable spec for a feature is roughly 200 to 400 words of plain English plus a 5 to 15 line interface sketch. Everything beyond that should be in code, not in a doc.

What if the spec changes mid-implementation?

That is a signal you discovered something the spec missed. Pause, edit the spec in place, re-prompt the agent against the updated spec. Do not let the implementation drift away from the spec — the spec is the source of truth for this loop. It will be discarded after the merge, so there is no historical-record cost to editing it.
