AI Coding Assistants Expose Process Debt

Outcome focus: Defined a docs-first assistant workflow that turns requirements, pinned stack choices, task slices, review loops, tests, and Git checkpoints into a repeatable way to ship with AI without surrendering architecture control.

The assistant did not invent the mess.

It found the mess.

After several production app builds with Claude, GPT-style coding agents, Cursor, and similar tools, the pattern became impossible to miss. AI did not replace the developer. It replaced the moment where the missing process could stay hidden.

Before AI, an unclear requirement became a hallway question, a Slack thread, or a slow pull request. With AI, the same unclear requirement becomes 11 files of confident code in 90 seconds. The ambiguity was already there. The assistant just made it executable.

One app taught me this the annoying way. I asked for "auth" too broadly. The assistant added a user model, a registration route, password hashing, login, session cookies, form UI, error handling, and a package change. It looked productive. It even passed a narrow test. Then a nearby flow broke because the session shape no longer matched the rest of the app.

The problem was not that the model was useless. The problem was that I had not given it a boundary small enough to succeed inside.

The process that works is less glamorous than the hype cycle:

write the project documents before writing code,
choose a framework instead of inviting architecture improv,
pin versions,
make the assistant explain before it edits,
give it one isolated task,
run tests like a developer,
update docs as the system changes,
commit every safe slice.

This sounds like normal engineering because it is normal engineering. AI makes the boring parts more important, not less.

Tooling Is Converging on Process

The official docs for modern coding assistants are quietly agreeing with each other.

Anthropic's Claude Code best practices recommend separating exploration, planning, implementation, and commit work. They also emphasize verification criteria, screenshots for UI work, precise file context, small prompts, CLAUDE.md, permission rules, hooks, skills, and aggressive context management.

OpenAI's Codex AGENTS.md guide gives Codex durable project instructions through layered AGENTS.md files. GitHub Copilot repository instructions support repository-wide instructions, path-specific .instructions.md files, and agent instructions. Cursor rules put persistent project guidance in .cursor/rules and also support AGENTS.md.

Different products, same direction: coding assistants need persistent context, scoped tasks, verification, and repo-specific rules.

The industry is not discovering magic prompts. It is rediscovering operating discipline.

The Workflow I Trust

Here is the loop I now want before letting an assistant make production code changes:

Diagram: AI-assisted development works when every code change moves through a documented, testable, reversible loop.

The important part is not the diagram. It is the ordering.

Do not ask the model to invent the product, choose the stack, create the architecture, write the feature, update the docs, and judge whether it worked in one pass. That is not acceleration. That is outsourcing all the places where judgment lives.

Start With Documents, Not Code

The fastest way to make an assistant drift is to begin with a vague product idea and a blank repository.

Before the first implementation prompt, create a small set of documents. They do not need to be perfect. They need to be stable enough that the assistant can be corrected against them.

I like this minimum set:

docs/

docs/
  requirements.md
  user-stories.md
  architecture.md
  conventions.md
  task-plan.md

requirements.md lists what the app must do.

docs/requirements.md

# Requirements
 
## Accounts
- Users can register with email and password.
- Users can log in with email and password.
- Users can log out from the account menu.
- Password reset is out of scope for the first release.
 
## Recipes
- Users can save recipes.
- Users can edit recipe title, ingredients, and steps.
- Users can filter recipes by tag.
 
## Non-goals
- No social login in v1.
- No team accounts in v1.
- No offline sync in v1.

user-stories.md keeps the work grounded in behavior, not components.

docs/user-stories.md

# User Stories
 
- As a returning user, I can log in and see my saved recipes.
- As a new user, I can create an account and land on the empty recipe list.
- As a user editing a recipe, I can save changes without losing my current draft.
- As a user with no recipes, I see an empty state with one clear action.

architecture.md names the shape of the system.

docs/architecture.md

# Architecture
 
## App Shape
- Next.js App Router application.
- Server components by default.
- Client components only for local interactivity.
- Server actions for mutations unless an API route is explicitly needed.
 
## Data
- Prisma owns database access.
- Zod validates external inputs at boundaries.
- UI components do not import Prisma.
 
## Testing
- Unit tests for pure logic.
- Integration tests for server actions and API routes.
- Playwright smoke tests for critical user flows.
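A rule like "UI components do not import Prisma" is easy to write down and easy to drift from, so it helps when a machine can check it. Here is a minimal sketch, assuming imports have already been extracted per file and that `@/server/db` is the repo's path alias for `src/server/db`. Neither assumption comes from architecture.md, and `findBoundaryViolations` is a hypothetical name.

```typescript
// Hypothetical check for the architecture.md rule "UI components do not import Prisma".
// Assumes each file's import specifiers were extracted beforehand (parser or regex pass).

interface SourceFile {
  path: string;
  imports: string[];
}

// Specifiers treated as database access in this sketch; "@/server/db" is an assumed alias.
const DATABASE_MODULES = ["@prisma/client", "@/server/db"];

// Only shared UI under src/components is covered; route-local components are not.
function isUiFile(path: string): boolean {
  return path.startsWith("src/components/");
}

function isDatabaseImport(specifier: string): boolean {
  return DATABASE_MODULES.some(
    (mod) => specifier === mod || specifier.startsWith(mod + "/"),
  );
}

// Returns one message per (UI file, database import) pair.
function findBoundaryViolations(files: SourceFile[]): string[] {
  const violations: string[] = [];
  for (const file of files) {
    if (!isUiFile(file.path)) continue;
    for (const specifier of file.imports) {
      if (isDatabaseImport(specifier)) {
        violations.push(`${file.path} imports ${specifier}`);
      }
    }
  }
  return violations;
}
```

In a real repo, ESLint's `no-restricted-imports` rule scoped to UI files is the more conventional tool. The point is that a boundary written in the docs should also fail loudly in code.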

conventions.md removes low-grade decision noise.

docs/conventions.md

# Conventions
 
## Naming
- React components use PascalCase.
- Hooks start with `use`.
- Server actions end with `Action`.
- Test files use `.test.ts` or `.test.tsx`.
 
## Files
- Reusable UI lives in `src/components`.
- Route-local components live next to the route that uses them.
- Database code lives in `src/server/db`.
- Shared validation schemas live in `src/server/schemas`.
 
## Assistant Rules
- Only modify files named in the prompt.
- Do not add packages unless the prompt explicitly asks.
- Do not change auth, routing, or database schema as a side effect.
- Return a diff summary and verification commands.

These files look basic. That is their strength.

The assistant does not need a 40-page PRD. It needs a compact operating surface that answers the repeated questions: what are we building, how is the code organized, what is out of scope, and how do we know a change worked?
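The naming rules in conventions.md are mechanical enough to encode directly, which turns "follow the conventions" into something a reviewer or a lint step can verify. Here is a minimal sketch. The `NameKind` categories and regexes are my assumptions, and the hook pattern is slightly stricter than the doc because it also requires a capital letter after `use`.

```typescript
// Sketch: docs/conventions.md naming rules as regexes.
// The kinds are hypothetical; a real lint rule would infer them from the AST.

type NameKind = "component" | "hook" | "serverAction" | "testFile";

const NAME_RULES: Record<NameKind, RegExp> = {
  component: /^[A-Z][A-Za-z0-9]*$/,         // PascalCase
  hook: /^use[A-Z][A-Za-z0-9]*$/,           // starts with `use`, then a capital
  serverAction: /^[a-z][A-Za-z0-9]*Action$/, // camelCase ending in `Action`
  testFile: /\.test\.tsx?$/,                // `.test.ts` or `.test.tsx`
};

function followsConvention(kind: NameKind, name: string): boolean {
  return NAME_RULES[kind].test(name);
}
```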

Choose a Framework and Pin Versions

Letting a model invent structure from scratch is expensive.

Use a framework that already has opinions: Next.js, SvelteKit, Remix, Expo, Rails, Laravel, Django, Spring Boot, whatever fits the product. The point is not that any one framework is morally superior. The point is that framework defaults prevent the assistant from mixing patterns every third prompt.

If you are building a Next.js app, start from Next.js. If you are building a Svelte app, start from SvelteKit. If you are building a mobile app, use the actual app framework and project structure instead of asking the model to assemble one from memory.

Then pin versions.

package.json

{
  "engines": {
    "node": "22.15.0"
  },
  "dependencies": {
    "next": "16.2.4",
    "react": "19.2.1",
    "react-dom": "19.2.1",
    "zod": "4.3.2"
  },
  "devDependencies": {
    "typescript": "5.9.3",
    "eslint": "9.38.0",
    "jest": "30.2.0"
  }
}

Version mismatch is where assistants waste whole afternoons. One answer assumes the old router. Another answer assumes the new router. A package API changed. A lint rule moved. A config file format changed. Now the assistant is debugging the consequences of a stack it invented.

Pinning versions is not anti-progress. It is how you make progress inspectable.

The tradeoff is real: fixed versions reduce spontaneous access to the newest API. In exchange, you get reproducibility. For production work, that is usually the better bargain. Upgrade intentionally in its own task, with release notes open and tests around the affected surface.
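Pinning only holds if nothing quietly reintroduces a range, and an assistant adding `^` to a dependency is an easy change to miss in review. Here is a minimal sketch of a CI check, assuming the parsed package.json is passed in. `findUnpinned` is a hypothetical name, and "exact" here means a plain `MAJOR.MINOR.PATCH` with an optional prerelease tag.

```typescript
// Sketch: flag package.json versions that are ranges (^, ~, >=, *, x, ...) instead of exact pins.

interface PackageJson {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

// Exact version with an optional prerelease suffix, e.g. "16.2.4" or "5.0.0-rc.1".
const EXACT_VERSION = /^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?$/;

function findUnpinned(pkg: PackageJson): string[] {
  const sections: Array<[string, Record<string, string> | undefined]> = [
    ["dependencies", pkg.dependencies],
    ["devDependencies", pkg.devDependencies],
  ];
  const unpinned: string[] = [];
  for (const [section, deps] of sections) {
    for (const [name, version] of Object.entries(deps ?? {})) {
      if (!EXACT_VERSION.test(version)) {
        unpinned.push(`${section}.${name}: ${version}`);
      }
    }
  }
  return unpinned;
}
```

npm's `save-exact` setting (for example `save-exact=true` in `.npmrc`) makes `npm install` record exact versions from the start. This check covers only direct dependencies. A committed lockfile installed with `npm ci` is what pins the transitive tree.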

Make the Assistant Explain Before It Codes

The cheapest time to catch a bad implementation is before the diff exists.

I almost never want the first response to be code on a medium-sized task. I want the assistant to restate the task, name the files it expects to touch, identify risks, and describe the verification plan.

planning-prompt.md

Read:
- `docs/requirements.md`
- `docs/architecture.md`
- `docs/conventions.md`
- `src/app/(auth)/register/page.tsx`
- `src/server/schemas/auth.ts`
 
Task:
Add server-side validation for registration passwords.
 
Before writing code:
1. Restate the requirement in your own words.
2. List the exact files you expect to modify.
3. Explain the implementation approach.
4. Name the tests or checks you will run.
5. Wait for approval before editing.

If the explanation is wrong, correcting it is cheap.

If the model says it needs to edit middleware.ts, the database schema, and the login route to validate a password field, you caught the problem early. If it says it will add a new validation library when the repo already uses zod, you caught the drift before it became dependency churn.
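Reading the plan is the real check, but a small script can make the two red flags above obvious: files outside the task's scope, and files that always deserve a second look. Here is a minimal sketch. The `SENSITIVE_PATHS` entries are hypothetical examples for a Next.js and Prisma layout, and `reviewPlan` is an illustrative name.

```typescript
// Sketch: screen an assistant's proposed file list before approving edits.

// Hypothetical sensitive paths; adjust to where auth, schema, and dependencies live.
const SENSITIVE_PATHS = new Set([
  "middleware.ts",
  "src/middleware.ts",
  "prisma/schema.prisma",
  "package.json",
  "package-lock.json",
]);

interface PlanReview {
  outOfScope: string[]; // files outside the paths the task named
  sensitive: string[];  // files that need explicit human sign-off
}

function asDir(path: string): string {
  return path.endsWith("/") ? path : path + "/";
}

// A scope entry matches the exact file or anything under it as a directory.
function reviewPlan(plannedFiles: string[], taskScope: string[]): PlanReview {
  const inScope = (file: string) =>
    taskScope.some((allowed) => file === allowed || file.startsWith(asDir(allowed)));
  return {
    outOfScope: plannedFiles.filter((file) => !inScope(file)),
    sensitive: plannedFiles.filter((file) => SENSITIVE_PATHS.has(file)),
  };
}
```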

When you do approve edits, ask for diffs or a narrow patch summary:

Proceed with the smallest diff.
Only modify the files you listed.
Do not change package versions.
After editing, summarize the diff by file and report the verification command output.

The assistant can still be wrong. It is much easier to supervise when it has already committed to an approach.

Slice Features Until They Fit in Your Hand

"Build auth" is not a task. It is a cluster of tasks wearing a trench coat.

Break it down:

Diagram: A feature that sounds simple to a human is often several assistant-sized tasks.

Each slice should have one outcome and one verification path.

small-task.md

Task:
Add password hashing to registration.
 
Scope:
- `src/server/auth/password.ts`
- `src/server/auth/password.test.ts`
 
Rules:
- Use the existing package already installed in the repo.
- Do not change the user schema.
- Do not edit registration UI.
- Do not edit login.
 
Acceptance:
- `hashPassword` returns a non-empty hash different from the input.
- `verifyPassword` returns true for the original password.
- `verifyPassword` returns false for a different password.
- Tests pass with `npm test -- password`.

That prompt gives the assistant room to help without giving it room to redesign the system.

The smaller task is not just safer. It is faster to debug. If it fails, you know where to look. If it passes, you can commit it and move to the next slice.
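For a sense of scale, this is roughly what a correct answer to that task looks like. The prompt says to use the package already installed, and the article does not name one. This sketch assumes Node's built-in `crypto.scryptSync` instead, with Node's default scrypt cost parameters, which you would tune deliberately for production.

```typescript
// Sketch of src/server/auth/password.ts using Node's built-in scrypt (an assumed
// stand-in for whatever hashing package the repo already has installed).
import { randomBytes, scryptSync, timingSafeEqual } from "node:crypto";

const KEY_LENGTH = 64;

// Returns "salt:hash", both hex-encoded, so the salt is stored with the hash.
export function hashPassword(password: string): string {
  const salt = randomBytes(16);
  const hash = scryptSync(password, salt, KEY_LENGTH);
  return `${salt.toString("hex")}:${hash.toString("hex")}`;
}

export function verifyPassword(password: string, stored: string): boolean {
  const [saltHex, hashHex] = stored.split(":");
  if (!saltHex || !hashHex) return false;
  const expected = Buffer.from(hashHex, "hex");
  // Reject malformed or truncated hashes before comparing.
  if (expected.length !== KEY_LENGTH) return false;
  const actual = scryptSync(password, Buffer.from(saltHex, "hex"), KEY_LENGTH);
  // timingSafeEqual avoids leaking how many leading bytes matched.
  return timingSafeEqual(actual, expected);
}
```

The whole diff fits on one screen, and each acceptance line maps to one assertion in `password.test.ts`.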

Use Multiple Models, But Give Them Different Jobs

Using multiple models can help. Using multiple models as a popularity contest usually does not.

The useful pattern is role separation:

one assistant plans,
one assistant implements,
one assistant reviews,
the human decides.

Fresh context helps because the reviewer did not write the code and is not anchored to the reasoning that produced it. Anthropic's guidance calls out a similar writer/reviewer pattern for multiple Claude sessions. The same idea works across Claude, Codex, Cursor, Copilot, and ChatGPT if you keep the task clear.

A review prompt should not ask, "Is this good?"

Ask for specific failure modes:

review-prompt.md

Review this diff as a production engineer.
 
Look specifically for:
- behavior changes outside the requested scope,
- missing validation,
- inconsistent imports or naming,
- untested edge cases,
- dependency or version changes,
- security-sensitive changes,
- code that is not idempotent.
 
Return:
- blocking issues first,
- file and line references,
- suggested minimal fixes,
- tests that should run.

There is a cost. Multiple model passes take time and tokens. For trivial edits, it is overkill. For auth, billing, data migrations, permissions, or anything with user data, the extra review is cheap insurance.

Keep Documentation Alive

Documentation that only exists before implementation becomes folklore immediately.

Update the docs as part of the development loop. When a decision changes, write it down. When the assistant creates a pattern you want repeated, capture it. When it makes the same mistake twice, add a rule.

The docs do not need to be long. They need to be current.

docs/task-plan.md

# Task Plan
 
## Done
- Created registration validation schema.
- Added password hashing helpers.
- Added tests for hash and verify behavior.
 
## Current
- Wire registration action to password hashing.
 
## Next
- Add login verification.
- Add logout action.
- Add auth smoke test.
 
## Decisions
- Password reset is out of scope for v1.
- Auth UI should not import database code.
- Registration errors return field-level messages.

After a long assistant session, create a handoff note before starting over:

handoff-prompt.md

Create a handoff note for a fresh session.
 
Include:
- the product goal,
- files changed,
- decisions made,
- tests that passed,
- tests not yet run,
- known risks,
- next task,
- files the next assistant must not edit.
 
Keep it under 500 words.

Long contexts get muddy. A clean session with a current handoff often beats a huge chat history full of failed attempts, corrections, and stale file contents.
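If you write handoffs often, it can help to treat them as structured data. A missing section then becomes a type error, not something the next session has to guess. Here is a minimal sketch with hypothetical names. The fields mirror the handoff prompt, and the 500-word limit is enforced on the rendered text.

```typescript
// Sketch: a handoff note as data, rendered to markdown with a word budget.

interface HandoffNote {
  goal: string;
  filesChanged: string[];
  decisions: string[];
  testsPassed: string[];
  testsNotRun: string[];
  knownRisks: string[];
  nextTask: string;
  doNotEdit: string[];
}

const MAX_WORDS = 500;

// Empty lists render as "- none" so a blank section is explicit, not omitted.
function section(title: string, items: string[]): string {
  const lines = items.length > 0 ? items.map((item) => `- ${item}`) : ["- none"];
  return [`## ${title}`, ...lines].join("\n");
}

function countWords(text: string): number {
  return text.split(/\s+/).filter(Boolean).length;
}

function renderChecked(note: HandoffNote): string {
  const text = [
    "# Handoff",
    section("Product goal", [note.goal]),
    section("Files changed", note.filesChanged),
    section("Decisions", note.decisions),
    section("Tests passed", note.testsPassed),
    section("Tests not yet run", note.testsNotRun),
    section("Known risks", note.knownRisks),
    section("Next task", [note.nextTask]),
    section("Do not edit", note.doNotEdit),
  ].join("\n\n");
  const words = countWords(text);
  if (words >= MAX_WORDS) {
    throw new Error(`Handoff note is ${words} words; keep it under ${MAX_WORDS}.`);
  }
  return text;
}
```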

Re-Paste Files, Or Make the Agent Re-Read Them

If you are using a chat-only assistant, paste the current file after every few edits. Not the old file. Not the file as you remember it. The actual file.

If you are using a repo-connected agent, make it inspect the real workspace before editing:

Before proposing changes, inspect the current files with `rg` and `sed`.
Do not rely on prior chat context for file contents.
Then list the exact files you read.

Stale context is one of the quietest sources of AI bugs. The assistant edits against a version of the file that no longer exists, restores an old import, removes a recent fix, or reintroduces a bug you already handled.
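Some agent harnesses guard against this by fingerprinting a file when it is read and refusing an edit if the file changed since. The same idea is useful in your own scripts. Here is a minimal sketch. `snapshot` and `checkFresh` are hypothetical names, and SHA-256 is just a convenient content fingerprint.

```typescript
// Sketch: detect edits made against stale file contents.
import { createHash } from "node:crypto";

interface ReadSnapshot {
  path: string;
  sha256: string;
}

function fingerprint(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

// Record what the assistant actually read.
function snapshot(path: string, content: string): ReadSnapshot {
  return { path, sha256: fingerprint(content) };
}

// Returns null when the file is unchanged since it was read, otherwise a reason to re-read.
function checkFresh(snap: ReadSnapshot, currentContent: string): string | null {
  return fingerprint(currentContent) === snap.sha256
    ? null
    : `${snap.path} changed since it was read; re-read it before editing.`;
}
```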

Also keep the scope rule explicit:

Only modify these files:
- `src/server/auth/register.ts`
- `src/server/auth/register.test.ts`
 
If another file appears necessary, stop and ask first.

This prevents the classic hidden bug: the assistant fixes one route by changing a shared helper used by six other routes.
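The scope rule is also cheap to verify after the fact. Here is a minimal sketch. It assumes you concatenate the output of `git diff --name-only HEAD` (changed tracked files) and `git ls-files --others --exclude-standard` (new untracked files) and pass it in. `filesOutsideScope` is a hypothetical name. Git may quote paths containing unusual characters, and this sketch does not unquote them.

```typescript
// Sketch: list changed files that the prompt did not allow.

function filesOutsideScope(changedOutput: string, allowed: string[]): string[] {
  const allowedSet = new Set(allowed);
  const outside = new Set<string>(); // Set removes duplicates across the two git commands
  for (const raw of changedOutput.split("\n")) {
    const file = raw.trim(); // also strips "\r" from Windows line endings
    if (file.length > 0 && !allowedSet.has(file)) outside.add(file);
  }
  return Array.from(outside);
}
```

An empty result means the diff stayed inside the fence. Anything else is a question for the assistant before it is a commit.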

Review and Test Like a Developer

AI can write code. It cannot carry your responsibility for the result.

Review the diff, not the explanation. The explanation is a sales pitch unless the code supports it.

Things I check almost every time:

Did it edit only the intended files?
Did it add imports that do not match local patterns?
Did it change a public type or schema?
Did it alter behavior in adjacent flows?
Did it add a dependency?
Did it weaken validation?
Did it catch and suppress errors instead of fixing root cause?
Did it duplicate logic that already existed?
Did it leave dead code or unused helpers?
Did it update tests for the actual behavior?
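A few of these questions can be answered mechanically. "Did it add a dependency?" is one of them. Here is a minimal sketch that compares package.json before and after an assistant task. `diffDependencies` is a hypothetical name, and only the two dependency sections are compared.

```typescript
// Sketch: report dependency additions, removals, and version changes.

type DepMap = Record<string, string>;

interface PackageDeps {
  dependencies?: DepMap;
  devDependencies?: DepMap;
}

function has(map: DepMap, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(map, key);
}

function diffDependencies(before: PackageDeps, after: PackageDeps): string[] {
  const changes: string[] = [];
  for (const section of ["dependencies", "devDependencies"] as const) {
    const old = before[section] ?? {};
    const current = after[section] ?? {};
    const names = Array.from(new Set([...Object.keys(old), ...Object.keys(current)])).sort();
    for (const name of names) {
      if (!has(old, name)) {
        changes.push(`added ${section}.${name}@${current[name]}`);
      } else if (!has(current, name)) {
        changes.push(`removed ${section}.${name}`);
      } else if (old[name] !== current[name]) {
        changes.push(`changed ${section}.${name}: ${old[name]} -> ${current[name]}`);
      }
    }
  }
  return changes;
}
```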

Run adjacent tests, not only the new one.

npm run lint
npm run typecheck
npm test -- auth
npm test -- users
npm run build

If the feature touches routing, run route tests. If it touches shared validation, run consumers. If it touches database code, run migration checks and rollback checks. If it touches UI, look at the screen.

The assistant will sometimes make a silent nearby adjustment that seems harmless. A human reviewer has to ask who else depends on that file.
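The "if it touches X, run Y" rules above can live in a small table, so adjacent checks are not left to memory. Here is a minimal sketch. The path prefixes follow this article's example layout. `npm test -- routes` and `npm run db:check` are hypothetical scripts. The Playwright filter assumes smoke specs have "smoke" in their file names. None of this replaces actually looking at the screen for UI changes.

```typescript
// Sketch: choose verification commands from the paths a change touched.

interface CheckRule {
  prefix: string;
  commands: string[];
}

const CHECK_RULES: CheckRule[] = [
  { prefix: "src/app/", commands: ["npm test -- routes"] },                              // routing
  { prefix: "src/server/schemas/", commands: ["npm test -- auth", "npm test -- users"] }, // shared validation consumers
  { prefix: "src/server/db/", commands: ["npm run db:check"] },                          // migration and rollback checks
  { prefix: "src/components/", commands: ["npx playwright test smoke"] },                // UI smoke flows
];

function selectChecks(changedFiles: string[]): string[] {
  const targeted = new Set<string>();
  for (const file of changedFiles) {
    for (const rule of CHECK_RULES) {
      if (file.startsWith(rule.prefix)) {
        for (const command of rule.commands) targeted.add(command);
      }
    }
  }
  // Cheap whole-repo checks first, targeted tests next, the full build last.
  return ["npm run lint", "npm run typecheck", ...Array.from(targeted), "npm run build"];
}
```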

Git Is the Undo Button

Assistant-specific undo features are useful. Git is the recovery model.

Use small commits:

git switch -c codex/auth-password-validation
git status --short
 
# one assistant task
npm test -- password
git add src/server/auth/password.ts src/server/auth/password.test.ts
git commit -m "Add password hashing helpers"
 
# next assistant task
npm test -- register
git add src/server/auth/register.ts src/server/auth/register.test.ts
git commit -m "Validate registration passwords"

When a change is bad before commit:

git restore --staged .
git restore .

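Those two commands discard staged and unstaged edits to tracked files only. Files the assistant created from scratch are untracked, so preview them with `git clean -nd` and remove them with `git clean -fd`.

When a committed change is bad: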

git revert <commit-sha>

Ask for idempotent fixes when the assistant writes scripts, migrations, or setup code:

Make the fix idempotent.
Running it twice should not create duplicate rows, duplicate config entries, duplicate imports, or conflicting files.
Add a test or guard that proves repeat execution is safe.

Idempotence is a boring word that saves real incidents.
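Here is what "running it twice is safe" looks like at the smallest scale. This minimal sketch makes an idempotent edit to a line-based config file such as `.gitignore` or `.env`. `ensureLine` is a hypothetical helper: applying it once or many times produces the same content.

```typescript
// Sketch: add a line to a line-based config only if it is not already there.

function ensureLine(content: string, line: string): string {
  const lines = content.split("\n");
  if (lines.some((existing) => existing.trim() === line.trim())) {
    return content; // already present: no duplicate entry
  }
  // Keep the file newline-terminated without introducing blank lines.
  const needsNewline = content.length > 0 && !content.endsWith("\n");
  return content + (needsNewline ? "\n" : "") + line + "\n";
}
```

The matching test is the guard the prompt asks for: apply the change, apply it again, and assert the second result equals the first.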

Modular Architecture Is an AI Primitive

If a tiny change requires the assistant to understand the entire codebase, the codebase is too coupled for AI-assisted work.

That does not mean every project needs enterprise architecture. It means the boundaries should be legible:

UI components should not know database details,
API routes should not duplicate business rules from server actions,
validation schemas should live at boundaries,
shared helpers should have tests,
modules should have names that reveal their job,
side effects should be isolated.

Assistants follow patterns. If the repo has three ways to fetch a user, the model will invent a fourth. If route code, database code, and UI state live in the same file, the model will keep adding to that file because the repo taught it to.

Good architecture is context compression. A module with a clear name and narrow responsibility gives the assistant less to misunderstand.
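To make those boundaries concrete, here is a minimal sketch with hypothetical names throughout. The function the UI calls depends only on a `RecipeStore` interface. The business rule is a pure function. An in-memory store stands in for what would be a Prisma-backed implementation in the real app.

```typescript
// Sketch: UI-facing code depends on an interface, not on database details.

interface Recipe {
  id: string;
  title: string;
  tags: string[];
}

// Data access boundary: the only thing the use case may depend on.
interface RecipeStore {
  listByUser(userId: string): Promise<Recipe[]>;
}

// Business rule: pure, no I/O, trivially unit-testable.
function filterByTag(recipes: Recipe[], tag: string): Recipe[] {
  return recipes.filter((recipe) => recipe.tags.includes(tag));
}

// Server-side use case the UI calls.
async function recipesForTag(store: RecipeStore, userId: string, tag: string): Promise<Recipe[]> {
  return filterByTag(await store.listByUser(userId), tag);
}

// Test double with the same shape as a real store.
class InMemoryRecipeStore implements RecipeStore {
  private readonly byUser: Record<string, Recipe[]>;

  constructor(byUser: Record<string, Recipe[]>) {
    this.byUser = byUser;
  }

  async listByUser(userId: string): Promise<Recipe[]> {
    return this.byUser[userId] ?? [];
  }
}
```

An assistant asked to change tag filtering now needs `filterByTag` and its tests, not the database layer. That narrower surface is the context compression in practice.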

The Repo Contract I Want

Every serious AI-assisted repo should have a small contract at the root. The file name can vary by tool: AGENTS.md, CLAUDE.md, .github/copilot-instructions.md, .cursor/rules, or some combination. The content should be boring, specific, and enforceable.

AGENTS.md

# Agent Operating Contract
 
## Product Context
- Read `docs/requirements.md`, `docs/architecture.md`, and `docs/conventions.md`
  before medium or large changes.
 
## Scope
- Make the smallest diff that satisfies the task.
- Only edit files named in the prompt.
- If another file is required, stop and ask first.
- Do not add dependencies unless explicitly requested.
- Do not change framework versions as part of feature work.
 
## Planning
- For non-trivial tasks, restate the task before editing.
- List files you expect to modify.
- Name the verification commands.
 
## Code
- Follow existing naming, folder structure, and test style.
- Prefer existing helpers over new abstractions.
- Keep UI, data access, validation, and side effects separated.
 
## Verification
- Run the narrowest relevant test first.
- Run adjacent tests for shared logic.
- Report command output honestly.
 
## Git
- Keep changes commit-sized.
- Do not rewrite unrelated user changes.
- Do not touch generated files unless the task says so.

This file will not make an assistant perfect. It will make failures easier to identify. When the assistant violates the contract, the correction is clear.

What Actually Ships

AI is a multiplier.

If the process is good, it multiplies delivery. If the process is vague, it multiplies ambiguity. If the architecture is modular, it moves quickly inside modules. If the architecture is tangled, it spreads changes through the tangle. If tests are meaningful, it gets immediate feedback. If tests are missing, it writes plausible code and waits for production to grade it.

The best developers will not be the ones who paste the biggest prompts.

They will be the ones who can create the smallest safe work unit, give the assistant the right context, reject drift early, test the right surface, and keep the system understandable after the code lands.

So before asking whether AI will replace developers, ask whether your team has a process worth multiplying.

If the answer is no, start there.