Writing Specs for AI Coding Agents
Bonus topic. Return to: What We’ve Built or explore React Architecture.
The Setup
You paste your feature requirements into an AI coding agent. It generates 500 lines of code. Half doesn’t compile. The other half uses patterns your team abandoned six months ago. You spend two hours fixing what should have taken twenty minutes.
The problem isn’t the AI. It’s the spec.
Most developers treat AI agents like magic boxes: dump in requirements, pray for good output. But these tools have fundamental constraints: finite context windows, no persistent memory between sessions, and a tendency to confidently generate plausible-looking nonsense when given vague instructions.
The solution isn’t more sophisticated AI. It’s better specs.
The Core Tension
Two forces pull in opposite directions:
Comprehensive Coverage - You want the AI to know everything: your architecture, patterns, conventions, history, constraints, edge cases, team preferences…
Context Limits - AI agents have finite memory. Dump too much in, and the model loses focus. Key details drown in noise.
The instinct is to write a massive spec covering every possible scenario. This fails predictably:
// The Overload Approach
10,000 lines of context
├── Full architecture documentation
├── Complete style guide
├── Every past decision
├── All possible edge cases
└── Kitchen sink
Result: Agent overwhelmed, key details lost, generic output
The solution: structured, modular specs that give the agent exactly what it needs for each task.
graph LR
A[Monolithic Spec<br/>Everything at once] -->|Overwhelms| B[Lost Focus<br/>Generic output]
C[Modular Spec<br/>Task-relevant context] -->|Enables| D[Precise Output<br/>Patterns followed]
style A fill:#475569,stroke:#0f172a,stroke-width:2px,color:#fff
style B fill:#94a3b8,stroke:#0f172a,stroke-width:2px,color:#0f172a
style C fill:#64748b,stroke:#0f172a,stroke-width:2px,color:#fff
style D fill:#cbd5e1,stroke:#0f172a,stroke-width:2px,color:#0f172a
linkStyle 0 stroke:#0f172a,stroke-width:3px
linkStyle 1 stroke:#0f172a,stroke-width:3px
Plan Mode: Vision First, Details Second
The most common mistake: diving straight into implementation details.
// BAD: Implementation-first prompt
// "Create a UserService class with methods for CRUD operations,
// using Prisma for the database, with retry logic, logging,
// validation using Zod, error handling with Result types,
// dependency injection, and full test coverage..."
This overwhelms the agent with decisions before establishing direction. The AI makes arbitrary choices, and you spend time fixing them.
The Two-Phase Approach
Phase 1: Vision - Describe what you’re building and why. Let the AI draft a plan.
// GOOD: Vision-first prompt
// "I need a user management feature that handles registration,
// profile updates, and account deletion. Users should receive
// email confirmations. Draft a plan for how you'd structure this."
Phase 2: Execution - Review the plan, correct course, then implement piece by piece.
// After reviewing the plan:
// "The plan looks good. Start with createUser following our
// fn(args, deps) pattern. Here's an example from the codebase..."
graph LR
A[High-level Vision<br/>What and Why] --> B[AI Drafts Plan<br/>Structure proposal]
B --> C[Human Reviews<br/>Correct direction]
C --> D[AI Implements<br/>Focused execution]
D --> E[Human Validates<br/>Quality check]
E -->|Iterate| D
style A fill:#475569,stroke:#0f172a,stroke-width:2px,color:#fff
style B fill:#64748b,stroke:#0f172a,stroke-width:2px,color:#fff
style C fill:#94a3b8,stroke:#0f172a,stroke-width:2px,color:#0f172a
style D fill:#cbd5e1,stroke:#0f172a,stroke-width:2px,color:#0f172a
style E fill:#e2e8f0,stroke:#0f172a,stroke-width:2px,color:#0f172a
linkStyle 0 stroke:#0f172a,stroke-width:3px
linkStyle 1 stroke:#0f172a,stroke-width:3px
linkStyle 2 stroke:#0f172a,stroke-width:3px
linkStyle 3 stroke:#0f172a,stroke-width:3px
linkStyle 4 stroke:#0f172a,stroke-width:3px
Why This Works
AI agents excel at elaboration when given clear direction. They struggle when forced to make high-level decisions with incomplete context.
By establishing vision first:
- The AI understands the goal, not just the mechanics
- You catch misalignment before code is written
- The plan becomes a reference for subsequent tasks
- Iteration happens at the design level, not the implementation level
Practical tip: Most AI coding tools have a “Plan Mode” or equivalent that restricts the agent to read-only operations. Use this for Phase 1. Let the agent explore your codebase and draft a plan before touching any files.
The Six Core Areas
A good agent spec isn’t a novel. It’s a structured document covering specific areas. Analysis of thousands of effective agent configurations reveals a consistent pattern.
| Area | Purpose | Example |
|---|---|---|
| Commands | Available actions | `pnpm test`, `pnpm build`, `pnpm lint --fix` |
| Testing | How to verify code | “Use vitest, vitest-mock-extended for mocks” |
| Project Structure | Where things live | “Domain logic in src/domain/, handlers in src/api/” |
| Code Style | Patterns to follow | “Use fn(args, deps), return Result<T, E>” |
| Git Workflow | Version control practices | “Conventional commits, feature branches” |
| Boundaries | What’s allowed | “Never commit secrets, ask before adding deps” |
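Because the six areas are a fixed list, a draft spec can be checked for gaps mechanically. The following is a minimal sketch, not part of any tool: `SPEC_AREAS`, `listHeadings`, and `findMissingAreas` are hypothetical names, and it assumes each area appears as a Markdown heading whose text starts with the area name from the table.

```typescript
// The six areas from the table above. Assumption: heading text starts with these names.
const SPEC_AREAS = [
  "Commands",
  "Testing",
  "Project Structure",
  "Code Style",
  "Git Workflow",
  "Boundaries",
] as const;

type SpecArea = (typeof SPEC_AREAS)[number];

// Collect the lowercased text of every Markdown heading (any level) in the spec.
function listHeadings(specMarkdown: string): string[] {
  return specMarkdown
    .split("\n")
    .map((line) => /^#{1,6}\s+(.*)$/.exec(line.trim()))
    .filter((match): match is RegExpExecArray => match !== null)
    .map((match) => (match[1] ?? "").trim().toLowerCase());
}

// Return the areas that have no matching heading, in table order.
function findMissingAreas(specMarkdown: string): SpecArea[] {
  const headings = listHeadings(specMarkdown);
  return SPEC_AREAS.filter(
    (area) => !headings.some((heading) => heading.startsWith(area.toLowerCase()))
  );
}

// A deliberately incomplete draft, for illustration only.
const draftSpec = [
  "# Project: Example",
  "## Commands",
  "- **Test**: `pnpm test`",
  "## Testing",
  "## Boundaries",
  "### ALWAYS",
].join("\n");

const missingAreas = findMissingAreas(draftSpec);
// missingAreas: ["Project Structure", "Code Style", "Git Workflow"]
```

A check like this only confirms that a section exists. Whether the section says anything useful still needs a human read.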
Commands: Executable, Not Descriptive
Don’t write “run the tests.” Write the exact command:
## Commands
- **Build**: `pnpm build` - Compiles TypeScript to dist/
- **Test**: `pnpm test` - Runs vitest, must pass before commits
- **Lint**: `pnpm lint --fix` - ESLint with auto-fix
- **Type check**: `pnpm tsc --noEmit` - Verify types without emitting
Agents need executable commands, not vague instructions.
Testing: Framework and Patterns
Specify not just what tool, but how to use it:
## Testing
- Framework: vitest
- Mocks: vitest-mock-extended (typed mocks, no vi.mock)
- Unit tests: `*.test.ts` alongside source files
- Integration tests: `*.test.int.ts` in the `tests/integration/` directory
- Coverage: Not required, but test error paths explicitly
Project Structure: Navigation Map
## Project Structure
src/
├── domain/ # Pure business logic, no I/O imports
├── infra/ # Database, external APIs, implementations
├── api/ # HTTP handlers, composition root
└── lib/ # Shared utilities (Result, validation)
tests/
├── unit/ # Fast tests, mocked dependencies
└── integration/ # Real database, real services
Code Style: Show, Don’t Just Tell
One code example beats three paragraphs of description:
## Code Style
Functions follow `fn(args, deps)`:
\`\`\`typescript
async function createUser(
args: { email: Email; name: string },
deps: { db: Database; logger: Logger }
): Promise<Result<User, CreateUserError>> {
// Business logic only - no framework imports
}
\`\`\`
- Args: Validated input (branded types preferred)
- Deps: Injected infrastructure
- Returns: Result<T, E>, never throws
Git Workflow: Commit Conventions
## Git Workflow
- Commits: Conventional format (feat:, fix:, refactor:, test:, docs:)
- Branches: feature/*, fix/*, refactor/*
- PRs: Include summary, test plan, breaking changes if any
- Never: Force push to main, commit without tests passing
Boundaries: The Critical Section
This is where most specs fail. Covered in depth in the next section.
Three-Tier Boundaries
The difference between a helpful AI agent and a chaotic one: clear boundaries.
Most specs either have no boundaries (agent does whatever it wants) or vague ones (“follow best practices”). Neither works.
The solution: three-tier boundaries that explicitly define autonomy levels.
graph TD
subgraph Always["ALWAYS - Autonomous"]
A1[Follow code patterns]
A2[Write tests]
A3[Use Result types]
A4[Validate with Zod]
end
subgraph Ask["ASK FIRST - Collaborative"]
B1[Add dependencies]
B2[Change APIs]
B3[Modify schemas]
B4[Refactor outside scope]
end
subgraph Never["NEVER - Forbidden"]
C1[Commit secrets]
C2[Skip tests]
C3[Force push]
C4[Use @ts-ignore]
end
style Always fill:#475569,stroke:#0f172a,stroke-width:2px,color:#fff
style Ask fill:#64748b,stroke:#0f172a,stroke-width:2px,color:#fff
style Never fill:#94a3b8,stroke:#0f172a,stroke-width:2px,color:#0f172a
ALWAYS (Autonomous)
Actions the agent should take without asking. These are non-negotiable patterns:
### ALWAYS
- Follow the `fn(args, deps)` pattern for all functions
- Write unit tests alongside implementation
- Use `Result<T, E>` for error handling, never throw
- Validate inputs with Zod at boundaries
- Use branded types for domain values (Email, UserId, etc.)
- Run `pnpm lint` and `pnpm test` before presenting code
- Add JSDoc comments for exported functions
ASK FIRST (Collaborative)
Actions that require human approval. These have higher impact or ambiguity:
### ASK FIRST
- Creating new database tables or migrations
- Adding npm dependencies
- Changing public API contracts
- Modifying authentication/authorization logic
- Refactoring code outside the current task scope
- Altering shared configuration files
- Making architectural decisions not covered by existing patterns
NEVER (Forbidden)
Hard stops. These should never happen regardless of context:
### NEVER
- Commit `.env` files or any secrets
- Use `vi.mock()` for application code
- Skip tests to "ship faster"
- Force push to protected branches
- Delete production data or migrations
- Use `@ts-ignore` or `@ts-expect-error`
- Suppress ESLint errors without explanation
- Use `any` type without explicit justification
Why Three Tiers Work
| Tier | Speed | Risk | Human Involvement |
|---|---|---|---|
| Always | Fast | Low | None |
| Ask First | Medium | Medium | Approval only |
| Never | N/A | High | Blocked |
This structure gives the agent maximum autonomy on safe operations while preventing costly mistakes. The “Ask First” tier is key: it enables progress without requiring constant supervision.
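The three tiers amount to a lookup from action to autonomy level. Here is a minimal sketch under stated assumptions: the action labels, `TIER_BY_ACTION`, and `decide` are hypothetical, and sending unlisted actions to the ask-first tier is a design choice of this sketch (it mirrors the "not covered by existing patterns" rule), not something the tiers themselves prescribe.

```typescript
type BoundaryTier = "always" | "ask-first" | "never";

type Decision =
  | { kind: "proceed" }
  | { kind: "request-approval"; reason: string }
  | { kind: "refuse"; reason: string };

// Hypothetical action labels; a real spec would list its own.
const TIER_BY_ACTION: Partial<Record<string, BoundaryTier>> = {
  "write-unit-test": "always",
  "run-lint": "always",
  "add-dependency": "ask-first",
  "change-public-api": "ask-first",
  "commit-secrets": "never",
  "force-push": "never",
};

// Map an action to what the agent should do next.
// Assumption: anything the spec does not mention is treated as "ask-first".
function decide(action: string): Decision {
  const tier: BoundaryTier = TIER_BY_ACTION[action] ?? "ask-first";
  switch (tier) {
    case "always":
      return { kind: "proceed" };
    case "ask-first":
      return { kind: "request-approval", reason: `"${action}" needs human approval` };
    case "never":
      return { kind: "refuse", reason: `"${action}" is forbidden by the spec` };
  }
}

const lintDecision = decide("run-lint"); // { kind: "proceed" }
const pushDecision = decide("force-push"); // kind: "refuse"
const unknownDecision = decide("rename-database"); // kind: "request-approval"
```

Defaulting to approval keeps gaps in the spec cheap: the worst case is an extra question, not an unreviewed change.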
Modular Prompts and Context Management
AI agents have finite context windows. Every token you spend on irrelevant information is a token not available for the actual task.
The Anti-Pattern
"Build me a complete e-commerce checkout system with cart management,
payment processing, inventory tracking, email notifications, order
history, refund handling, and analytics integration."
This prompt asks for everything at once. The agent will:
- Make arbitrary decisions without guidance
- Forget earlier requirements by the time it reaches later ones
- Produce generic code that doesn’t fit your architecture
The Pattern: Task Decomposition
Break complex features into focused tasks, each with its own context:
graph TD
A[Feature: E-commerce Checkout] --> B[Task 1: Cart Management]
A --> C[Task 2: Payment Processing]
A --> D[Task 3: Order Recording]
A --> E[Task 4: Notifications]
B --> B1[createCart]
B --> B2[addItem]
B --> B3[removeItem]
C --> C1[processPayment]
C --> C2[handleRefund]
D --> D1[createOrder]
D --> D2[getOrderHistory]
E --> E1[sendConfirmation]
E --> E2[sendShipping]
style A fill:#475569,stroke:#0f172a,stroke-width:2px,color:#fff
style B fill:#64748b,stroke:#0f172a,stroke-width:2px,color:#fff
style C fill:#64748b,stroke:#0f172a,stroke-width:2px,color:#fff
style D fill:#64748b,stroke:#0f172a,stroke-width:2px,color:#fff
style E fill:#64748b,stroke:#0f172a,stroke-width:2px,color:#fff
style B1 fill:#cbd5e1,stroke:#0f172a,stroke-width:2px,color:#0f172a
style B2 fill:#cbd5e1,stroke:#0f172a,stroke-width:2px,color:#0f172a
style B3 fill:#cbd5e1,stroke:#0f172a,stroke-width:2px,color:#0f172a
style C1 fill:#cbd5e1,stroke:#0f172a,stroke-width:2px,color:#0f172a
style C2 fill:#cbd5e1,stroke:#0f172a,stroke-width:2px,color:#0f172a
style D1 fill:#cbd5e1,stroke:#0f172a,stroke-width:2px,color:#0f172a
style D2 fill:#cbd5e1,stroke:#0f172a,stroke-width:2px,color:#0f172a
style E1 fill:#cbd5e1,stroke:#0f172a,stroke-width:2px,color:#0f172a
style E2 fill:#cbd5e1,stroke:#0f172a,stroke-width:2px,color:#0f172a
Each task gets a focused prompt with only relevant context.
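The tree above can be kept as plain data and flattened into one prompt per leaf. This is a minimal sketch: `FeaturePlan`, `FeatureTask`, and `toFocusedPrompts` are hypothetical names, and the prompt wording is only a placeholder for your own task description.

```typescript
interface FeatureTask {
  name: string;
  functions: string[];
}

interface FeaturePlan {
  feature: string;
  tasks: FeatureTask[];
}

// The decomposition from the diagram above, as data.
const checkoutPlan: FeaturePlan = {
  feature: "E-commerce Checkout",
  tasks: [
    { name: "Cart Management", functions: ["createCart", "addItem", "removeItem"] },
    { name: "Payment Processing", functions: ["processPayment", "handleRefund"] },
    { name: "Order Recording", functions: ["createOrder", "getOrderHistory"] },
    { name: "Notifications", functions: ["sendConfirmation", "sendShipping"] },
  ],
};

// Build one prompt per function. Each prompt names its parent task for
// orientation but carries nothing from sibling tasks.
function toFocusedPrompts(plan: FeaturePlan): string[] {
  const prompts: string[] = [];
  for (const task of plan.tasks) {
    for (const fn of task.functions) {
      prompts.push(`Feature: ${plan.feature}\nTask: ${task.name}\nImplement only: ${fn}`);
    }
  }
  return prompts;
}

const checkoutPrompts = toFocusedPrompts(checkoutPlan);
// checkoutPrompts.length is 9, one per leaf in the diagram
```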
Context Budget Strategy
Think of context like a budget. Allocate it deliberately:
| Category | Allocation | Contents |
|---|---|---|
| Current Task | 40% | Specific requirements, acceptance criteria |
| Relevant Patterns | 30% | Examples of similar code in your codebase |
| Boundaries | 20% | Rules, constraints, style requirements |
| History | 10% | Previous decisions relevant to this task |
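The split in the table is simple arithmetic. A minimal sketch, assuming a hypothetical budget of 8,000 tokens for the prompt; `CONTEXT_PERCENT` and `allocateContext` are illustrative names, and the percentages are the ones from the table.

```typescript
type ContextCategory = "currentTask" | "relevantPatterns" | "boundaries" | "history";

// Percentages from the table above, kept as integers to avoid rounding surprises.
const CONTEXT_PERCENT: Record<ContextCategory, number> = {
  currentTask: 40,
  relevantPatterns: 30,
  boundaries: 20,
  history: 10,
};

// Split a token budget by the table's percentages, rounding down so the
// allocations never add up to more than the total.
function allocateContext(totalTokens: number): Record<ContextCategory, number> {
  const share = (percent: number): number => Math.floor((totalTokens * percent) / 100);
  return {
    currentTask: share(CONTEXT_PERCENT.currentTask),
    relevantPatterns: share(CONTEXT_PERCENT.relevantPatterns),
    boundaries: share(CONTEXT_PERCENT.boundaries),
    history: share(CONTEXT_PERCENT.history),
  };
}

// Hypothetical budget; pick a number that fits the tool you use.
const contextBudget = allocateContext(8000);
// currentTask: 3200, relevantPatterns: 2400, boundaries: 1600, history: 800
```

The numbers are a guide for trimming, not a hard limit. If the pasted pattern example alone blows past its share, that is a sign to pick a smaller example.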
For implementing createOrder:
## Current Task (40%)
Create the `createOrder` function that:
- Takes validated order items and customer ID
- Checks inventory availability
- Creates order record in database
- Returns Result<Order, CreateOrderError>
## Relevant Patterns (30%)
Here's how we implemented `createUser`:
[paste createUser.ts]
## Boundaries (20%)
- Use fn(args, deps) signature
- Return Result, never throw
- Add unit tests
## History (10%)
We decided against soft deletes for orders (see ADR-007).
When to Start Fresh
Long conversations accumulate context that may no longer be relevant. Signs you need a fresh session:
- Switching to a different feature area
- Agent starts confusing current task with earlier ones
- Output quality degrades despite clear prompts
- You’ve significantly changed direction from the original plan
Don’t be afraid to start new sessions. The spec file persists between sessions, so the agent can pick up context from there.
Self-Verification and Validation
Don’t trust AI output blindly. Build verification into your spec.
The Verification Loop
graph LR
A[Generate Code] --> B[Lint Check]
B --> C{Pass?}
C -->|No| A
C -->|Yes| D[Type Check]
D --> E{Pass?}
E -->|No| A
E -->|Yes| F[Test]
F --> G{Pass?}
G -->|No| A
G -->|Yes| H[Ready for Review]
style A fill:#475569,stroke:#0f172a,stroke-width:2px,color:#fff
style B fill:#64748b,stroke:#0f172a,stroke-width:2px,color:#fff
style C fill:#94a3b8,stroke:#0f172a,stroke-width:2px,color:#0f172a
style D fill:#64748b,stroke:#0f172a,stroke-width:2px,color:#fff
style E fill:#94a3b8,stroke:#0f172a,stroke-width:2px,color:#0f172a
style F fill:#64748b,stroke:#0f172a,stroke-width:2px,color:#fff
style G fill:#94a3b8,stroke:#0f172a,stroke-width:2px,color:#0f172a
style H fill:#cbd5e1,stroke:#0f172a,stroke-width:2px,color:#0f172a
linkStyle 2 stroke:#0f172a,stroke-width:3px
linkStyle 4 stroke:#0f172a,stroke-width:3px
linkStyle 6 stroke:#0f172a,stroke-width:3px
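One pass through the loop in the diagram can be sketched as a function that runs the checks in order and stops at the first failure. This is a sketch under assumptions: `verify`, `VerificationCheck`, and the injected `run` function are hypothetical, and the runner here is a fake so the example stays self-contained. A real one would execute the command and return its exit code.

```typescript
interface VerificationCheck {
  label: string;
  command: string;
}

// Same order as the diagram: lint, then types, then tests.
const VERIFICATION_CHECKS: VerificationCheck[] = [
  { label: "lint", command: "pnpm lint" },
  { label: "types", command: "pnpm tsc --noEmit" },
  { label: "test", command: "pnpm test" },
];

type VerificationOutcome =
  | { ready: true }
  | { ready: false; failedCheck: string };

// Run each check in order and stop at the first non-zero exit code.
// `run` is injected (fn(args, deps) style) so the gate can be tested without a shell.
function verify(
  args: { checks: VerificationCheck[] },
  deps: { run: (command: string) => number }
): VerificationOutcome {
  for (const check of args.checks) {
    if (deps.run(check.command) !== 0) {
      return { ready: false, failedCheck: check.label };
    }
  }
  return { ready: true };
}

// Fake runner for illustration: records commands and pretends the type check fails.
const commandsRun: string[] = [];
const fakeRun = (command: string): number => {
  commandsRun.push(command);
  return command === "pnpm tsc --noEmit" ? 1 : 0;
};

const verification = verify({ checks: VERIFICATION_CHECKS }, { run: fakeRun });
// verification: { ready: false, failedCheck: "types" }; "pnpm test" is never run
```

A failed outcome sends the agent back to "Generate Code" with the name of the failing check. Only `{ ready: true }` reaches human review.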
Spec the Self-Checks
## Verification Requirements
Before presenting any code as complete:
1. **Compile**: `pnpm tsc --noEmit` must pass
2. **Lint**: `pnpm lint` must pass with no errors
3. **Test**: `pnpm test` must pass for affected files
4. **Pattern check**: Verify against code style:
- Functions use `fn(args, deps)` signature
- Errors return `Result<T, E>`
- Input validation uses Zod at boundaries
If any check fails, fix the issue before presenting.
Do not present code with known issues "for review."
Human Oversight Remains Essential
Self-verification catches mechanical errors. It doesn’t catch:
- Incorrect business logic
- Security vulnerabilities
- Architectural violations
- Subtle bugs that pass tests
Rule: Never commit AI-generated code you couldn’t explain to a colleague. If you can’t trace the logic, don’t ship it.
Spec-Driven Development Phases
Working effectively with AI agents is a skill that develops over time. Here’s a typical progression:
| Phase | Focus | Human Role | Agent Role |
|---|---|---|---|
| Foundation | Create the spec | Define patterns, boundaries | Suggest structure |
| Iteration | Refine from feedback | Review output, correct mistakes | Apply corrections recorded in the spec |
| Automation | Consistent execution | Spot-check, handle edge cases | Execute routine tasks |
| Evolution | Spec maintenance | Update when patterns change | Flag outdated guidance |
Phase 1: Foundation
Create your initial spec. This takes time upfront but pays dividends:
- Define your code patterns with examples
- Establish boundaries (Always/Ask/Never)
- Document project structure
- Add relevant commands
Phase 2: Iteration
Use the spec, observe failures, and refine:
- Agent keeps using wrong pattern? Add an explicit example
- Agent asks unnecessary questions? Move to “Always” tier
- Agent makes dangerous changes? Add to “Never” tier
Phase 3: Automation
With a mature spec, routine tasks become predictable:
- Feature implementation follows established patterns
- Tests are written consistently
- Code style is maintained automatically
Phase 4: Evolution
Specs aren’t static. Update when:
- Team adopts new patterns
- Dependencies change
- Architecture evolves
- Incidents surface lessons worth encoding
Treat your spec like code: version control it, review changes, keep it current.
Practical Example: Complete Spec Template
Here’s a complete, copy-paste-ready spec file integrating all patterns from this documentation series.
# Project: [Your Project Name]
## Tech Stack
- TypeScript 5.x with strict mode
- Node.js 24+
- Vitest for testing
- Zod for validation
- awaitly for Result types and workflows
- autotel for tracing
## Commands
- **Build**: `pnpm build`
- **Test**: `pnpm test`
- **Lint**: `pnpm lint --fix`
- **Type check**: `pnpm tsc --noEmit`
- **Dev**: `pnpm dev`
## Project Structure
src/
├── domain/ # Pure business logic (fn(args, deps) functions)
│ ├── user/
│ │ ├── createUser.ts
│ │ ├── createUser.test.ts
│ │ └── types.ts
│ └── order/
├── infra/ # External services, database implementations
│ ├── database.ts
│ ├── logger.ts
│ └── types.ts
├── api/ # HTTP handlers, composition root
│ └── routes/
└── lib/ # Shared utilities
├── result.ts
└── validation.ts
## Code Patterns
### Functions
All functions follow `fn(args, deps)`:
\`\`\`typescript
async function createOrder(
args: { customerId: CustomerId; items: OrderItem[] },
deps: { db: Database; logger: Logger }
): Promise<Result<Order, CreateOrderError>> {
// Business logic only
}
\`\`\`
### Error Handling
- Use `Result<T, E>` from awaitly
- Never throw for expected failures
- Discriminated unions for error types:
\`\`\`typescript
type CreateOrderError =
| { type: 'CUSTOMER_NOT_FOUND'; customerId: string }
| { type: 'ITEM_OUT_OF_STOCK'; itemId: string }
| { type: 'DB_ERROR'; cause: unknown };
\`\`\`
### Validation
- Zod schemas at boundaries (HTTP handlers)
- Branded types for domain values:
\`\`\`typescript
const CustomerIdSchema = z.string().uuid().brand<'CustomerId'>();
type CustomerId = z.infer<typeof CustomerIdSchema>;
\`\`\`
- Parse, don't validate: functions receive branded types, not raw strings
### Testing
- Unit tests: `*.test.ts` alongside source
- Use vitest-mock-extended for typed mocks
- No `vi.mock()` for application code
- Test each error path explicitly
\`\`\`typescript
import { mock } from 'vitest-mock-extended';
const mockDb = mock<Database>();
mockDb.findCustomer.mockResolvedValue(null);
const result = await createOrder(args, { db: mockDb, logger: mockLogger });
expect(result).toEqual(err({ type: 'CUSTOMER_NOT_FOUND', customerId: args.customerId }));
\`\`\`
## Boundaries
### ALWAYS
- Follow `fn(args, deps)` for all new functions
- Write unit tests alongside implementation
- Use `Result<T, E>`, never throw for expected errors
- Validate inputs with Zod at boundaries
- Use branded types for domain identifiers
- Run `pnpm lint` and `pnpm test` before presenting code
- Add JSDoc for exported functions
### ASK FIRST
- Creating new database tables or migrations
- Adding npm dependencies
- Changing public API contracts or types
- Modifying authentication logic
- Refactoring outside current task scope
- Changing shared configuration
### NEVER
- Commit `.env` files or secrets
- Use `vi.mock()` for application code
- Skip tests
- Force push to protected branches
- Use `@ts-ignore` or `any` without justification
- Delete data without soft-delete pattern
## Git Workflow
- Commits: Conventional format (feat:, fix:, refactor:)
- Branches: feature/*, fix/*
- Always: Run tests before commit
## Verification
Before presenting code as complete:
1. `pnpm tsc --noEmit` - Types pass
2. `pnpm lint` - No lint errors
3. `pnpm test` - Tests pass
4. Pattern check - fn(args, deps), Result types, Zod validation
## Current Task
[Updated per task - describe specific requirements here]
The Rules
- Vision first, details second. Use Plan Mode. Let the AI draft structure before implementing. Catch misalignment at design time, not debug time.
- Modular beats monolithic. Break features into focused tasks. One task, one context, one goal. Don’t dump everything into one prompt.
- Define boundaries explicitly. Always/Ask First/Never tiers prevent autonomous chaos while enabling efficient execution on safe operations.
- Build verification into the spec. Self-checks (lint, test, types) catch mechanical errors before human review. But mechanical checks don’t replace human oversight.
- Context is finite; spend it wisely. 40% current task, 30% relevant patterns, 20% boundaries, 10% history. Cut what doesn’t serve the immediate goal.
- Evolve the spec continuously. Treat it like code: version control, review changes, update when patterns change. A stale spec produces stale output.
Quick Reference
| Aspect | Anti-Pattern | Pattern |
|---|---|---|
| Scope | Everything in one prompt | Modular, focused tasks |
| Detail | Full implementation spec upfront | Vision first, details second |
| Constraints | None or vague | Three-tier boundaries |
| Verification | Trust agent output | Built-in self-checks |
| Context | Dump everything | Strategic 40/30/20/10 allocation |
| Evolution | Static spec | Continuous iteration |
What Makes This Different
Traditional development specs describe what to build. AI agent specs also describe how to build - the patterns, constraints, and verification that keep output aligned with your architecture.
The patterns in this guide mirror the patterns throughout this documentation series:
- fn(args, deps) makes dependencies explicit for both humans and AI
- Result types force error handling to be visible in the spec
- Zod validation creates clear boundaries the AI can respect
- Explicit testing patterns enable self-verification
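Here is how those four patterns can look together in one small function. It is a minimal sketch with loud assumptions: `Result`, `ok`, and `err` are hand-rolled stand-ins for the Result library the series uses, `parseEmail` stands in for a Zod schema so the block has no dependencies, and `UserDb` and `newId` are hypothetical deps.

```typescript
// Hand-rolled stand-ins for a Result library (assumption: not the series' actual API).
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };
const ok = <T>(value: T): Result<T, never> => ({ ok: true, value });
const err = <E>(error: E): Result<never, E> => ({ ok: false, error });

// Branded type: a plain string cannot be passed where an Email is expected.
type Email = string & { readonly __brand: "Email" };
type EmailError = { type: "INVALID_EMAIL"; raw: string };

// Boundary validation. A deliberately loose check standing in for a Zod schema.
function parseEmail(raw: string): Result<Email, EmailError> {
  return /^[^@\s]+@[^@\s]+$/.test(raw)
    ? ok(raw as Email)
    : err<EmailError>({ type: "INVALID_EMAIL", raw });
}

interface User {
  id: string;
  email: Email;
  name: string;
}

type CreateUserError =
  | { type: "EMAIL_TAKEN"; email: Email }
  | { type: "DB_ERROR"; cause: unknown };

// Hypothetical persistence interface; only what createUser needs.
interface UserDb {
  findByEmail(email: Email): Promise<User | null>;
  insert(user: User): Promise<void>;
}

// fn(args, deps): validated input first, injected infrastructure second.
// Expected failures come back as Result values; nothing is thrown to the caller.
async function createUser(
  args: { email: Email; name: string },
  deps: { db: UserDb; newId: () => string }
): Promise<Result<User, CreateUserError>> {
  try {
    if (await deps.db.findByEmail(args.email)) {
      return err<CreateUserError>({ type: "EMAIL_TAKEN", email: args.email });
    }
    const user: User = { id: deps.newId(), email: args.email, name: args.name };
    await deps.db.insert(user);
    return ok(user);
  } catch (cause) {
    return err<CreateUserError>({ type: "DB_ERROR", cause });
  }
}
```

Because `deps` is a plain object, a test can pass an in-memory `UserDb` and assert on each error variant directly. That gives an agent something concrete to verify its own output against.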
A good spec doesn’t just tell the AI what to build. It teaches the AI how your team builds software.