DDD-First AI Prototyping: Making It Work with Roles and Scaffolding
This article was generated by AI. The accuracy of the content is not guaranteed, and we accept no responsibility for any damages resulting from use of this article. By continuing to read, you agree to the Terms of Use.
- Intended readers: People who want to design a workflow where AI builds prototypes on production-grade tech stacks (Next.js / Spring Boot / Rails, etc.) and engineers later harden them for production. Written with both engineers and business-side stakeholders in mind.
- Prerequisites: Familiarity with basic DDD (Domain-Driven Design) concepts helps. If you don’t have it, the necessary parts are explained in the body.
- Reading time: About 14 minutes.
Overview
In an era when AI can write code at high speed, how do you make a DDD-first prototyping project succeed? What do you prepare, who do you hand the work to, and how do you set things up? This article answers those questions.
The answer is a combination of three things: method, people, and prep work. Define the domain first, hand it to AI, and have AI build the prototype on a production-grade tech stack—that’s the method. Separate the people who build, the people who take over for production, and the people who set up the scaffolding—that’s the people design. Set up an environment where someone close to a layperson can still drive it safely, using AGENTS.md, domain type skeletons, templates, MCP, and Skills—that’s the prep work.
Get the order wrong and you crash. If you build a working prototype first and try to clean up the domain afterward, the fluent vocabulary AI picked and the statistically plausible business rules it inferred fossilize into your codebase. The reported case of an agent that confused booking with reservation and almost sent the CFO to prison[1] is a vivid illustration of what “AI prototypes without a domain” can do at the business level.
Reverse the order—define the domain with DDD first, then hand that definition to AI to build the prototype—and the structure changes. AI shifts from being “the role that infers business from scratch” to “the role that implements a defined business.” The speed of seeing something working is preserved, while humans keep control of domain accuracy.
This article is structured in four parts: what to hand to AI (the five elements of a domain definition), how to hand it over (four implementation patterns), who does the work (three roles plus a hidden fourth), and what to set up beforehand (scaffolding that lowers the skill bar for the prototype builder). At the end, we organize the pitfalls specific to this method.
Note that the workflows of “build a working prototype first and tidy up the domain afterward,” general code quality and security issues with AI-generated code, and the problem of customers treating prototypes as production are all separate topics—they are out of scope here and deferred to other articles.
Why Order Is Critically Important
The value of “showing a working prototype fast” is well established. A 2025 empirical study in requirements engineering reported that 58.2% of practitioners were already using AI in requirements work, and most of them operated it as Human-AI Collaboration rather than full automation[2]. Waiting two weeks for a 20-page PRD is slower than showing something running on the spot and agreeing on the details. As far as it goes, this reasoning is sound.
The problem starts after that. If you build a working prototype first, the vocabulary and business rules AI chose become a fait accompli.
Khalili’s 2025 essay sums it up neatly[3]:
DDD encodes intent; AI discovers correlation.
LLMs are generalists. From their training data, they extract statistical correlations like “booking ≒ reservation.” But in real business, “booking” in one context means a sale that has been recognized, while “reservation” in another context means a tentative hold. That distinction can separate legal compliance from disaster at the regulatory, accounting, and contractual level. The AI agent Russ Miles describes started operating without that distinction. As a result, it came close to processing actions that could have sent the CFO to prison[1]. The story is likely embellished as a parable, but it illustrates the structure: when AI steps into business operations without domain boundaries, fluent malfunctions can surface as real damage at the business level.
Reversing the order cuts off the path by which this accident happens. With the domain definition in place first, AI becomes “the role that implements a defined business,” not “the role that infers business.” Humans hold the intent; AI is not given the job of discovering correlations.
This is not armchair theorizing. DDD Academy formalized “LLM × Strategic Design” as a workshop offering for 2026[4]. Aardling (a DDD consultancy founded by Mathias Verraes in 2020, now led by CEO Thomas Coopman and closely tied to the DDD Europe community) has packaged a curriculum that “uses LLMs as assistants at each stage of domain mapping, refining the ubiquitous language, defining boundaries, and integrating design.” This is the direction the community is collectively moving in.
What to Hand to AI—The Five Elements of a Domain Definition
Here are the five minimum elements you should hand over as a domain definition when you have AI build a prototype on a production-grade stack.
1. Ubiquitous Language (Glossary and Prohibitions)
The most basic deliverable, and often the most omitted.
```text
[Minimum unit of a glossary entry]
- Term: Reservation
- Context: Customer Reservation Context
- Meaning: State where the customer's tentative hold has been accepted. Before confirmation.
- Related states: PendingConfirmation, Confirmed, Cancelled
- Prohibition: Do not use "Reservation" as a synonym of "confirmed booking."
  After confirmation, switch to the term Booking.
```
Hand this to AI, and the variable names, type names, and table names in the generated code stop drifting. Without it, AI picks synonyms like “Reservation,” “Booking,” “Order,” and “Appointment” from the distribution of its training data and uses them inconsistently across contexts. Unifying them afterward is a large job, because names permeate every layer.
Liu Shangqi of Thoughtworks makes this explicit in his discussion of Spec-Driven Development: “specs should use domain-oriented ubiquitous language to describe business intent rather than specific tech-bound implementations”[5]. The specs you hand AI should be expressed in domain vocabulary that captures business intent. This is becoming a foundational principle of current AI-assisted development.
2. Domain Model (Aggregate / Entity / Value Object)
If ubiquitous language is the “words,” the domain model is the “grammar.”
- Which objects have identity (Entity)
- Which objects are the value itself (Value Object)
- Which objects mark consistency boundaries (Aggregate Root)
- Their relationships (reference, ownership, participation)
Hand these over as both diagrams and type definitions. Diagrams are easy for humans and AI alike to interpret; type definitions serve as anchors at code-generation time. Concretely, you finalize the domain model first—interface and type in TypeScript, record and class in Java, dataclass or Pydantic in Python—and hand it to AI.
This is where the “build the prototype with production tech” assumption pays off. Code generated by dedicated prototyping tools is tied to proprietary runtimes and tends to require rewriting at handover, but with production tech the domain types are production assets from the start. Even if the prototype implementation is thrown away, the domain types remain.
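The three building blocks above can be made concrete with a minimal TypeScript sketch. It assumes an invoicing domain, chosen because the invoice-total invariant appears later in this article. Money, InvoiceLine, and Invoice are hypothetical names for illustration only. A real model would come out of Role 1’s definition work.

```typescript
// Hypothetical Value Object: defined entirely by its values, immutable, compared by value.
class Money {
  readonly amountMinor: number; // amount in minor units (e.g. cents) to avoid float rounding
  readonly currency: string;

  constructor(amountMinor: number, currency: string) {
    if (!Number.isInteger(amountMinor)) throw new Error("amount must be an integer in minor units");
    this.amountMinor = amountMinor;
    this.currency = currency;
  }

  add(other: Money): Money {
    if (other.currency !== this.currency) throw new Error("currency mismatch");
    return new Money(this.amountMinor + other.amountMinor, this.currency);
  }

  equals(other: Money): boolean {
    return this.amountMinor === other.amountMinor && this.currency === other.currency;
  }
}

// Hypothetical Entity: has an identity (lineId) that persists even if its attributes change.
interface InvoiceLine {
  readonly lineId: string;
  readonly description: string;
  readonly price: Money;
}

// Hypothetical Aggregate Root: the only entry point for changing lines, so it can
// guard the consistency boundary (here: every line uses the invoice's currency).
class Invoice {
  readonly invoiceId: string;
  readonly currency: string;
  private readonly lines: InvoiceLine[] = [];

  constructor(invoiceId: string, currency: string) {
    this.invoiceId = invoiceId;
    this.currency = currency;
  }

  addLine(line: InvoiceLine): void {
    if (line.price.currency !== this.currency) {
      throw new Error(`line ${line.lineId} is not in ${this.currency}`);
    }
    if (this.lines.some((l) => l.lineId === line.lineId)) {
      throw new Error(`duplicate line id ${line.lineId}`);
    }
    this.lines.push(line);
  }

  // The total is derived from the lines and never stored separately,
  // so "total equals the sum of line items" cannot drift.
  total(): Money {
    return this.lines.reduce((sum, l) => sum.add(l.price), new Money(0, this.currency));
  }

  lineCount(): number {
    return this.lines.length;
  }
}
```

Keeping the lines private and the total derived means AI-generated code has no way to update one without the other.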
3. Bounded Context and Context Map
Bounded Context is one of the most misunderstood concepts in DDD. Treating it as “the unit of microservices” misses the essence. The real purpose is to define the scope of meaning.
When passing this to AI as context, you give it in this form:
| What you hand over | Example |
|---|---|
| Context name | Customer Reservation Context |
| UL in that context | Reservation (tentative hold), Confirmation (confirmed), etc. |
| Relation to adjacent contexts | Customer/Supplier with the Inventory context; Conformist with the Payment context |
| Anti-Corruption Layer location | ACL placed at the Inventory boundary |
Khalili calls Bounded Context a “cognitive firewall”[3]. Left to itself, AI is likely to drift toward the generic usage in its training data. Splitting vocabulary and business rules per context stops that semantic drift physically. If you align AI’s file and module split with these contexts, verifying the boundaries later becomes much easier.
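An Anti-Corruption Layer at a context boundary can be sketched as a single translation function. This sketch assumes the Inventory context exposes its own record shape (InventoryStockRecord, with sku, onHand, and allocated), which the Customer Reservation Context should not adopt directly. All type and function names here are hypothetical.

```typescript
// --- Inventory context (upstream Supplier): its own vocabulary and shape (hypothetical) ---
interface InventoryStockRecord {
  sku: string;
  onHand: number; // physical units in the warehouse
  allocated: number; // units already promised elsewhere
}

// --- Customer Reservation Context: only the concept it actually needs (hypothetical) ---
type ProductAvailability =
  | { kind: "Available"; productId: string; remaining: number }
  | { kind: "SoldOut"; productId: string };

// --- Anti-Corruption Layer at the Inventory boundary ---
// Translates upstream terms (sku, onHand, allocated) into this context's language,
// so Inventory's vocabulary never leaks into reservation code.
function toAvailability(record: InventoryStockRecord): ProductAvailability {
  const remaining = record.onHand - record.allocated;
  return remaining > 0
    ? { kind: "Available", productId: record.sku, remaining }
    : { kind: "SoldOut", productId: record.sku };
}
```

Because the translation lives in one place, a renamed upstream field changes only the ACL. Putting this function in its own module at the context boundary also gives reviewers an obvious place to check.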
4. Business Invariants
“A reservation cannot hold two seats at once.” “The total of an invoice equals the sum of its line items.” “Unconfirmed orders cannot be shipped.” Whether or not you hand these business invariants to AI makes a large difference in the quality of the generated code.
There are three layers for handing them over.
Layer 1: Describe in natural language. List invariants as bullet points in a spec or AGENTS.md. The minimum acceptable level.
Layer 2: Express as types. Represent state transitions as sum types (TypeScript’s discriminated union, Rust’s enum, Java’s sealed class) so that invalid states cannot even be written.
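```typescript
// Each status carries only the fields valid in that state, so, for example,
// a "Confirmed" reservation without confirmedAt cannot be written.
type Reservation =
  | { status: "PendingConfirmation"; reservedAt: Date }
  | { status: "Confirmed"; confirmedAt: Date }
  | { status: "Cancelled"; cancelledAt: Date; reason: string };
```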
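Building on the Reservation union above, a transition function can be written so that the compiler forces every status to be handled. The assertNever helper is a common TypeScript idiom, not a library API. The transition rules are assumptions for illustration: only a pending hold can be confirmed.

```typescript
type Reservation =
  | { status: "PendingConfirmation"; reservedAt: Date }
  | { status: "Confirmed"; confirmedAt: Date }
  | { status: "Cancelled"; cancelledAt: Date; reason: string };

// Compile-time exhaustiveness check: if a new status (e.g. "Expired") is added to
// the union but not handled below, `r` is no longer `never` and tsc reports an error.
function assertNever(x: never): never {
  throw new Error(`Unhandled reservation state: ${JSON.stringify(x)}`);
}

// Hypothetical rule: only a pending hold may be confirmed; other states are rejected.
function confirmReservation(r: Reservation, now: Date): Reservation {
  switch (r.status) {
    case "PendingConfirmation":
      return { status: "Confirmed", confirmedAt: now };
    case "Confirmed":
      throw new Error("already confirmed");
    case "Cancelled":
      throw new Error("cannot confirm a cancelled reservation");
    default:
      return assertNever(r);
  }
}
```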
Layer 3: Pass them as property-based tests. Write properties of the form “this invariant holds for any input” as property-based tests (in the QuickCheck family) and hand those over. Run these tests against the code AI generates, and reject anything that breaks a property as an invariant violation.
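To show what Layer 3 means without pulling in a library, the sketch below hand-rolls a tiny property check. It generates many random sequences of confirmation attempts against a hypothetical confirmFromStock rule. It then asserts the invariant “stock never goes negative, and confirmations never exceed the initial stock.” A real project would more likely use a QuickCheck-family library such as fast-check, which also shrinks failing inputs. This sketch does not shrink.

```typescript
// Hypothetical signature for the domain rule under test.
type ConfirmRule = (stock: number) => { confirmed: boolean; stock: number };

// Small deterministic PRNG (mulberry32) so any failure can be replayed from its seed.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical domain rule: confirm a hold only while stock remains.
const confirmFromStock: ConfirmRule = (stock) =>
  stock > 0 ? { confirmed: true, stock: stock - 1 } : { confirmed: false, stock };

// Property: for ANY initial stock and ANY number of attempts, stock never goes
// negative and confirmations never exceed the initial stock.
function checkStockInvariant(rule: ConfirmRule, runs: number, seed: number): void {
  const rand = mulberry32(seed);
  for (let run = 0; run < runs; run++) {
    const initial = Math.floor(rand() * 10); // 0..9 units in stock
    const attempts = Math.floor(rand() * 20); // 0..19 confirmation attempts
    let stock = initial;
    let confirmedCount = 0;
    for (let i = 0; i < attempts; i++) {
      const result = rule(stock);
      stock = result.stock;
      if (result.confirmed) confirmedCount++;
      if (stock < 0 || confirmedCount > initial) {
        throw new Error(`Invariant violated: seed=${seed} run=${run} initial=${initial} attempt=${i}`);
      }
    }
  }
}

checkStockInvariant(confirmFromStock, 1000, 42); // throws only if the rule breaks the invariant
```

Passing the rule in as a parameter means the same property can be run against whatever implementation AI generates.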
Combining layers 2 and 3 structurally guards against business-logic defects of the kind that creep into AI-generated code. Inconsistencies that touch the domain directly are caught not by human discipline but by the type system and tests.
5. Use Case Specification (Given/When/Then)
BDD’s Given/When/Then format is an extremely good fit for handing business scenarios to AI.
```gherkin
Feature: Confirmation from a tentative hold

  Scenario: Can confirm if inventory is available
    Given Customer C has placed a tentative hold on Product P
    And Product P has inventory of 1 or more
    When Customer C performs the confirmation action
    Then the tentative hold transitions to Confirmed
    And Product P's inventory decreases by 1

  Scenario: Cannot confirm without inventory
    Given Customer C has placed a tentative hold on Product P
    And Product P has inventory of 0
    When Customer C performs the confirmation action
    Then the tentative hold remains in PendingConfirmation
    And the domain event StockShortage is emitted
```
Hand this over, and you can write tests first: have AI write the tests, then generate implementations from them. The implementation is then checked against the spec, so the TDD/BDD structure carries over intact into AI-assisted work.
Liu Shangqi’s Spec-Driven Development is exactly the systematization of this direction[5]. He explicitly states that “specs should still use domain-oriented ubiquitous language to describe business intent (…) with a clear structure, with a common style to define scenarios using Given/When/Then.”
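The two Given/When/Then scenarios above translate almost line for line into executable tests. This sketch assumes a hypothetical pure function, confirmHold, that returns the new hold status, the new stock level, and any emitted domain events. A real project would typically use a test runner, or a Gherkin runner such as Cucumber, instead of the tiny scenario helper defined here.

```typescript
type HoldStatus = "PendingConfirmation" | "Confirmed";
type DomainEvent = { type: "StockShortage"; productId: string };

interface ConfirmResult {
  status: HoldStatus;
  stock: number;
  events: DomainEvent[];
}

// Hypothetical domain operation: confirm a tentative hold against product stock.
function confirmHold(status: HoldStatus, productId: string, stock: number): ConfirmResult {
  if (status !== "PendingConfirmation") throw new Error("only pending holds can be confirmed");
  if (stock >= 1) return { status: "Confirmed", stock: stock - 1, events: [] };
  return { status: "PendingConfirmation", stock, events: [{ type: "StockShortage", productId }] };
}

// Minimal stand-in for a test runner: runs the body (which throws on failure) and reports.
function scenario(name: string, body: () => void): void {
  body();
  console.log(`ok - ${name}`);
}

function expectEqual<T>(actual: T, expected: T, what: string): void {
  if (JSON.stringify(actual) !== JSON.stringify(expected)) {
    throw new Error(`${what}: expected ${JSON.stringify(expected)}, got ${JSON.stringify(actual)}`);
  }
}

scenario("Can confirm if inventory is available", () => {
  // Given Customer C has placed a tentative hold on Product P, and P has stock >= 1
  const stock = 1;
  // When Customer C performs the confirmation action
  const r = confirmHold("PendingConfirmation", "P", stock);
  // Then the hold is Confirmed and P's inventory decreases by 1
  expectEqual(r.status, "Confirmed", "status");
  expectEqual(r.stock, stock - 1, "stock");
});

scenario("Cannot confirm without inventory", () => {
  // Given the same hold, but Product P has stock 0; When C performs the confirmation
  const r = confirmHold("PendingConfirmation", "P", 0);
  // Then the hold stays PendingConfirmation and StockShortage is emitted
  expectEqual(r.status, "PendingConfirmation", "status");
  expectEqual(r.events, [{ type: "StockShortage", productId: "P" }], "events");
});
```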
How to Hand It to AI—Four Implementation Patterns
Knowing “what to hand over” is not enough. Get “how to hand it over” wrong, and AI ignores it. There are four implementation patterns.
```mermaid
flowchart TB
    A["Domain Definition<br>(UL/Model/BC/Invariants/UseCase)"]
    B1["Pattern 1: Prompt Paste"]
    B2["Pattern 2: AGENTS.md / System Prompt"]
    B3["Pattern 3: Codified Domain Types"]
    B4["Pattern 4: MCP / RAG-based Reference"]
    C["AI-Generated Prototype Implementation<br>(Production-grade Tech Stack)"]
    A --> B1 --> C
    A --> B2 --> C
    A --> B3 --> C
    A --> B4 --> C
```
Pattern 1: Prompt paste. The bare minimum. You paste the glossary and Given/When/Then into the chat box and say “implement following this.” This works for short tasks, but as the conversation grows longer, AI tends to lose track of the glossary pasted at the start.
Pattern 2: AGENTS.md / system prompt. AI agent tools such as Claude Code, Cursor, and Cline automatically load an instruction file from the repository root into the context of every turn. Depending on the tool, this file is AGENTS.md, CLAUDE.md, .cursorrules, or similar. Put the domain glossary, Bounded Context map, and invariants here, and AI references them on every generation. This is the most practical pattern in the context of this article.
Pattern 3: Codified domain types. Define domain models as TypeScript types, Java records, or Python Pydantic models and commit them to the repo first. AI reads the code as it generates, so code that violates the types is physically rejected as “the build doesn’t pass.” The point is letting the compiler enforce things, not relying on human discipline.
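One way to make the compiler enforce a domain distinction is branded (nominal-style) types. This sketch uses hypothetical ReservationId and BookingId types. Passing a booking ID where a reservation ID is expected becomes a type error, even though both are plain strings at runtime. Branding is a community idiom, not a built-in TypeScript feature, and the "RSV-"/"BKG-" prefix convention is an assumption.

```typescript
// A brand is a phantom property that exists only at the type level.
type ReservationId = string & { readonly __brand: "ReservationId" };
type BookingId = string & { readonly __brand: "BookingId" };

// Smart constructors are the only sanctioned way to create branded values, so
// validation (here: a hypothetical prefix convention) runs in exactly one place.
function reservationId(raw: string): ReservationId {
  if (!raw.startsWith("RSV-")) throw new Error(`not a reservation id: ${raw}`);
  return raw as ReservationId;
}

function bookingId(raw: string): BookingId {
  if (!raw.startsWith("BKG-")) throw new Error(`not a booking id: ${raw}`);
  return raw as BookingId;
}

// Hypothetical use case that must only ever receive a reservation ID.
function cancelReservation(id: ReservationId): string {
  return `cancelled ${id}`;
}

console.log(cancelReservation(reservationId("RSV-001"))); // OK
// cancelReservation(bookingId("BKG-001")); // compile error: BookingId is not assignable to ReservationId
// cancelReservation("RSV-002");            // compile error: string is not assignable to ReservationId
```

If AI-generated code mixes up the two IDs, the build fails. Nobody has to spot the mix-up in review.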
Pattern 4: MCP / RAG-based reference. Make the domain glossary, ADRs (Architecture Decision Records), and historical business documents available via MCP servers or RAG indexes. AI can search “what is the exact definition of Reservation?” on demand. You don’t have to cram everything into the prompt, which is effective when the domain is large. The DICE (Domain-Integrated Context Engineering) idea that Russ Miles introduces[1] is close to this: treating domain objects as first-class context units.
In practice, the combination of Pattern 2 + Pattern 3 is the core. AGENTS.md hands over the domain vocabulary and Bounded Context every turn, while committed domain types apply physical enforcement. Pattern 4 is added once the domain grows. Pattern 1 is for experimentation.
The Value of Building with Production Tech
Here the assumption of “build the prototype with production tech” starts paying off.
The domain model becomes an asset as-is. TypeScript domain types, Pydantic validation, and Java records can be copied directly from prototype to production. Proprietary formats generated by dedicated prototyping tools require rewriting at handover. With production tech, even if the implementation is thrown away, the code that expresses the domain can be kept.
The “throw away or keep” decision can be made on technical grounds. As Martin Fowler’s Sacrificial Architecture argues[6], it can make sense to write early code on the assumption that it will be thrown away and rewritten. But that only works if you can distinguish “code meant to be thrown away” from “code worth keeping.” With production tech, engineers can make that call at the component level: “keep this, throw that away.” This matches Fowler’s own point that even code you plan to throw away should keep its internal quality and module boundaries.
Engineers’ options expand. At handover, engineers can choose to “take only the domain documentation and rewrite from scratch,” “use the code as a base,” or “reuse only some modules.” With proprietary formats, that choice itself disappears.
One caveat. Being written in production tech does not mean production quality. The general quality issues with AI-generated code apply regardless of the tech stack and need to be addressed separately. “High handover-ability” and “the handed-over code surviving in production” are different stories. The latter is out of scope here and belongs to the general topic of reviewing and assuring AI-generated code.
Roles That Support the Workflow—Three People Plus a “Hidden Fourth”
Up to this point we have been discussing the method. But for the method to work, you need roles to operate it. This workflow has four conceptual roles. One person can hold multiple roles, but the skill requirements differ, so we treat them separately.
Role 1: Domain Definer
What they do:
- Through dialogue with business experts, articulate the UL, Bounded Contexts, invariants, and use cases
- Design the domain model (Aggregate / Entity / Value Object)
- Handle the strategic design work of DDD
Workshop techniques they can use:
As a means of producing a domain definition, the following workshop techniques are strong. They are not mutually exclusive—it’s normal to combine them:
- Event Storming (Alberto Brandolini): Using sticky notes on a whiteboard (or online sticky-note tools like Miro / Mural) to lay out domain events chronologically. 10 to 20 people, half a day to two days. Strong at visualizing the entire business flow and discovering Bounded Contexts.
- Domain Storytelling (Stefan Hofer / Henning Schwentner): Have an expert tell the business as a “story,” and visualize the actors, actions, and work objects. 3 to 5 people, one to a few hours. Strong at drawing out tacit knowledge.
- Event Modeling (Adam Dymitruk): An evolution of Event Storming that integrates UI, commands, events, and views into a unified design. One to two days for detailed design.
In practice, the common pattern is “discover the whole with Event Storming → deep-dive specific parts with Domain Storytelling → consolidate into the five elements of the domain definition.”
The main reason these techniques pair well with this workflow is that their outputs map directly into AGENTS.md. Event Storming’s orange notes (domain events) map naturally to AGENTS.md’s “domain event list,” purple notes (policies) to “invariants,” and the drawn boundaries to the “Bounded Context map.” By the time the workshop ends, much of AGENTS.md is already drafted, which makes for an efficient handover[7].
Required skills and knowledge:
- Deep understanding of the business domain (or the ability to converse smoothly with business experts)
- DDD strategic design patterns (UL, BC, Context Map)
- Facilitation skills—able to run the workshops above
- Articulation skills—able to convert vague business knowledge into a structured model
Typical persona: Architect with a domain-expert lean, senior PM, engineer with a business-analyst background.
Role 2: Prototype Builder
What they do:
- Hand the domain definition to AI and have AI build the prototype on a production-grade stack
- Iterate while validating with customers and stakeholders
- Feed any business-rule discrepancies discovered during prototyping back to the definer
Required skills and knowledge:
- Business knowledge—can judge whether AI’s output is business-valid
- Minimum DDD vocabulary—understands what UL, BC, and invariants mean
- AI prompting ability—can give instructions targeted at the goal
- Output judgment—can look at both behavior and code and tell “this matches the spec” from “this doesn’t”
- Articulation and feedback loop—can return discoveries into the documentation
Typical persona: PM, business analyst, product owner, engineer with business knowledge.
Important prerequisite: This role cannot be done by a “complete novice,” but it does not require a full-stack engineer either. Being able to read code helps; being able to write it is not required. Conversely, suppose the person in this role “doesn’t really know the business and can’t judge the output.” Then the fluent errors AI produces go undetected, and “but look, it’s working” pushes the prototype toward production. This is the typical accident path of the workflow.
Role 3: Production-Hardening Engineer
What they do:
- Take over the prototype and the definition and rebuild it at production quality
- Handle the production-grade implementation, testing, security, operational design, and CI/CD
- Judge the “throw away or keep” decision for prototype code, and lift the kept parts to production quality
Required skills and knowledge:
- Full-stack production development skills (test strategy, security, performance, availability, operational design)
- Reverse analysis from domain model—can read prototype code against the definition and check coherence
- Throw-away/keep judgment axes—can decide based on internal quality, module boundaries, and test coverage
- DDD tactical design—implementation of Aggregate Root, Repository, handling of Domain Events
Typical persona: Senior engineer, tech lead, architect.
Role 4: AI Environment Engineer (the Hidden Fourth)
This is the biggest key to running this workflow stably. To structurally lower the skill bar of Role 2, engineers set up the “scaffolding” in advance.
What they do:
- Prepare in advance an environment where the prototype builder can “make prototypes safely just by talking to AI”
- Translate Role 1’s definitions into a form that AI reliably references every turn
- Assemble a set of domain types, templates, MCP, Skills, and guardrail CI
Required skills and knowledge:
- Operational knowledge of AI agent tools (Claude Code, Cursor, Cline, etc.)
- AGENTS.md / system prompt design
- Up-front design and implementation of domain types
- MCP server construction, RAG pipeline design
- Guardrail implementation in CI/CD
Typical persona: Senior engineer well-versed in AI tools, Developer Experience lead, platform engineer.
Whether you staff this fourth role determines how low the prototype builder’s skill bar can go. With it, a “PM with strong business knowledge” can take the role. Without it, “engineer-level judgment” is still required.
Lowering the Skill Bar for Role 2—Scaffolding to Build in Advance
So what concretely does Role 4 build? Here is the scaffolding that gets Role 2 to “even someone close to a layperson can drive it safely.”
AGENTS.md / CLAUDE.md / .cursorrules
Placed at the repository root, this is force-loaded every turn. Put the following into it:
- Domain glossary (UL) and prohibitions
- Bounded Context map
- List of invariants
- Coding conventions and file structure rules
- Behavioral guidelines for AI (“do not guess unknown business terms—ask”)
The prototype builder just talks to AI without thinking about AGENTS.md, and AI automatically responds with the domain definition in mind. Role 4 is responsible for the quality of AGENTS.md.
Commit Domain Type Skeletons First
Put the types and state transitions for Entity / Value Object / Aggregate Root into the repository in TypeScript / Python / Java up front. When the prototype builder says “build the reservation feature” to AI, AI reads the existing types and writes only code that complies with them. Type violations are rejected as type errors, so quality assurance does not depend on the prototype builder’s judgment.
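A committed type skeleton can include the state machine itself. This sketch uses a hypothetical status set that adds Expired to the states used earlier, and the transition rules themselves are assumptions. The allowed transitions are declared once as a typed table. Because the table is typed as a Record over every status, forgetting a status is a compile error. Generated code is expected to call transition rather than assign a status directly.

```typescript
// Hypothetical status set for the Customer Reservation Context.
type ReservationStatus = "PendingConfirmation" | "Confirmed" | "Cancelled" | "Expired";

// Single source of truth for the state machine (rules are illustrative assumptions).
// Typing it as a Record over every status makes a missing status a compile error.
const allowedTransitions: Readonly<Record<ReservationStatus, readonly ReservationStatus[]>> = {
  PendingConfirmation: ["Confirmed", "Cancelled", "Expired"],
  Confirmed: ["Cancelled"],
  Cancelled: [],
  Expired: [],
};

function canTransition(from: ReservationStatus, to: ReservationStatus): boolean {
  return allowedTransitions[from].includes(to);
}

// The guard generated code should go through instead of setting status freely.
function transition(from: ReservationStatus, to: ReservationStatus): ReservationStatus {
  if (!canTransition(from, to)) {
    throw new Error(`Illegal transition: ${from} -> ${to}`);
  }
  return to;
}
```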
Template Repository
Have a template with ACLs, test skeletons, CI/CD, authentication / authorization, log collection, and error handling already in place. The prototype builder just “forks the template and starts talking.” No need to have AI build environments from scratch or roll its own authentication and security—this hugely raises the safety floor.
Custom MCP Server
Role 4 stands up an MCP server that makes the domain glossary, ADRs, historical business documents, and related DB schemas referenceable. AI can search on demand: “what is the exact definition of Reservation?”, “are there ADRs about this business?” You don’t need to stuff everything into the prompt, which is especially effective in organizations with large domains.
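The core of such a server can be as simple as a context-aware lookup over the glossary. This sketch shows only that core, which an MCP tool handler could wrap. It deliberately does not use any MCP SDK. The GlossaryEntry shape mirrors the glossary entry format shown earlier in this article, and all names are hypothetical.

```typescript
// Hypothetical entry shape, mirroring the glossary entry format used in this article.
interface GlossaryEntry {
  term: string;
  context: string;
  meaning: string;
  relatedStates: string[];
  prohibition?: string;
}

// In a real server this would be loaded from the repository's glossary files.
const glossary: GlossaryEntry[] = [
  {
    term: "Reservation",
    context: "Customer Reservation Context",
    meaning: "State where the customer's tentative hold has been accepted. Before confirmation.",
    relatedStates: ["PendingConfirmation", "Confirmed", "Cancelled"],
    prohibition: 'Do not use "Reservation" as a synonym of "confirmed booking."',
  },
  {
    term: "Booking",
    context: "Customer Reservation Context",
    meaning: "The term used once a reservation has been confirmed.",
    relatedStates: ["Confirmed"],
  },
];

// Returns matching entries, optionally restricted to one Bounded Context.
// An empty result is the signal for the agent to ask a human instead of guessing.
function lookupTerm(term: string, context?: string): GlossaryEntry[] {
  const needle = term.trim().toLowerCase();
  return glossary.filter(
    (e) => e.term.toLowerCase() === needle && (context === undefined || e.context === context),
  );
}
```

Returning an empty list for unknown terms matches the AGENTS.md guideline of asking rather than guessing.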
Domain-Specific Skills
Use the skill features of Claude Code or Cursor (reusable procedure documents invoked like commands) to package recurring patterns such as “implementation in the reservation context” or “integration with the payment context.” The prototype builder can then launch one via something like /create-reservation-feature, and AI implements it following the predefined procedure and domain constraints.
Guardrail CI
Have CI automatically check the following:
- Prohibition violations on terms (was Reservation rewritten to Booking?)
- Bounded Context boundary crossings (did payment processing leak into the Customer Reservation Context?)
- Disabling of authentication / authorization middleware
- Domain type violations
If CI fails, the prototype builder is informed immediately via AI that the change is a problem. CI notices the problems the prototype builder cannot notice—this is the essence of advance setup.
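The first two checks can be sketched as a small script. This sketch assumes a hypothetical layout where each Bounded Context lives under src/contexts/<context>/. It also assumes the prohibited terms and forbidden imports have already been derived from the glossary and the context map. The rule data and the in-memory file list are illustrative. A real CI job would read files from disk, or use a linter rule such as ESLint’s no-restricted-imports, and fail the build when violations are returned.

```typescript
interface SourceFile {
  path: string; // e.g. "src/contexts/reservation/confirm.ts"
  content: string;
}

interface Violation {
  path: string;
  message: string;
}

// Hypothetical rules derived from the glossary and the Bounded Context map.
const forbiddenTerms: Record<string, string[]> = {
  reservation: ["Appointment", "Order"], // synonyms the glossary rules out in this context
};
const forbiddenImports: Record<string, string[]> = {
  reservation: ["payment"], // payment logic must not leak into the reservation context
};

function contextOf(path: string): string | undefined {
  const m = /^src\/contexts\/([^\/]+)\//.exec(path);
  return m ? m[1] : undefined;
}

function checkGuardrails(files: SourceFile[]): Violation[] {
  const violations: Violation[] = [];
  for (const file of files) {
    const ctx = contextOf(file.path);
    if (ctx === undefined) continue;

    for (const term of forbiddenTerms[ctx] ?? []) {
      // Case-insensitive word-start match, so "Order", "OrderId", and "orderId" are all flagged.
      // Terms are plain words, so no regex escaping is needed here.
      if (new RegExp(`\\b${term}`, "i").test(file.content)) {
        violations.push({ path: file.path, message: `prohibited term "${term}" in ${ctx} context` });
      }
    }

    for (const target of forbiddenImports[ctx] ?? []) {
      // Flags import specifiers that reach into the target context's directory,
      // e.g. "../payment/charge" or "@/contexts/payment/charge".
      if (new RegExp(`from\\s+["'](?:[^"']*/)?${target}/`).test(file.content)) {
        violations.push({ path: file.path, message: `${ctx} must not import from ${target}` });
      }
    }
  }
  return violations;
}
```

The word-start match is deliberately crude and can produce false positives in comments. It trades precision for a check simple enough that the glossary can drive it directly.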
The Effect of Scaffolding
When these are in place, the skill requirements for Role 2 shift like this:
| Skill required by Role 2 | Without scaffolding | With scaffolding |
|---|---|---|
| Understanding the domain definition | Required, deep level | Required, minimum is OK |
| Judging AI output quality | Required, engineer level | Required, business validity only |
| Judging security and performance | Required | Unnecessary (templates + CI guarantee it) |
| Adherence to coding conventions | Required | Unnecessary (AGENTS.md and types guarantee it) |
| Adherence to vocabulary and boundaries | Required | Unnecessary (CI detects violations) |
Without scaffolding, Role 2 is barely doable except by engineers. With scaffolding, even a PM or BA with strong business knowledge can run the workflow safely.
In other words, “non-engineers can build AI prototypes” implicitly assumes “after Role 4 engineers have set up the scaffolding.” If you talk about an “era where anyone can develop with AI” without making this assumption explicit, you end up mass-producing failure cases where novices are handed AI without scaffolding and fluent malfunctions get pushed to production.
Pitfalls Specific to This Method
“AI paraphrases terms,” “AI ignores invariants,” “AI crosses Bounded Context boundaries,” “security vulnerabilities slip in”—these are common concerns, but they are not specific to this method. They occur in every situation where AI writes code, and they are addressed by general practices like AGENTS.md + types + self-review loops or static analysis. We treat these as AI development issues in general and leave them outside this article.
Instead, we focus on four pitfalls specific to DDD-first × AI prototyping. They are specific because they all arise structurally from the premise of “defining the domain first.”
Pitfall 1: Definition Quality Propagates Directly to Results
AI is highly capable at implementing instructions accurately. Precisely for that reason, sloppy definitions get implemented accurately—sloppily.
For example, suppose the UL glossary specifies only three states for Reservation: Pending → Confirmed → Cancelled. But the real business has “Expired (tentative hold expired)” and “Refunded (refund processed).” AI implements only the three states accurately, without noticing the gap. The prototype runs. Customers say “it’s working.” The problem surfaces only after production operation begins.
This is actually less likely to happen in the “build a working prototype first and clean up the domain afterward” flow. When AI has room to infer the business, the inference itself can expose gaps. In a method that fixes the domain definition first, the definer’s blind spots translate directly into production risk.
The countermeasure is to secure time for cross-review by multiple domain experts before handing the definition to AI. Bake “definition draft → expert peer review → AI prototype → customer review” into the workflow. Do not skip the review step just because you are tempted to spin the AI prototype loop faster.
Pitfall 2: Discoveries During Prototyping Fail to Flow Back to the Definition
While AI is building the prototype, it is common to realize “wait, this business rule is different from the spec” or “we need a boundary here.” The question is whether that discovery flows back into the domain definition.
If it doesn’t flow back, the prototype code has the new knowledge while AGENTS.md and the domain model definition remain stale. In the next iteration, AI uses the old definition and reverts the very places you had just fixed. This is not “AI breaking instructions”—it is a phenomenon specific to this method caused by humans neglecting to update the definition.
The countermeasure is to wire the prototype-to-definition feedback as an explicit work item. Require commit messages and PRs of prototype changes to include “the corresponding definition update,” hold periodic “code vs. definition gap meetings,” and treat definition-side changes as Pull Requests too. Treat the domain definition not as a static spec but as a living asset that is continuously updated alongside the code.
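Part of this feedback wiring can be automated. This sketch assumes a hypothetical repository layout where domain code lives under src/domain/ and the definition lives in AGENTS.md and docs/domain/. Given the list of files changed in a PR (for example, from git diff --name-only), it flags PRs that change domain code without touching the definition. It can only confirm that a definition update exists, not that it is correct, so human review is still needed.

```typescript
// Hypothetical layout; adjust the prefixes to the actual repository.
const DOMAIN_CODE_PREFIXES = ["src/domain/"];
const DEFINITION_PATHS = ["AGENTS.md", "docs/domain/"];

function touches(files: string[], prefixes: string[]): boolean {
  return files.some((f) => prefixes.some((p) => f === p || f.startsWith(p)));
}

// Returns an error message when domain code changed but no definition file did,
// or null when the PR follows the "update the definition too" rule.
function checkDefinitionSync(changedFiles: string[]): string | null {
  const domainChanged = touches(changedFiles, DOMAIN_CODE_PREFIXES);
  const definitionChanged = touches(changedFiles, DEFINITION_PATHS);
  if (domainChanged && !definitionChanged) {
    return "Domain code changed without a matching update to AGENTS.md or docs/domain/.";
  }
  return null;
}
```

A team might add an explicit opt-out for pure refactors, such as a PR label, so the check stays strict without blocking behavior-neutral changes.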
Pitfall 3: Early Boundary Fixing Inhibits Domain Discovery
DDD’s strategic design is fundamentally about revisiting boundaries as you learn. The first Bounded Context map is a hypothesis, and both Evans and Vernon repeatedly emphasize redrawing boundaries as the project progresses.
But if you operate the “fix the domain definition first, then hand it to AI” flow too rigidly, the boundaries you drew first become hard to move. AI builds the file structure and code on the premise of those boundaries. Suppose you later realize “actually, the Customer Reservation Context and the Payment Context should have been a single context.” By then the boundaries pervade the generated codebase, and the psychological cost of rewriting is high.
The countermeasure is to explicitly mark initial boundaries as “tentative.” Write in AGENTS.md that “the current Bounded Context map is vN (tentative),” and schedule “boundary review meetings” at the end of prototype milestones. When you change a boundary, put the work of reflecting it back into the definition before the code refactor. Bake “boundaries are not sacred” into the operation from day one.
Pitfall 4: Over-Definition Leads to Waterfall Creep
Once “define perfectly before handing to AI” becomes a habit, the time spent on the definition extends indefinitely. This is structurally the same trap BDD and formal specifications historically fell into—writing too much spec. Heading toward “before handing to AI, write 100 UL terms, 10 Bounded Contexts, and 50 use cases” eats weeks before you ever reach a prototype. The speed value of AI’s fast feedback evaporates.
This is the flip side of the Sacrificial Architecture argument[6] “maintain internal quality even when you plan to throw it away.” The definition does not have to be perfect, but you are not writing it to throw away either. The balance is delicate.
The guiding principle is to decide a “minimum bootable set.” For example:
- Ubiquitous Language: 5 to 10 most important terms, each with 1 to 2 lines of definition and prohibitions
- Domain Model: 1 to 2 Aggregate Roots and their direct Entities / Value Objects
- Bounded Context: 1 to 3, with their relations (tentative)
- Invariants: 3 to 5 “must-never-violate” items
- Use Cases: 1 to 3 core scenarios in Given/When/Then
Once you have this much, hand it to AI and start the prototype. The rest grows through the feedback loop of Pitfall 2. Cutting off “write everything before handing over” is the only way to preserve the speed advantage of AI prototyping.
Handover and the “Prototype Treated as Production” Problem (Briefly)
Since this article’s main axis is the structure of “put DDD first and let AI build” and “role design + scaffolding,” we touch on handover and the prototype-as-production problem only briefly.
Handover. When you’ve fixed domain types up front in production tech, Role 3 (the production-hardening engineer) can choose between (a) taking only the domain documentation and writing from scratch, (b) using the code as a base, or (c) picking and choosing modules. There are multiple judgment axes—test coverage, security scan pass-through, internal code quality—but the very existence of these choices is itself the benefit of DDD-first × production-tech prototype × Role 4 scaffolding. The details of the judgment axes belong to a separate discussion.
Prototypes treated as production. Customers or executives may press “if it’s working, can’t we just use it?” in this workflow too. But this is not specific to this workflow; it is a classic problem that always comes up with mock-ups and beta versions. As Retool points out[8], the same SQL injection has a small blast radius in a prototype but becomes a data leak once the app is connected to production Postgres. How to operationally separate prototype from production deserves its own treatment and is out of scope here.
Summary
“Show something working fast” and “build a system that understands the business correctly” can coexist, given three conditions.
First is order: the method. Build the domain definition first with DDD, then hand it to AI to write a prototype on a production tech stack. Reverse that order, and the vocabulary AI picked and the business rules it inferred become a fait accompli you can no longer roll back. What you hand to AI consists of five elements: UL, domain model, Bounded Context, invariants, and use case specifications. The core of “how to hand it over” is the combination of AGENTS.md (force-loaded every turn) and codified types (enforced by the compiler).
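To make “codified types (enforced by the compiler)” concrete, here is a minimal TypeScript sketch of a domain type skeleton. Everything in it is an illustrative assumption: the `Reservation` aggregate, the `StayPeriod` value object, the `reservationId` helper, and both invariants are hypothetical, loosely echoing the booking/reservation example. A branded `ReservationId` means a plain `string` is rejected at compile time, and private constructors force all callers, including AI-generated code, through factories that check the invariants at runtime.

```typescript
// Hypothetical domain type skeleton; names and invariants are illustrative only.

// Branded type: a plain string cannot be passed where a ReservationId is expected.
type ReservationId = string & { readonly __brand: "ReservationId" };

function reservationId(raw: string): ReservationId {
  if (raw.trim() === "") {
    throw new Error("ReservationId must not be empty");
  }
  return raw as ReservationId;
}

// Value Object: the stay period validates its invariant at construction time.
class StayPeriod {
  readonly checkIn: Date;
  readonly checkOut: Date;

  private constructor(checkIn: Date, checkOut: Date) {
    this.checkIn = checkIn;
    this.checkOut = checkOut;
  }

  static of(checkIn: Date, checkOut: Date): StayPeriod {
    // Invariant (assumed): check-out must be strictly after check-in.
    if (checkOut.getTime() <= checkIn.getTime()) {
      throw new Error("Invariant violated: checkOut must be after checkIn");
    }
    return new StayPeriod(checkIn, checkOut);
  }

  nights(): number {
    const msPerDay = 24 * 60 * 60 * 1000;
    return Math.round((this.checkOut.getTime() - this.checkIn.getTime()) / msPerDay);
  }
}

// Aggregate Root: only reachable through create(), so the guest check cannot be skipped.
class Reservation {
  readonly id: ReservationId;
  readonly period: StayPeriod;
  readonly guests: number;

  private constructor(id: ReservationId, period: StayPeriod, guests: number) {
    this.id = id;
    this.period = period;
    this.guests = guests;
  }

  static create(id: ReservationId, period: StayPeriod, guests: number): Reservation {
    // Invariant (assumed): at least one guest, counted in whole people.
    if (!Number.isInteger(guests) || guests < 1) {
      throw new Error("Invariant violated: guests must be a positive integer");
    }
    return new Reservation(id, period, guests);
  }
}

// Compile error if uncommented: "R-1" is a string, not a ReservationId.
// Reservation.create("R-1", StayPeriod.of(new Date(), new Date()), 2);
```

Because the glossary term is `Reservation`, there is deliberately no `Booking` type to reach for; an agent that wants one has to introduce it visibly, where a reviewer can catch it.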
Second is roles: the people. The workflow has four roles: the domain definer (articulates the business), the prototype builder (hands the definition to AI and builds the prototype), the production-hardening engineer (rebuilds the prototype and definition at production quality), and the AI environment engineer, the “hidden fourth.” Whether this fourth person is in place determines whether the prototype builder’s skill bar can be lowered to that of a business person.
Third is scaffolding: the prep work. What the fourth person assembles is a set of AGENTS.md, domain type skeletons, a template repository, custom MCP servers, domain-specific Skills, and guardrail CI. Only when all of these are in place does “even a PM with strong business knowledge can build AI prototypes safely” actually hold. “Non-engineers can develop with AI” implicitly assumes “after engineers have set up the scaffolding.” Leaving that assumption unstated and handing AI to novices is what mass-produces fluent malfunctions that reach production.
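Guardrail CI is easiest to picture through a single check: failing the build when code uses a term the Ubiquitous Language prohibits. The following is a minimal sketch, not a real tool: the `glossary` contents, the `Violation` shape, and `findProhibitedTerms` are hypothetical, and matching is a deliberately strict case-insensitive substring search, so `createBooking` and `bookingId` are both caught.

```typescript
// Hypothetical CI guardrail: flag lines that use a synonym the
// Ubiquitous Language prohibits. The glossary is an illustrative assumption.

interface Violation {
  file: string;
  line: number;
  found: string;
  useInstead: string;
}

// canonical term -> prohibited alternatives that must not appear in code
const glossary: Record<string, string[]> = {
  reservation: ["booking"],
  guest: ["customer"],
};

// Escape regex metacharacters so glossary entries are matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// files: path -> file contents (a CI wrapper would read these from disk).
function findProhibitedTerms(
  files: Record<string, string>,
  terms: Record<string, string[]>,
): Violation[] {
  const violations: Violation[] = [];
  for (const file of Object.keys(files)) {
    const lines = files[file].split("\n");
    lines.forEach((text, i) => {
      for (const canonical of Object.keys(terms)) {
        for (const word of terms[canonical]) {
          // Case-insensitive substring match: "Booking", "bookingId", "BOOKING" all hit.
          const pattern = new RegExp(escapeRegExp(word), "i");
          if (pattern.test(text)) {
            violations.push({ file, line: i + 1, found: word, useInstead: canonical });
          }
        }
      }
    });
  }
  return violations;
}
```

In a CI job, a thin wrapper would read the changed files, pass their contents in, and exit with a non-zero status whenever the returned list is non-empty. Substring matching will produce occasional false positives, which is usually an acceptable price for a guardrail that a human can review.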
There are four pitfalls specific to this method: the front-loaded responsibility for definition quality, the feedback loop for discoveries made during prototyping, premature fixing of boundaries, and over-definition that creeps toward waterfall. All of them arise structurally from the premise of defining the domain first, and they are addressed, respectively, by taking the definer’s responsibility seriously, wiring definition updates in as an ongoing process, explicitly marking boundaries as tentative, and starting from a minimum bootable set. Problems that occur whenever AI writes code (term paraphrasing, ignored invariants, boundary crossing, security vulnerabilities) are general AI development topics and out of scope here.
The assets DDD has been refining for 20 years (ubiquitous language, domain model, Bounded Context) turn out to be the best context you can hand AI. DDD is not becoming obsolete because AI arrived; DDD’s strategic design comes alive on the ground precisely because AI arrived. Add role design and scaffolding, and DDD extends further into “organizational design for producing results in the AI era.” This is, plausibly, the practitioner-side response to Eric Evans’s Explore DDD 2024 talk as summarized by Khalili3, and to the direction DDD Academy formalized as a workshop in 20264.
See also the companion thought experiment. This article focused on the “design theory” of method, roles, and scaffolding. A companion article that works through a concrete fictional scenario, Thought Experiment with a Hotel Booking System: Implementing DDD-First AI Prototyping with Next.js 16 + OpenNext, is published alongside it. Through a single scenario it shows how to run the Event Storming session, what the generated AGENTS.md actually looks like, the TypeScript implementation of the domain types, an example Claude Code dialogue, example CI guardrails, and the handover decisions made by the production-hardening engineer. Reading the two together gives you both the design theory and the implementation side.
Related Articles
You may also be interested in the following related articles:
- Thought Experiment with a Hotel Booking System: Implementing DDD-First AI Prototyping with Next.js 16 + OpenNext - The thought experiment companion to this article. Published simultaneously; reading both is recommended.
- How to Combine VSA and DDD in AI Development - Same series. A strategic discussion of combining VSA and DDD in the AI-development context.
- Three-Layer Integration Design of VSA × DDD × Harness - Same series. Resolving the “separate” vs. “share” contradiction through Bounded Contexts.
- A Practical Guide to Vibe Coding for Junior Engineers - AI prototyping at the individual level. The individual counterpart to this organizational article.
- Building Simple Is the Fastest Way to Build - The mindset of fixing the domain first, and prevention strategies at the design level.
References
References are listed in numerical order corresponding to citation numbers in the body.
1. Domain Driven Agent Design - Russ Miles (2025). The parable of an AI agent that confused booking with reservation. Introduces Rod Johnson’s DICE (Domain-Integrated Context Engineering) framework. [Reliability: Medium] (practitioner blog).
2. AI for Requirements Engineering: Industry Adoption and Practitioner Perspectives - arXiv preprint 2511.01324 (2025). An 84-day study of 55 practitioners. 58.2% use AI in RE; Human-AI Collaboration dominates at 49–60.5%. [Reliability: High] (preprint with explicit methodology and strong empirical grounding).
3. Domain Driven Design in the AI Era: From Models to Meaning - Alireza Rahmani Khalili (2025). Redefines Ubiquitous Language as a semantic contract among humans, software, and AI, and Bounded Context as a cognitive firewall. Cites Eric Evans’s Explore DDD 2024 talk in the opening, so references to Evans in this article are via Khalili’s essay. [Reliability: Medium] (personal blog but with clear logical structure).
4. Accelerate your Strategic Design with Large Language Models - DDD Academy / Thomas Coopman (Aardling) (2026). Official two-day workshop on LLM × Strategic Design. [Reliability: Medium-High] (official educational arm of the DDD community).
5. Spec-driven development: Unpacking one of 2025’s key new AI-assisted engineering practices - Liu Shangqi, Thoughtworks APAC (2025). Argues that spec-driven development inherits domain-oriented ubiquitous language from BDD/DDD. Explicitly endorses the Given/When/Then format. [Reliability: Medium-High] (official Thoughtworks blog).
6. Sacrificial Architecture - Martin Fowler (2014). The concept of “designing on the premise of throwing it away in a few years.” Even when throwing away is assumed, internal quality and module boundaries should be maintained. [Reliability: High] (industry authority).
7. Designing Scalable Multi-Agent AI Systems: Leveraging Domain-Driven Design and Event Storming - Kaustav Dey, Kunal Nandi, DZone. A seven-step workflow combining DDD and Event Storming for multi-agent AI system design, with a supply chain example. [Reliability: Medium] (industry media article).
8. The Risks of Vibe Coding: Security Vulnerabilities and Enterprise Pitfalls - Retool. Argues that the blast radius of prototype vulnerabilities is small, but that once a prototype is connected to production Postgres they become data leaks, an explicit framing of the difference in impact. [Reliability: Medium] (vendor blog, but the points are valid).