VibeTDD Experiment 4.3: From Test to Implementation - The Domain Level
After successfully using AI to break down specs into stories and tasks, I moved on to implementation: can AI produce working code from behavioral test scenarios while following established conventions?
Phase 4.3 tested whether AI could take the clean test requirements generated in 4.2 and produce a domain-layer implementation that follows the VibeTDD conventions I'd been developing. I focused specifically on the domain layer (the center of the hexagon) to validate the core patterns before expanding to other layers.
The results were... educational. And frustrating. And ultimately revelatory about a fundamental limitation in current AI-assisted development.
The Setup: From Perfect Test Scenarios to Implementation
What I Had Going In
From Phase 4.2, I had generated clean, behavioral test scenarios:
**Amount Business Validation Batch**:
- `should accept payout when amount is positive and within limits`
- `should reject payout when amount is not positive`
- `should reject payout when amount exceeds individual payout limit`
- `should reject payout when amount would cause user total to exceed limit`
**Currency Business Rules Batch**:
- `should accept payout when currency is supported`
- `should reject payout when currency is not supported`
**Payout Creation Batch**:
- `should create payout when it is valid`
- `should reject payout when it is not valid`
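For concreteness, one of these scenarios maps to a Kotlin test roughly like the sketch below. The names (`Payout`, `PayoutMother`, `CurrencyValidator`) and the `ValidationResult` shape are assumptions for illustration, not the project's actual API:

```kotlin
import org.junit.jupiter.api.Assertions.assertFalse
import org.junit.jupiter.api.Test

// Illustrative sketch: the backtick test name mirrors the behavioral scenario verbatim.
class CurrencyValidatorTest {

    private val validator = CurrencyValidator(supportedCurrencies = setOf("USD", "EUR"))

    @Test
    fun `should reject payout when currency is not supported`() {
        val payout = PayoutMother.of(currency = "XYZ") // otherwise-valid payout with an unsupported currency

        val result = validator.validate(payout)

        assertFalse(result.isValid)
    }
}
```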
The Implementation Task
I used a structured prompt system to guide Claude through the domain layer implementation. The prompt followed a clear two-phase approach:
Phase 1: Write Tests (RED)
- Create minimal objects and interfaces needed for compilation
- Don't implement business logic - leave methods empty or with basic stubs
- Follow VibeTDD batching - implement all tests in a batch together
- Ensure the code compiles with `mvn clean test`
- Verify all new tests fail - if any test passes, stop and report the issue
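As a rough illustration of what "minimal compilation stubs" means here (all names are assumptions for this sketch, not the real project code):

```kotlin
// Hypothetical RED-phase stubs: just enough to compile, with no business logic yet.
data class Payout(val userId: String, val amount: Double, val currency: String)

data class ValidationError(val code: String, val message: String)

data class ValidationResult(val isValid: Boolean, val errors: List<ValidationError> = emptyList())

interface PayoutValidator {
    fun validate(payout: Payout): ValidationResult
}

class CurrencyValidator(private val supportedCurrencies: Set<String>) : PayoutValidator {
    // Deliberately unimplemented so every test in the batch fails during the RED phase
    override fun validate(payout: Payout): ValidationResult = TODO("Implemented in the GREEN phase")
}
```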
Phase 2: Implement Logic (GREEN)
- Implement minimal logic to make all tests in the current batch pass
- Keep implementation simple - focus only on making tests pass
- Don't over-engineer or add features not required by tests
- Verify all tests pass with `mvn clean test`
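Continuing the same sketch, the GREEN phase would then fill in only the logic needed for the currency batch (again, the names and error codes are assumptions, not the project's actual implementation):

```kotlin
// Hypothetical GREEN-phase implementation: the minimum needed to turn the currency batch green.
class CurrencyValidator(private val supportedCurrencies: Set<String>) : PayoutValidator {

    override fun validate(payout: Payout): ValidationResult =
        if (payout.currency in supportedCurrencies) {
            ValidationResult(isValid = true)
        } else {
            ValidationResult(
                isValid = false,
                errors = listOf(ValidationError("UNSUPPORTED_CURRENCY", "Currency not supported"))
            )
        }
}
```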
The prompt included comprehensive convention guidance:
- Use VibeTDD batching principles
- Follow Object Mother pattern (valid objects by default, override when needed)
- Use MockK with annotations
- Follow hexagonal architecture patterns
- Create separate validator classes per business rule
Note: My conventions were still evolving and had gaps, so I expected some unexpected behaviors during this experimental phase.
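For reference, a test skeleton following those conventions might look like this. It's a hedged sketch only: `CreatePayoutUseCase` and its `Result`-style return value are assumptions I'm making for illustration.

```kotlin
import io.mockk.MockKAnnotations
import io.mockk.every
import io.mockk.impl.annotations.MockK
import org.junit.jupiter.api.Assertions.assertTrue
import org.junit.jupiter.api.BeforeEach
import org.junit.jupiter.api.Test

class CreatePayoutUseCaseTest {

    @MockK // MockK with annotations, as the conventions require
    private lateinit var validator: PayoutValidator

    private lateinit var useCase: CreatePayoutUseCase

    @BeforeEach
    fun setUp() {
        MockKAnnotations.init(this)
        useCase = CreatePayoutUseCase(validator)
    }

    @Test
    fun `should create payout when it is valid`() {
        // Object Mother provides a valid payout by default; nothing is overridden here
        val payout = PayoutMother.of()
        every { validator.validate(payout) } returns ValidationResult(isValid = true)

        val result = useCase.create(payout)

        assertTrue(result.isSuccess)
    }
}
```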
The VibeTDD Process Validation
What Actually Worked
Despite the convention compliance issues, the core VibeTDD process did function as designed:
Phase 1 (RED) Success
- AI correctly created minimal compilation stubs (empty interfaces, basic data classes)
- Tests were written in batches following behavioral requirements
- `mvn clean test` compiled successfully with all new tests failing
- No business logic was implemented in the RED phase
Phase 2 (GREEN) Success
- AI implemented logic specifically to make test batches pass
- Tests progressed from failing to passing systematically
- Implementation was focused and didn't include unnecessary features
Key Insight: The fundamental VibeTDD batching approach (RED batch → GREEN batch) proved effective for managing AI context and maintaining focus on specific behavioral requirements.
The State Detection Innovation
My prompt included automatic state detection to handle partial completion scenarios:
Before starting each task, check the current state:
- Run `mvn clean test` to see current test status
- If tests for current batch already exist and pass: mark as completed, move to next
- If tests exist but fail: skip to Phase 2 (implementation only)
- If no tests exist: start with Phase 1 (write tests)
This allowed AI to resume work intelligently and handle interruptions gracefully.
What Went Wrong: Convention Violations Everywhere
Problem 1: Object Mother Hardcoded Values
Convention: Object Mother should create valid objects by default using Rand for realistic data.
What AI Implemented:
```kotlin
object PayoutMother {
    fun of(
        userId: String = Rand.validUserId(), // ✅ Used Rand correctly
        amount: Double = Rand.amount(),      // ✅ Used Rand correctly
        currency: String = "USD"             // ❌ Hardcoded currency
    ) = Payout(userId, amount, currency)
}
```
AI correctly used Rand for userId and amount but hardcoded the currency to "USD". Later, when I introduced template examples, it corrected this pattern.
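The intended pattern looks roughly like the sketch below. `Rand.currency()` is an assumed helper returning a random supported currency; only `Rand.validUserId()` and `Rand.amount()` appear in the code above.

```kotlin
// Sketch of the intended convention: every field defaults to realistic random data from Rand,
// and individual tests override only the fields they actually care about.
object PayoutMother {
    fun of(
        userId: String = Rand.validUserId(),
        amount: Double = Rand.amount(),
        currency: String = Rand.currency() // random supported currency instead of a hardcoded "USD"
    ) = Payout(userId, amount, currency)
}
```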
Problem 2: Inconsistent Implementation Pattern
What Happened: AI actually DID follow VibeTDD batching correctly for the most part - it created batches of tests first with minimal compilation stubs, verified they failed, then implemented the logic to make them pass. The two-phase approach (RED-GREEN) worked as intended.
However, without strict checkpoint-based prompts, the implementation didn't consistently follow other conventions throughout the process.
The Discovery: The core VibeTDD batching principle worked well, but maintaining consistency across multiple batches and ensuring comprehensive convention compliance required more structured guidance.
Problem 3: Inconsistent Architecture Patterns
Convention: Create separate validator classes following Single Responsibility Principle.
What AI Produced:
- First validator: Created separate interface and implementation files
- Second validator: Added implementation directly to existing file
Other inconsistent behaviors:
- Test setup: mixed manual initialization with `@BeforeEach` across different tests, and placed test class setup inside a method body
- ValidationErrorMother: created `ValidationErrorMother` for the first test batch, then completely ignored it in subsequent tests, creating validation errors manually in test bodies like `ValidationError(UNSUPPORTED_CURRENCY, "Currency not supported")`
The Pattern: AI doesn't maintain consistency across similar components within the same session.
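To illustrate the ValidationErrorMother drift, the mother the AI created (and then abandoned) would have looked something like this sketch - the factory names and error codes are my assumptions:

```kotlin
// Hypothetical ValidationErrorMother: one factory per known error, reused across test batches.
object ValidationErrorMother {
    fun unsupportedCurrency() = ValidationError("UNSUPPORTED_CURRENCY", "Currency not supported")
    fun amountNotPositive() = ValidationError("AMOUNT_NOT_POSITIVE", "Amount must be positive")
}

// A convention-compliant assertion reuses the mother instead of rebuilding the error inline:
// assertEquals(listOf(ValidationErrorMother.unsupportedCurrency()), result.errors)
```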
The Deeper Problem: Convention Maintenance Hell
Managing Examples in Markdown is Not Going to Work
While analyzing these failures, I realized that maintaining code examples within convention markdown files is fundamentally problematic:
Duplication Everywhere
- Data class definitions repeated across multiple convention files
- Validation examples scattered across multiple files
- MockK usage patterns repeated with slight variations
Inconsistency Accumulation
- Examples in different files followed slightly different styles, which can confuse the AI
- Updates to one file didn't propagate to related examples
- No way to verify that examples actually compile and work
Maintenance Nightmare
- Fixing a pattern required updating multiple markdown files
- Code examples in markdown can't be tested for correctness
- No IDE support for refactoring examples across files
The "Single Source of Truth" Insight
During our discussion of these maintenance problems, I proposed trying a template approach where working code serves as the authoritative examples. AI reinforced this idea by pointing out:
"Single source of truth: Working code can't lie"
My Realization: Instead of maintaining examples in multiple markdown files, the template project itself should BE the example. Working code that compiles, runs, and follows all patterns correctly.
The Compliance Discovery: Strict Prompts Still Fail
Enhanced Prompt Strategy
I created a comprehensive prompt system with explicit checkpoints and state detection:
Key Features of the Prompt:
- Task State Detection: Check current test status before starting each task
- Convention File Reading: Explicit instructions to read all relevant convention files with checkpoints
- Layer-Specific Guidance: Different instructions for domain, storage, and API layers
- VibeTDD Batching: Clear two-phase approach (tests first, then implementation)
- Checkpoint System: Multiple validation points with "stop here and report" instructions
- Execution Control: Automated progression through work queue with state management
Example Checkpoint:
Checkpoints:
- If at least one file is missing, stop here and report the issue
- Ask yourself if you understood all conventions. If you have doubts, stop here and ask questions
2
3
Result: Even with this detailed, checkpoint-driven prompt, AI still produced convention violations during implementation, though it did follow the VibeTDD batching approach correctly. As a test, I removed one of the convention files, yet it still reported: "Perfect! Now I have everything..."
The BLUE Phase Discovery
After completing the RED-GREEN cycles, I noticed various convention violations in the working code. This led me to test something interesting - what if AI could handle the BLUE (refactor) phase of TDD specifically focused on convention compliance?
I tried a manual experiment. After AI produced working but non-compliant code (e.g., test class initialization in method bodies instead of @BeforeEach), I asked Claude:
"Now check testing conventions again and check what's wrong with the test CreatePayoutUseCaseTest"
What Happened: Claude immediately identified and fixed multiple issues:
- Moved initialization from method bodies to @BeforeEach
- Fixed Object Mother pattern violations
- Corrected MockK annotation usage
The Insight: AI CAN follow conventions when focused specifically on compliance checking, but struggles to maintain convention compliance during the implementation phase.
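To make the original violation concrete, the non-compliant shape looked roughly like this (a sketch reusing the hypothetical names from earlier), with each test building its own collaborators inline instead of sharing an `@BeforeEach`:

```kotlin
import org.junit.jupiter.api.Assertions.assertTrue
import org.junit.jupiter.api.Test

class CreatePayoutUseCaseTest {

    @Test
    fun `should reject payout when it is not valid`() {
        // Convention violation: the use case is constructed inside the test method body
        // instead of in a shared @BeforeEach setup method.
        val useCase = CreatePayoutUseCase(CurrencyValidator(supportedCurrencies = setOf("USD", "EUR")))
        val payout = PayoutMother.of(currency = "XYZ")

        val result = useCase.create(payout)

        assertTrue(result.isFailure)
    }
}
```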
The Three-Phase TDD Evolution
Why Convention Compliance Fails During Implementation
During the GREEN phase (making tests pass), AI is juggling multiple concerns:
- Understanding business requirements
- Generating working code
- Managing dependencies
- Following language idioms
- Adhering to conventions
The Problem: Convention compliance gets deprioritized when AI is focused on making tests pass. The cognitive load is too high for simultaneous optimization.
The VibeTDD Three-Phase Approach
This experiment suggested an evolution of the traditional TDD cycle for AI collaboration:
RED Phase: Write Failing Tests
Focus on behavioral requirements and test design
GREEN Phase: Make Tests Pass
Focus solely on implementing working code that fulfills requirements
BLUE Phase: Convention Compliance
Focus specifically on refactoring to match established conventions
Note: I plan to test this three-phase approach more systematically once I complete implementing all layers (storage, API) to validate whether it scales across the entire hexagonal architecture.
The Architecture Implications
Convention System Evolution
This experiment revealed that my convention system needs fundamental restructuring:
From: Multiple Markdown Files
- Examples scattered across convention documents
- Duplication and inconsistency
- Maintenance burden
To: Template as Single Source of Truth
- Working code examples in template project
- One implementation of each pattern
- Verifiable, compilable, testable examples
The New Workflow
- Template Development: Experts implement patterns correctly in the template
- Convention Extraction: Generate text conventions from working examples
- Implementation: AI generates working code
- Compliance Review: AI refactors against template patterns
Lessons for VibeTDD Framework
1. Embrace the Three-Phase TDD Cycle
AI-assisted TDD needs to evolve beyond RED-GREEN to include a focused BLUE phase for convention compliance.
2. Template as Source of Truth
Stop maintaining examples in markdown. The template project should demonstrate every pattern correctly, and AI should reference working code.
3. BLUE Phase Convention Compliance
The three-phase approach could potentially be systematized:
- AI implements functionality (GREEN phase)
- AI automatically reviews against template patterns (BLUE phase)
- AI refactors to match conventions (BLUE phase completion)
The Meta-Learning
The most important insight: AI has cognitive limitations that must be acknowledged in development workflows. Instead of fighting these limitations, effective AI-assisted development designs workflows that work with AI's constraint prioritization behavior.
VibeTDD isn't just about combining AI with TDD - it's about creating development processes that account for how AI actually works, not how we wish it worked.
Next: I will continue with the rest of the story - implementing the storage and API layers and integration testing. I'll also return to systematically testing the three-phase approach (RED-GREEN-BLUE) to see whether BLUE-phase convention compliance can be automated across all hexagonal architecture layers.