VibeTDD Experiment 4.3: From Test to Implementation - The Domain Level
After successfully using AI to break down specs into stories and tasks, I moved on to implementation: can AI produce working code from behavioral test scenarios while following established conventions?
Phase 4.3 tested whether AI could take the clean test requirements generated in 4.2 and produce a domain-layer implementation that follows the VibeTDD conventions I'd been developing. I focused specifically on the domain layer (the center of the hexagon) to validate the core patterns before expanding to other layers.
The results were... educational. And frustrating. And ultimately revelatory about a fundamental limitation in current AI-assisted development.
The Setup: From Perfect Test Scenarios to Implementation
What I Had Going In
From Phase 4.2, I had generated clean, behavioral test scenarios:
**Amount Business Validation Batch**:
- `should accept payout when amount is positive and within limits`
- `should reject payout when amount is not positive`
- `should reject payout when amount exceeds individual payout limit`
- `should reject payout when amount would cause user total to exceed limit`
**Currency Business Rules Batch**:
- `should accept payout when currency is supported`
- `should reject payout when currency is not supported`
**Payout Creation Batch**:
- `should create payout when it is valid`
- `should reject payout when it is not valid`
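For concreteness, one of these scenarios maps to a Kotlin test roughly like the sketch below. The names (`Payout`, `PayoutMother`, `CurrencyValidator`) and the `ValidationResult` shape are assumptions for illustration, not the project's actual API:

```kotlin
import org.junit.jupiter.api.Assertions.assertFalse
import org.junit.jupiter.api.Test

// Illustrative sketch: the backtick test name mirrors the behavioral scenario verbatim.
class CurrencyValidatorTest {

    private val validator = CurrencyValidator(supportedCurrencies = setOf("USD", "EUR"))

    @Test
    fun `should reject payout when currency is not supported`() {
        val payout = PayoutMother.of(currency = "XYZ") // otherwise-valid payout with an unsupported currency

        val result = validator.validate(payout)

        assertFalse(result.isValid)
    }
}
```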
The Implementation Task
I used a structured prompt system to guide Claude through the domain layer implementation. The prompt followed a clear two-phase approach:
Phase 1: Write Tests (RED)
- Create minimal objects and interfaces needed for compilation
- Don't implement business logic - leave methods empty or with basic stubs
- Follow VibeTDD batching - implement all tests in a batch together
- Ensure the code compiles with `mvn clean test`
- Verify all new tests fail - if any test passes, stop and report the issue
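As a rough illustration of what "minimal compilation stubs" means here (all names are assumptions for this sketch, not the real project code):

```kotlin
// Hypothetical RED-phase stubs: just enough to compile, with no business logic yet.
data class Payout(val userId: String, val amount: Double, val currency: String)

data class ValidationError(val code: String, val message: String)

data class ValidationResult(val isValid: Boolean, val errors: List<ValidationError> = emptyList())

interface PayoutValidator {
    fun validate(payout: Payout): ValidationResult
}

class CurrencyValidator(private val supportedCurrencies: Set<String>) : PayoutValidator {
    // Deliberately unimplemented so every test in the batch fails during the RED phase
    override fun validate(payout: Payout): ValidationResult = TODO("Implemented in the GREEN phase")
}
```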
Phase 2: Implement Logic (GREEN)
- Implement minimal logic to make all tests in the current batch pass
- Keep implementation simple - focus only on making tests pass
- Don't over-engineer or add features not required by tests
- Verify all tests pass with `mvn clean test`
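Continuing the same sketch, the GREEN phase would then fill in only the logic needed for the currency batch (again, the names and error codes are assumptions, not the project's actual implementation):

```kotlin
// Hypothetical GREEN-phase implementation: the minimum needed to turn the currency batch green.
class CurrencyValidator(private val supportedCurrencies: Set<String>) : PayoutValidator {

    override fun validate(payout: Payout): ValidationResult =
        if (payout.currency in supportedCurrencies) {
            ValidationResult(isValid = true)
        } else {
            ValidationResult(
                isValid = false,
                errors = listOf(ValidationError("UNSUPPORTED_CURRENCY", "Currency not supported"))
            )
        }
}
```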
The prompt included comprehensive convention guidance:
- Use VibeTDD batching principles
- Follow Object Mother pattern (valid objects by default, override when needed)
- Use MockK with annotations
- Follow hexagonal architecture patterns
- Create separate validator classes per business rule
Note: My conventions were still evolving and had gaps, so I expected some unexpected behaviors during this experimental phase.
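For reference, a test skeleton following those conventions might look like this. It's a hedged sketch only: `CreatePayoutUseCase` and its `Result`-style return value are assumptions I'm making for illustration.

```kotlin
import io.mockk.MockKAnnotations
import io.mockk.every
import io.mockk.impl.annotations.MockK
import org.junit.jupiter.api.Assertions.assertTrue
import org.junit.jupiter.api.BeforeEach
import org.junit.jupiter.api.Test

class CreatePayoutUseCaseTest {

    @MockK // MockK with annotations, as the conventions require
    private lateinit var validator: PayoutValidator

    private lateinit var useCase: CreatePayoutUseCase

    @BeforeEach
    fun setUp() {
        MockKAnnotations.init(this)
        useCase = CreatePayoutUseCase(validator)
    }

    @Test
    fun `should create payout when it is valid`() {
        // Object Mother provides a valid payout by default; nothing is overridden here
        val payout = PayoutMother.of()
        every { validator.validate(payout) } returns ValidationResult(isValid = true)

        val result = useCase.create(payout)

        assertTrue(result.isSuccess)
    }
}
```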
The VibeTDD Process Validation
What Actually Worked
Despite the convention compliance issues, the core VibeTDD process did function as designed:
Phase 1 (RED) Success
- AI correctly created minimal compilation stubs (empty interfaces, basic data classes)
- Tests were written in batches following behavioral requirements
- `mvn clean test` compiled successfully with all new tests failing
- No business logic was implemented in the RED phase
Phase 2 (GREEN) Success
- AI implemented logic specifically to make test batches pass
- Tests progressed from failing to passing systematically
- Implementation was focused and didn't include unnecessary features
Key Insight: The fundamental VibeTDD batching approach (RED batch → GREEN batch) proved effective for managing AI context and maintaining focus on specific behavioral requirements.
The State Detection Innovation
My prompt included automatic state detection to handle partial completion scenarios:
Before starting each task, check the current state:
- Run `mvn clean test` to see current test status
- If tests for current batch already exist and pass: mark as completed, move to next
- If tests exist but fail: skip to Phase 2 (implementation only)
- If no tests exist: start with Phase 1 (write tests)
This allowed AI to resume work intelligently and handle interruptions gracefully.
What Went Wrong: Convention Violations Everywhere
Problem 1: Object Mother Hardcoded Values
Convention: Object Mother should create valid objects by default using Rand for realistic data.
What AI Implemented:
```kotlin
object PayoutMother {
    fun of(
        userId: String = Rand.validUserId(), // ✅ Used Rand correctly
        amount: Double = Rand.amount(),      // ✅ Used Rand correctly
        currency: String = "USD"             // ❌ Hardcoded currency
    ) = Payout(userId, amount, currency)
}
```
AI correctly used Rand for userId and amount but hardcoded the currency to "USD". Later, when I introduced template examples, it corrected this pattern.
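The intended pattern looks roughly like the sketch below. `Rand.currency()` is an assumed helper returning a random supported currency; only `Rand.validUserId()` and `Rand.amount()` appear in the code above.

```kotlin
// Sketch of the intended convention: every field defaults to realistic random data from Rand,
// and individual tests override only the fields they actually care about.
object PayoutMother {
    fun of(
        userId: String = Rand.validUserId(),
        amount: Double = Rand.amount(),
        currency: String = Rand.currency() // random supported currency instead of a hardcoded "USD"
    ) = Payout(userId, amount, currency)
}
```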
Problem 2: Inconsistent Implementation Pattern
What Happened: AI actually DID follow VibeTDD batching correctly for the most part - it created batches of tests first with minimal compilation stubs, verified they failed, then implemented the logic to make them pass. The two-phase approach (RED-GREEN) worked as intended.
However, without strict checkpoint-based prompts, the implementation didn't consistently follow other conventions throughout the process.
The Discovery: The core VibeTDD batching principle worked well, but maintaining consistency across multiple batches and ensuring comprehensive convention compliance required more structured guidance.
Problem 3: Inconsistent Architecture Patterns
Convention: Create separate validator classes following Single Responsibility Principle.
What AI Produced:
- First validator: Created separate interface and implementation files
- Second validator: Added implementation directly to existing file
Other inconsistent behaviors:
- Test setup: mixed manual initialization with `@BeforeEach` across different tests, and placed test class setup inside a method body
- ValidationErrorMother: created `ValidationErrorMother` for the first test batch, then completely ignored it in subsequent tests, creating validation errors manually in test bodies like `ValidationError(UNSUPPORTED_CURRENCY, "Currency not supported")`
The Pattern: AI doesn't maintain consistency across similar components within the same session.
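To illustrate the ValidationErrorMother drift, the mother the AI created (and then abandoned) would have looked something like this sketch - the factory names and error codes are my assumptions:

```kotlin
// Hypothetical ValidationErrorMother: one factory per known error, reused across test batches.
object ValidationErrorMother {
    fun unsupportedCurrency() = ValidationError("UNSUPPORTED_CURRENCY", "Currency not supported")
    fun amountNotPositive() = ValidationError("AMOUNT_NOT_POSITIVE", "Amount must be positive")
}

// A convention-compliant assertion reuses the mother instead of rebuilding the error inline:
// assertEquals(listOf(ValidationErrorMother.unsupportedCurrency()), result.errors)
```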
The Deeper Problem: Convention Maintenance Hell
Managing Examples in Markdown is Not Going to Work
While analyzing these failures, I realized that maintaining code examples within convention markdown files is fundamentally problematic:
Duplication Everywhere
- Data class definitions repeated across multiple convention files
- Validation examples scattered across multiple files
- MockK usage patterns repeated with slight variations
Inconsistency Accumulation
- Examples in different files followed slightly different styles, which can confuse the AI
- Updates to one file didn't propagate to related examples
- No way to verify that examples actually compile and work
Maintenance Nightmare
- Fixing a pattern required updating multiple markdown files
- Code examples in markdown can't be tested for correctness
- No IDE support for refactoring examples across files
The "Single Source of Truth" Insight
During our discussion of these maintenance problems, I proposed trying a template approach where working code serves as the authoritative examples. AI reinforced this idea by pointing out:
"Single source of truth: Working code can't lie"
My Realization: Instead of maintaining examples in multiple markdown files, the template project itself should BE the example. Working code that compiles, runs, and follows all patterns correctly.
The Compliance Discovery: Strict Prompts Still Fail
Enhanced Prompt Strategy
I created a comprehensive prompt system with explicit checkpoints and state detection:
Key Features of the Prompt:
- Task State Detection: Check current test status before starting each task
- Convention File Reading: Explicit instructions to read all relevant convention files with checkpoints
- Layer-Specific Guidance: Different instructions for domain, storage, and API layers
- VibeTDD Batching: Clear two-phase approach (tests first, then implementation)
- Checkpoint System: Multiple validation points with "stop here and report" instructions
- Execution Control: Automated progression through work queue with state management
Example Checkpoint:
Checkpoints:
- If at least one file is missing, stop here and report the issue
- Ask yourself if you understood all conventions. If you have doubts, stop here and ask questions
2
3
Result: Even with this detailed, checkpoint-driven prompt, AI still produced convention violations during implementation, though it did follow the VibeTDD batching approach correctly. As a test, I removed one of the convention files, yet it still reported: "Perfect! Now I have everything..."
The BLUE Phase Discovery
After completing the RED-GREEN cycles, I noticed various convention violations in the working code. This led me to test something interesting - what if AI could handle the BLUE (refactor) phase of TDD specifically focused on convention compliance?
I tried a manual experiment. After AI produced working but non-compliant code (e.g., test class initialization in method bodies instead of @BeforeEach), I asked Claude:
"Now check testing conventions again and check what's wrong with the test CreatePayoutUseCaseTest"
What Happened: Claude immediately identified and fixed multiple issues:
- Moved initialization from method bodies to @BeforeEach
- Fixed Object Mother pattern violations
- Corrected MockK annotation usage
The Insight: AI CAN follow conventions when focused specifically on compliance checking, but struggles to maintain convention compliance during the implementation phase.
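To make the original violation concrete, the non-compliant shape looked roughly like this (a sketch reusing the hypothetical names from earlier), with each test building its own collaborators inline instead of sharing an `@BeforeEach`:

```kotlin
import org.junit.jupiter.api.Assertions.assertTrue
import org.junit.jupiter.api.Test

class CreatePayoutUseCaseTest {

    @Test
    fun `should reject payout when it is not valid`() {
        // Convention violation: the use case is constructed inside the test method body
        // instead of in a shared @BeforeEach setup method.
        val useCase = CreatePayoutUseCase(CurrencyValidator(supportedCurrencies = setOf("USD", "EUR")))
        val payout = PayoutMother.of(currency = "XYZ")

        val result = useCase.create(payout)

        assertTrue(result.isFailure)
    }
}
```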
The Three-Phase TDD Evolution
Why Convention Compliance Fails During Implementation
During the GREEN phase (making tests pass), AI is juggling multiple concerns:
- Understanding business requirements
- Generating working code
- Managing dependencies
- Following language idioms
- Adhering to conventions
The Problem: Convention compliance gets deprioritized when AI is focused on making tests pass. The cognitive load is too high for simultaneous optimization.
The VibeTDD Three-Phase Approach
This experiment suggested an evolution of the traditional TDD cycle for AI collaboration:
RED Phase: Write Failing Tests
Focus on behavioral requirements and test design
GREEN Phase: Make Tests Pass
Focus solely on implementing working code that fulfills requirements
BLUE Phase: Convention Compliance
Focus specifically on refactoring to match established conventions
Note: I plan to test this three-phase approach more systematically once I complete implementing all layers (storage, API) to validate whether it scales across the entire hexagonal architecture.
The Architecture Implications
Convention System Evolution
This experiment revealed that my convention system needs fundamental restructuring:
From: Multiple Markdown Files
- Examples scattered across convention documents
- Duplication and inconsistency
- Maintenance burden
To: Template as Single Source of Truth
- Working code examples in template project
- One implementation of each pattern
- Verifiable, compilable, testable examples
The New Workflow
- Template Development: Experts implement patterns correctly in the template
- Convention Extraction: Generate text conventions from working examples
- Implementation: AI generates working code
- Compliance Review: AI refactors against template patterns
Lessons for VibeTDD Framework
1. Embrace the Three-Phase TDD Cycle
AI-assisted TDD needs to evolve beyond RED-GREEN to include a focused BLUE phase for convention compliance.
2. Template as Source of Truth
Stop maintaining examples in markdown. The template project should demonstrate every pattern correctly, and AI should reference working code.
3. BLUE Phase Convention Compliance
The three-phase approach could potentially be systematized:
- AI implements functionality (GREEN phase)
- AI automatically reviews against template patterns (BLUE phase)
- AI refactors to match conventions (BLUE phase completion)
The Meta-Learning
The most important insight: AI has cognitive limitations that must be acknowledged in development workflows. Instead of fighting these limitations, effective AI-assisted development designs workflows that work with AI's constraint prioritization behavior.
VibeTDD isn't just about combining AI with TDD - it's about creating development processes that account for how AI actually works, not how we wish it worked.
Next: I will continue with the rest of the story - implementing the storage and API layers and integration testing. I'll also return to systematically testing the three-phase approach (RED-GREEN-BLUE) to see whether BLUE-phase convention compliance can be automated across all hexagonal architecture layers.