
VibeTDD Experiment 4.3: From Test to Implementation - The Domain Level

After successfully using AI to break down specs into stories and tasks, I moved to testing implementation: Can AI implement working code from behavioral test scenarios while following established conventions?

Phase 4.3 tested whether AI could take the clean test requirements generated in 4.2 and produce domain layer implementation that follows the VibeTDD conventions I'd been developing. I focused specifically on the domain layer (center of hexagon) to test the core patterns before expanding to other layers.

The results were... educational. And frustrating. And ultimately revelatory about a fundamental limitation in current AI-assisted development.

The Setup: Perfect Test Scenarios to Implementation

What I Had Going In

From Phase 4.2, I had generated clean, behavioral test scenarios:

```markdown
**Amount Business Validation Batch**:
- `should accept payout when amount is positive and within limits`
- `should reject payout when amount is not positive`
- `should reject payout when amount exceeds individual payout limit`
- `should reject payout when amount would cause user total to exceed limit`

**Currency Business Rules Batch**:
- `should accept payout when currency is supported`
- `should reject payout when currency is not supported`

**Payout Creation Batch**:
- `should create payout when it is valid`
- `should reject payout when it is not valid`
```
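To make the mapping from scenario to test concrete, here is a minimal sketch of how the currency batch could translate into Kotlin tests. All names here (Payout, CurrencyValidator, the error code) are hypothetical placeholders rather than the experiment's actual code, and the real tests would also apply the Object Mother and MockK conventions discussed below.

```kotlin
import org.junit.jupiter.api.Assertions.assertTrue
import org.junit.jupiter.api.Test

// Placeholder domain types so the sketch stands on its own.
data class Payout(val userId: String, val amount: Double, val currency: String)
data class ValidationError(val code: String, val message: String)

class CurrencyValidator(private val supportedCurrencies: Set<String> = setOf("EUR", "USD", "GBP")) {
    fun validate(payout: Payout): List<ValidationError> =
        if (payout.currency in supportedCurrencies) emptyList()
        else listOf(ValidationError("UNSUPPORTED_CURRENCY", "Currency not supported"))
}

class CurrencyValidatorTest {

    private val validator = CurrencyValidator()

    @Test
    fun `should accept payout when currency is supported`() {
        val payout = Payout(userId = "user-1", amount = 25.0, currency = "EUR")

        assertTrue(validator.validate(payout).isEmpty())
    }

    @Test
    fun `should reject payout when currency is not supported`() {
        val payout = Payout(userId = "user-1", amount = 25.0, currency = "XXX")

        assertTrue(validator.validate(payout).isNotEmpty())
    }
}
```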

The Implementation Task

I used a structured prompt system to guide Claude through the domain layer implementation. The prompt followed a clear two-phase approach:

Phase 1: Write Tests (RED)

  1. Create minimal objects and interfaces needed for compilation
  2. Don't implement business logic - leave methods empty or with basic stubs
  3. Follow VibeTDD batching - implement all tests in a batch together
  4. Ensure code compiles with mvn clean test
  5. Verify all new tests fail - if any test passes, stop and report the issue

Phase 2: Implement Logic (GREEN)

  1. Implement minimal logic to make all tests in the current batch pass
  2. Keep implementation simple - focus only on making tests pass
  3. Don't over-engineer or add features not required by tests
  4. Verify all tests pass with mvn clean test

The prompt included comprehensive convention guidance:

  • Use VibeTDD batching principles
  • Follow Object Mother pattern (valid objects by default, override when needed)
  • Use MockK with annotations
  • Follow hexagonal architecture patterns
  • Create separate validator classes per business rule

Note: My conventions were still evolving and had gaps, so I expected some unexpected behaviors during this experimental phase.

The VibeTDD Process Validation

What Actually Worked

Despite the convention compliance issues, the core VibeTDD process did function as designed:

Phase 1 (RED) Success

  • AI correctly created minimal compilation stubs (empty interfaces, basic data classes)
  • Tests were written in batches following behavioral requirements
  • mvn clean test compiled successfully with all new tests failing
  • No business logic was implemented in the RED phase

Phase 2 (GREEN) Success

  • AI implemented logic specifically to make test batches pass
  • Tests progressed from failing to passing systematically
  • Implementation was focused and didn't include unnecessary features

Key Insight: The fundamental VibeTDD batching approach (RED batch → GREEN batch) proved effective for managing AI context and maintaining focus on specific behavioral requirements.

The State Detection Innovation

My prompt included automatic state detection to handle partial completion scenarios:

```
Before starting each task, check the current state:
- Run `mvn clean test` to see current test status
- If tests for current batch already exist and pass: mark as completed, move to next
- If tests exist but fail: skip to Phase 2 (implementation only)
- If no tests exist: start with Phase 1 (write tests)
```

This allowed AI to resume work intelligently and handle interruptions gracefully.

What Went Wrong: Convention Violations Everywhere

Problem 1: Object Mother Hardcoded Values

Convention: Object Mother should create valid objects by default using Rand for realistic data.

What AI Implemented:

```kotlin
object PayoutMother {
    fun of(
        userId: String = Rand.validUserId(),  // ✅ Used Rand correctly
        amount: Double = Rand.amount(),       // ✅ Used Rand correctly
        currency: String = "USD"              // ❌ Hardcoded currency
    ) = Payout(userId, amount, currency)
}
```

AI correctly used Rand for userId and amount but hardcoded the currency to "USD". Later, when I introduced template examples, it corrected this pattern.
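For contrast, a convention-compliant version would randomize the currency as well; roughly like this sketch, where Rand.currency() is a hypothetical helper in the same spirit as the other Rand functions:

```kotlin
object PayoutMother {
    fun of(
        userId: String = Rand.validUserId(),
        amount: Double = Rand.amount(),
        currency: String = Rand.currency()   // random supported currency instead of a hardcoded "USD"
    ) = Payout(userId, amount, currency)
}
```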

Problem 2: Inconsistent Implementation Pattern

What Happened: AI actually DID follow VibeTDD batching correctly for the most part - it created batches of tests first with minimal compilation stubs, verified they failed, then implemented the logic to make them pass. The two-phase approach (RED-GREEN) worked as intended.

However, without strict checkpoint-based prompts, the implementation didn't consistently follow other conventions throughout the process.

The Discovery: The core VibeTDD batching principle worked well, but maintaining consistency across multiple batches and ensuring comprehensive convention compliance required more structured guidance.

Problem 3: Inconsistent Architecture Patterns

Convention: Create separate validator classes following Single Responsibility Principle.

What AI Produced:

  • First validator: Created separate interface and implementation files
  • Second validator: Added implementation directly to existing file
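For reference, the convention asks for one small shared contract plus one implementation file per rule. Roughly the following shape, reworking the hypothetical CurrencyValidator from the earlier sketch to implement a shared interface (names and rules are assumptions, not the experiment's actual code):

```kotlin
// PayoutValidator.kt - the shared contract
interface PayoutValidator {
    fun validate(payout: Payout): List<ValidationError>
}

// CurrencyValidator.kt - one rule, one file
class CurrencyValidator(private val supportedCurrencies: Set<String>) : PayoutValidator {
    override fun validate(payout: Payout): List<ValidationError> =
        if (payout.currency in supportedCurrencies) emptyList()
        else listOf(ValidationError("UNSUPPORTED_CURRENCY", "Currency not supported"))
}

// AmountValidator.kt - the next rule gets its own file with the same shape
class AmountValidator(private val individualLimit: Double) : PayoutValidator {
    override fun validate(payout: Payout): List<ValidationError> =
        if (payout.amount > 0 && payout.amount <= individualLimit) emptyList()
        else listOf(ValidationError("INVALID_AMOUNT", "Amount must be positive and within limits"))
}
```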

Other inconsistent behaviors appeared as well:

  • Test setup: mixed manual initialization with @BeforeEach across different tests, and placed test class setup inside a method body
  • ValidationErrorMother: created ValidationErrorMother for the first test batch, then completely ignored it in subsequent tests, constructing validation errors manually in test bodies like ValidationError(UNSUPPORTED_CURRENCY, "Currency not supported") (see the sketch below)
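Consistent use of the mother across all batches would look roughly like this (hypothetical sketch, same caveats as above):

```kotlin
object ValidationErrorMother {
    fun unsupportedCurrency() = ValidationError("UNSUPPORTED_CURRENCY", "Currency not supported")

    fun of(
        code: String = Rand.errorCode(),        // hypothetical Rand helpers, as elsewhere
        message: String = Rand.errorMessage()
    ) = ValidationError(code, message)
}

// Every test batch would then call ValidationErrorMother.unsupportedCurrency()
// instead of constructing the error by hand in the test body.
```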

The Pattern: AI doesn't maintain consistency across similar components within the same session.

The Deeper Problem: Convention Maintenance Hell

Managing Examples in Markdown is Not Going to Work

While analyzing these failures, I realized that maintaining code examples within convention markdown files is fundamentally problematic:

Duplication Everywhere

  • Data class definitions repeated across multiple convention files
  • Validation examples scattered across multiple files
  • MockK usage patterns repeated with slight variations

Inconsistency Accumulation

  • Examples in different files followed different styles, which could confuse AI
  • Updates to one file didn't propagate to related examples
  • No way to verify that examples actually compile and work

Maintenance Nightmare

  • Fixing a pattern required updating multiple markdown files
  • Code examples in markdown can't be tested for correctness
  • No IDE support for refactoring examples across files

The "Single Source of Truth" Insight

During our discussion of these maintenance problems, I proposed trying a template approach where working code serves as the authoritative examples. AI reinforced this idea by pointing out:

"Single source of truth: Working code can't lie"

My Realization: Instead of maintaining examples in multiple markdown files, the template project itself should BE the example. Working code that compiles, runs, and follows all patterns correctly.

The Compliance Discovery: Strict Prompts Still Fail

Enhanced Prompt Strategy

I created a comprehensive prompt system with explicit checkpoints and state detection:

Key Features of the Prompt:

  • Task State Detection: Check current test status before starting each task
  • Convention File Reading: Explicit instructions to read all relevant convention files with checkpoints
  • Layer-Specific Guidance: Different instructions for domain, storage, and API layers
  • VibeTDD Batching: Clear two-phase approach (tests first, then implementation)
  • Checkpoint System: Multiple validation points with "stop here and report" instructions
  • Execution Control: Automated progression through work queue with state management

Example Checkpoint:

```
Checkpoints:
- If at least one file is missing, stop here and report the issue
- Ask yourself if you understood all conventions. If you have doubts, stop here and ask questions
```

Result: Even with this detailed, checkpoint-driven prompt, AI still produced convention violations during implementation, though it did follow the VibeTDD batching approach correctly. As a test, I removed one convention file, yet it still reported: "Perfect! Now I have everything..."

The BLUE Phase Discovery

After completing the RED-GREEN cycles, I noticed various convention violations in the working code. This led me to test something interesting - what if AI could handle the BLUE (refactor) phase of TDD specifically focused on convention compliance?

I tried a manual experiment. After AI produced working but non-compliant code (e.g., test class initialization in method bodies instead of @BeforeEach), I asked Claude:

"Now check testing conventions again and check what's wrong with the test CreatePayoutUseCaseTest"

What Happened: Claude immediately identified and fixed multiple issues:

  • Moved initialization from method bodies to @BeforeEach
  • Fixed Object Mother pattern violations
  • Corrected MockK annotation usage
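The kind of change involved is small but mechanical. A simplified sketch of the before/after, where the CreatePayoutUseCase constructor and CurrencyValidator are assumed for illustration:

```kotlin
import org.junit.jupiter.api.BeforeEach
import org.junit.jupiter.api.Test

class CreatePayoutUseCaseTest {

    // Before the BLUE pass, each test body built its own dependencies:
    //   val useCase = CreatePayoutUseCase(CurrencyValidator(setOf("EUR")))
    // After the refactor, shared initialization lives in @BeforeEach.

    private lateinit var useCase: CreatePayoutUseCase

    @BeforeEach
    fun setUp() {
        useCase = CreatePayoutUseCase(CurrencyValidator(setOf("EUR")))
    }

    @Test
    fun `should reject payout when currency is not supported`() {
        // ... act on useCase and assert on the validation result
    }
}
```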

The Insight: AI CAN follow conventions when focused specifically on compliance checking, but struggles to maintain convention compliance during the implementation phase.

The Three-Phase TDD Evolution

Why Convention Compliance Fails During Implementation

During the GREEN phase (making tests pass), AI is juggling multiple concerns:

  • Understanding business requirements
  • Generating working code
  • Managing dependencies
  • Following language idioms
  • Adhering to conventions

The Problem: Convention compliance gets deprioritized when AI is focused on making tests pass. The cognitive load is too high for simultaneous optimization.

The VibeTDD Three-Phase Approach

This experiment suggested an evolution of the traditional TDD cycle for AI collaboration:

RED Phase: Write Failing Tests

Focus on behavioral requirements and test design

GREEN Phase: Make Tests Pass

Focus solely on implementing working code that fulfills requirements

BLUE Phase: Convention Compliance

Focus specifically on refactoring to match established conventions

Note: I plan to test this three-phase approach more systematically once I complete implementing all layers (storage, API) to validate whether it scales across the entire hexagonal architecture.

The Architecture Implications

Convention System Evolution

This experiment revealed that my convention system needs fundamental restructuring:

From: Multiple Markdown Files

  • Examples scattered across convention documents
  • Duplication and inconsistency
  • Maintenance burden

To: Template as Single Source of Truth

  • Working code examples in template project
  • One implementation of each pattern
  • Verifiable, compilable, testable examples

The New Workflow

  1. Template Development: Experts implement patterns correctly in template
  2. Convention Extraction: Generate text conventions from working examples
  3. Implementation: AI generates working code
  4. Compliance Review: AI refactors against template patterns

Lessons for VibeTDD Framework

1. Embrace the Three-Phase TDD Cycle

AI-assisted TDD needs to evolve beyond RED-GREEN to include a focused BLUE phase for convention compliance.

2. Template as Source of Truth

Stop maintaining examples in markdown. The template project should demonstrate every pattern correctly, and AI should reference working code.

3. BLUE Phase Convention Compliance

The three-phase approach could potentially be systematized:

  1. AI implements functionality (GREEN phase)
  2. AI automatically reviews against template patterns (BLUE phase)
  3. AI refactors to match conventions (BLUE phase completion)

The Meta-Learning

The most important insight: AI has cognitive limitations that must be acknowledged in development workflows. Instead of fighting these limitations, effective AI-assisted development designs workflows that work with AI's constraint prioritization behavior.

VibeTDD isn't just about combining AI with TDD - it's about creating development processes that account for how AI actually works, not how we wish it worked.


Next: I will continue with the rest of the story: testing storage/API layers and integration testing. I'll also return to systematically testing the three-phase approach (RED-GREEN-BLUE) to see if the BLUE phase convention compliance can be automated across all hexagonal architecture layers.

Built by a software engineer for engineers )))