
VibeTDD Experiment 2: When AI Leads a Real TDD Challenge

This is Phase 2 of my VibeTDD series. After the calculator experiment showed promise, it was time for a real test.

The Challenge: From Toy to Reality

After Claude successfully guided me through TDD basics with a calculator, I decided to escalate dramatically. No more toy problems - time for a real coding challenge from Portfo.

The Task: Build a payout service with these requirements:

  • Validate payout data (UserId, Amount, Currency)
  • Amount must not exceed 30
  • Only EUR, USD, GBP currencies allowed
  • Sum of all user payouts must not exceed 100
  • Store valid payouts in memory
  • Handle validation errors gracefully

The Rules (almost the same as in Phase 1):

  • Claude leads the entire TDD process
  • I only implement what it tells me to
  • No TDD guidance from me initially
  • When Claude asks what to do next, I say "decide yourself"

But this time, I was watching more carefully for anti-patterns.

What Went Wrong: The Over-Engineering Begins

Problem 1: Test Explosion

Claude started strong with the first test:

kotlin
@Test
fun `should store payout when all data is valid`() {
    val payout = Payout("user123", 25.0, "EUR")
    
    payoutService.processPayout(payout)
    
    // Should not throw exception
}

But then it suggested 15 additional test cases for the same basic functionality:

kotlin
@Test  
fun `should store payout with minimum amount`()

@Test
fun `should store payout with maximum amount`() 

@Test
fun `should store payout with EUR currency`()

@Test
fun `should store payout with USD currency`()

@Test
fun `should store payout with GBP currency`()

// ... and 10 more similar tests

Red Flag #1: AI defaults to exhaustive testing rather than minimal, focused tests. It was testing every permutation instead of using triangulation.
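
For contrast, triangulation would probe the boundaries with a few strategic tests instead of enumerating permutations. A hedged sketch of what that could look like (assertion style borrowed from the Kotest assertions used later in this post):

kotlin
@Test
fun `should store payout at the maximum allowed amount`() {
    // Boundary case: 30 is the documented limit
    payoutService.processPayout(Payout("user123", 30.0, "EUR"))
}

@Test
fun `should reject payout just above the maximum allowed amount`() {
    // One step past the boundary is enough to pin down the rule
    shouldThrow<InvalidPayoutException> {
        payoutService.processPayout(Payout("user123", 30.01, "EUR"))
    }
}

Two or three boundary tests pin down the rule; the extra currency and user permutations add almost no new information.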

Problem 2: The Object Mother Catastrophe

When Claude started creating Payout objects directly in test bodies, I stepped in:

"This will become unmaintainable. Let's use the Object Mother pattern."

Claude's response was... excessive:

kotlin
object PayoutMother {
    // "Valid" variations
    fun withSmallAmount() = Payout("user123", 1.0, "EUR")
    fun withMediumAmount() = Payout("user123", 15.0, "EUR") 
    fun withLargeAmount() = Payout("user123", 29.0, "EUR")
    fun withMaxAmount() = Payout("user123", 30.0, "EUR")
    
    // Currency variations
    fun withEUR() = Payout("user123", 25.0, "EUR")
    fun withUSD() = Payout("user123", 25.0, "USD")
    fun withGBP() = Payout("user123", 25.0, "GBP")
    
    // User variations
    fun withUser1() = Payout("user1", 25.0, "EUR")
    fun withUser2() = Payout("user2", 25.0, "EUR")
    
    // ... 15 more similar methods
}

// And then, the horror:
object InvalidPayoutMother {
    fun withNegativeAmount() = Payout("user123", -5.0, "EUR")
    fun withZeroAmount() = Payout("user123", 0.0, "EUR")
    fun withExcessiveAmount() = Payout("user123", 31.0, "EUR")
    fun withInvalidCurrency() = Payout("user123", 25.0, "JPY")
    fun withEmptyUserId() = Payout("", 25.0, "EUR")
    // ... more invalid variations
}

Red Flag #2: AI treated Object Mother as a factory for every possible test scenario rather than a single, flexible method with valid and random default values.

The correct approach would have been:

kotlin
object PayoutMother {

    fun of(
        userId: String = Rand.string(),
        amount: Double = Rand.amount(),
        currency: String = Rand.currency(),
    ) = Payout(
        userId = userId,
        amount = amount,
        currency = currency,
    )
}
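
A test then overrides only the field it actually cares about. The Rand helper above is shorthand from my own conventions; a minimal sketch of what it could look like (names and value ranges are illustrative):

kotlin
import kotlin.random.Random

// Illustrative random-value helper assumed by PayoutMother above
object Rand {
    fun string(): String = "user-" + Random.nextInt(1000, 10000)
    fun amount(): Double = Random.nextDouble(1.0, 30.0) // within the valid range
    fun currency(): String = listOf("EUR", "USD", "GBP").random()
}

// Usage in a test: state only what the test depends on
val tooLarge = PayoutMother.of(amount = 31.0)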

Problem 3: Classic TDD is Impossible with AI

The traditional red-green-refactor cycle that worked beautifully for the calculator completely fell apart:

What Should Happen (Classic TDD):

  1. Write one failing test
  2. Make it pass with minimal code
  3. Refactor
  4. Repeat

What Actually Happened:

  1. Claude writes 5-10 tests at once
  2. Suggests implementing everything simultaneously
  3. No triangulation or incremental development
  4. Skips the "minimal code" phase entirely

But here's the deeper problem: Classic TDD isn't just ineffective with AI - it's impossible for practical reasons:

  • Context Explosion: Each red-green cycle adds more conversation history
  • Memory Consumption: AI sessions blow up quickly with back-and-forth iterations
  • Time Overhead: The constant switching between test/implementation becomes prohibitively slow
  • Session Limits: You'll hit token limits before completing any meaningful feature

My Solution - The VibeTDD Principle: Instead of writing tests one by one, write small sets of related tests first, then implement them together. This batching approach:

  • Reduces context switching overhead
  • Keeps AI sessions manageable
  • Maintains test-first discipline
  • Allows for better validation of test completeness
kotlin
// Instead of: Write one test → implement → write next test → implement
// Do this: Write a focused set of tests → verify they fail → implement together

@Test
fun `should throw exception when UserId is empty`() { /* ... */ }

@Test  
fun `should throw exception when UserId is null`() { /* ... */ }

@Test
fun `should not throw exception when UserId is valid`() { /* ... */ }

// Then implement UserIdValidator to make all three pass

Problem 4: Too Proactive (And Imprecise Instructions)

Claude kept coding without permission:

"Now let's implement the validation logic..." [proceeds to write 50 lines of code]

I learned to be extremely specific:

❌ "Continue with the next step"
❌ "Implement the validation"
❌ "Proceed"

✅ "Write only the test for empty UserId validation"
✅ "Implement only the UserId validation method"
✅ "Show me the next single test case"

Problem 5: Missing Engineering Fundamentals

Despite leading TDD, Claude missed basic software engineering principles:

No Separation of Concerns:

kotlin
class PayoutService {
    fun processPayout(payout: Payout) {
        // Validation logic mixed with business logic
        if (payout.userId.isEmpty()) throw Exception("...")
        if (payout.amount <= 0) throw Exception("...")
        if (payout.currency !in listOf("EUR", "USD", "GBP")) throw Exception("...")
        
        storage.store(payout) // Business logic
    }
}

No Mocking in Tests:

kotlin
@Test
fun `should validate payout data`() {
    val service = PayoutService(InMemoryStorage()) // Real dependency!
    // ...
}
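
A mocked version keeps the unit under test isolated. A sketch using MockK, the library we ended up adopting later in this post (the constructor shape matches the final design shown below):

kotlin
import io.mockk.mockk
import io.mockk.verify

@Test
fun `should store payout when validation passes`() {
    // Mocked dependency instead of a real InMemoryStorage
    val storage = mockk<PayoutStorage>(relaxed = true)
    val service = PayoutService(storage, validators = emptyList())

    service.process(PayoutMother.of())

    verify { storage.store(any()) }
}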

Hardcoded Business Rules:

kotlin
if (payout.amount > 30.0) // Magic number!
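
The fix is to pull the limit into configuration, which is where the final design (below) ends up. A sketch, with the class shape being my assumption:

kotlin
// Hedged sketch: the limit lives in configuration instead of the code
class PayoutConfiguration(
    private val maxAmount: Double = 30.0
) {
    fun getMaxAmount(): Double = maxAmount
}

// The check then reads:
// if (payout.amount > configuration.getMaxAmount()) ...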

Code That Doesn't Compile:

kotlin
// Claude confidently presented this:
shouldThrow<ValidationException> { // Wrong import
    service.process(invalidPayout)
}

The Moment of Intervention

After watching Claude create an unmaintainable mess while being "sure everything is perfect," I had to step in:

"This violates the Single Responsibility Principle. Let's separate validation into individual validator classes."

Claude's response: "You're absolutely right! I violated the Single Responsibility Principle..."

It knew the principles but didn't apply them without explicit prompting.

What We Built (With Heavy Guidance)

After course-correcting, we ended up with a properly architected solution:

Domain Model

kotlin
data class Payout(
    val userId: String,
    val amount: Double,
    val currency: String
)

Validator Interface

kotlin
interface PayoutValidator {
    fun validate(payout: Payout)
}

Individual Validators

kotlin
class UserIdValidator : PayoutValidator {
    override fun validate(payout: Payout) {
        if (payout.userId.isEmpty()) {
            throw InvalidPayoutException(
                PayoutErrorCode.EMPTY_USER_ID,
                "UserId cannot be empty"
            )
        }
    }
}

class AmountValidator(
    private val configuration: PayoutConfiguration
) : PayoutValidator {
    override fun validate(payout: Payout) {
        if (payout.amount <= 0) {
            throw InvalidPayoutException(
                PayoutErrorCode.INVALID_AMOUNT,
                "Amount must be greater than zero"
            )
        }
        
        val maxAmount = configuration.getMaxAmount()
        if (payout.amount > maxAmount) {
            throw InvalidPayoutException(
                PayoutErrorCode.INVALID_AMOUNT,
                "Amount cannot exceed $maxAmount"
            )
        }
    }
}
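
The remaining rule (a user's total payouts must not exceed 100) needs a validator that consults storage, and we already have the USER_LIMIT_EXCEEDED code for it. A hedged sketch, where the storage query and the configuration accessor are my assumptions:

kotlin
class UserLimitValidator(
    private val storage: PayoutStorage,
    private val configuration: PayoutConfiguration
) : PayoutValidator {
    override fun validate(payout: Payout) {
        // Assumed query: all payouts stored so far for this user
        val alreadyPaid = storage.findByUser(payout.userId).sumOf { it.amount }
        // Assumed accessor for the 100 limit
        val userLimit = configuration.getUserLimit()
        if (alreadyPaid + payout.amount > userLimit) {
            throw InvalidPayoutException(
                PayoutErrorCode.USER_LIMIT_EXCEEDED,
                "Total payouts per user cannot exceed $userLimit"
            )
        }
    }
}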

Service Orchestration

kotlin
class PayoutService(
    private val storage: PayoutStorage,
    private val validators: List<PayoutValidator>
) {
    fun process(payout: Payout) {
        validators.forEach { it.validate(payout) }
        storage.store(payout)
    }
}
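
Wiring it together is then a matter of composing the list. The exact composition root and the CurrencyValidator name are my assumptions; only UserIdValidator and AmountValidator are shown above:

kotlin
val configuration = PayoutConfiguration()
val service = PayoutService(
    storage = InMemoryStorage(),
    validators = listOf(
        UserIdValidator(),
        AmountValidator(configuration),
        CurrencyValidator(configuration) // hypothetical, built like the validators above
    )
)

service.process(Payout("user123", 25.0, "EUR"))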

Proper Error Handling

kotlin
enum class PayoutErrorCode {
    EMPTY_USER_ID,
    INVALID_AMOUNT,
    INVALID_CURRENCY,
    USER_LIMIT_EXCEEDED
}

class InvalidPayoutException(
    val code: PayoutErrorCode,
    message: String
) : Exception(message)
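
At the call site, the error code lets callers handle failures gracefully, per the original requirement. A sketch of one possible shape (the surrounding API is my assumption):

kotlin
// Hedged sketch: graceful handling at the boundary of the service
fun submit(payout: Payout): Result<Unit> =
    try {
        payoutService.process(payout)
        Result.success(Unit)
    } catch (e: InvalidPayoutException) {
        // Callers can branch on e.code instead of parsing messages
        Result.failure(e)
    }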

Clean Tests

kotlin
@ExtendWith(MockKExtension::class)
class AmountValidatorTest {
    
    @InjectMockKs
    private lateinit var validator: AmountValidator
    
    @MockK
    private lateinit var configuration: PayoutConfiguration
    
    @ParameterizedTest
    @ValueSource(doubles = [0.0, -5.0, -100.0])
    fun `should throw exception when amount is zero or negative`(amount: Double) {
        val payout = PayoutMother.of(amount = amount)
        
        val exception = shouldThrow<InvalidPayoutException> {
            validator.validate(payout)
        }
        exception.code shouldBe PayoutErrorCode.INVALID_AMOUNT
    }
}

Key Discoveries

✅ What AI Does Well

  • Fast implementation once architecture is defined
  • Comprehensive test case generation (almost too comprehensive)
  • Pattern recognition - can apply consistent patterns across similar classes
  • Refactoring assistance - good at mechanical code improvements

⚠️ What Needs Heavy Human Oversight

  • Architectural decisions - defaults to simplest (often wrong) approach
  • Separation of concerns - mixes responsibilities without prompting
  • Test strategy - over-tests simple scenarios, under-tests complex ones
  • Dependency management - avoids mocking, uses real dependencies

❌ What AI Struggles With

  • TDD discipline - wants to write everything at once
  • Minimal implementations - jumps to complete solutions immediately
  • Context management - loses track of current focus with complex requirements
  • Quality assessment - confident about objectively poor code

The Scalability Problem

The most concerning discovery: AI-led TDD doesn't scale with complexity.

  • Calculator (10 lines): Excellent TDD discipline
  • Payout Service (200+ lines): Required constant human intervention
  • Real application (1000+ lines): Would be unmanageable

AI seems to have a complexity threshold where its behavior changes fundamentally.

Lessons for VibeTDD

1. Classic TDD Must Be Adapted for AI

The one-test-at-a-time approach is incompatible with AI collaboration. VibeTDD Principle: Write small, focused sets of tests first, then implement together. This reduces context overhead while maintaining test-first discipline.

2. AI Amplifies Your Approach

If you don't provide structure and conventions, AI will create its own - and they won't be good.

3. Micro-Management is Required

With complex requirements, you need to break work into tiny, discrete chunks. AI can't maintain context across large feature implementations.

4. Architecture Must Be Human-Led

AI defaults to the simplest possible structure, which is rarely the right structure for maintainable software.

5. Testing Strategy Needs Curation

AI generates exhaustive tests rather than strategic tests. It doesn't understand the difference between essential coverage and paranoid over-testing.

The Verdict

VibeTDD Phase 2 was a humbling experience. While AI can certainly generate code that passes tests, it cannot maintain the discipline and architectural thinking that makes TDD valuable.

The real insight: TDD's value isn't just about having tests; it's about the thinking process that creates good design. AI can execute TDD mechanics but can't do TDD thinking.

Next: The Role Reversal

For Phase 3, I'm flipping the script completely. Instead of letting Claude lead, I'll drive the TDD process myself and use AI as an implementation assistant.

The hypothesis: If humans provide the discipline and architecture through test design, AI might be an excellent implementation partner.

Will Claude write better code when constrained by human-designed tests? Can TDD serve as quality guardrails for AI-generated code? Let's find out.


This experiment revealed the boundaries of AI-led development more clearly than I expected. The next phase will test whether human-led TDD can harness AI's speed while maintaining quality. Follow along with the VibeTDD roadmap to see how this evolves.

Code Repository

The complete code from this experiment is available at: VibeTDD Phase 2 Repository

Built by a software engineer for engineers )))