VibeTDD Experiment 3: When Human Takes the Lead
This is Phase 3 of my VibeTDD series. After watching AI struggle with leading TDD in Phase 2 and witnessing the disaster of test-after development in Phase 2.1, it was time to flip the script.
The Setup: Human-Led TDD
After the previous experiments revealed that AI can't maintain TDD discipline on complex projects, I decided to test the inverse approach:
The New Rules:
- I lead the TDD process and architectural decisions
- Claude assists with implementation only
- I provide comprehensive conventions upfront
- I control the scope of each task
- Tests first - always write tests before any implementation
The Hypothesis: If humans provide the discipline and architecture through test design, AI might be an excellent implementation partner while maintaining the quality benefits of TDD.
The Challenge: The same payout service as in the previous experiments, but this time following proper TDD conventions.
The Foundation: Conventions Born from Failure
Learning from Phase 2's chaos, I collaborated with Claude to create detailed TDD conventions based on what went wrong. Together, we documented:
- SOLID principles with specific examples from our mistakes
- Object Mother pattern to fix the test data disaster
- Validator separation to avoid monolithic services
- MockK usage with proper annotations
- Error handling with typed exceptions and error codes
- Method naming conventions for cleaner APIs
These weren't theoretical guidelines - they were battle-tested solutions to problems we'd actually encountered. The goal: Give AI a comprehensive framework born from real experience.
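To make one of these concrete, here's a minimal sketch of the Object Mother convention. The Payout shape and the defaults are simplified for illustration (in the project the random values most likely came from the shared Rand utility); only the pattern matters here.

```kotlin
import java.util.UUID

// Simplified Payout shape, inferred from the validators and tests shown later in this post.
data class Payout(val userId: String, val amount: Double, val currency: String)

// Object Mother: one place to build valid test data, with named overrides per test.
object PayoutMother {
    fun of(
        userId: String = UUID.randomUUID().toString(),
        amount: Double = 10.0,
        currency: String = "EUR"
    ): Payout = Payout(userId = userId, amount = amount, currency = currency)
}
```

A test then overrides only the field it cares about, e.g. PayoutMother.of(userId = ""), which is exactly how the validator tests below read.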
Early Wins: The Power of Constraints
Discovery 1: Even Clear Instructions Trigger Implementation Mode
My approach was careful and explicit. I provided the task description and conventions, then said:
"Don't write the code at this step."
Then I shared the Rand utility object, thinking Claude would simply acknowledge it for future use.
Instead, Claude immediately responded with complete implementations:
- All domain classes with full logic
- Every validator with business rules implemented
- Complete service architecture
- Comprehensive test suites for everything
The Problem: Even with explicit "don't code" instructions, sharing any code artifact triggered Claude's implementation mode. It saw the Rand utility and assumed it was time to build everything.
The Learning: AI interprets code sharing as "start building" regardless of explicit instructions to the contrary.
Discovery 2: Micro-Management is Essential
I adapted my approach to give extremely specific, limited instructions:
❌ What doesn't work:
"Implement the UserIdValidator following the conventions"
✅ What works:
"Write only the test cases for UserIdValidator. Don't implement anything yet."
This granular control kept Claude focused and allowed me to validate the test design before any implementation.
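As an illustration, the result of a prompt like that is just a test class, with no UserIdValidator implementation behind it yet. This is a sketch rather than the exact file from the experiment, and the happy-path case is my own assumption:

```kotlin
import io.kotest.assertions.throwables.shouldNotThrowAny
import io.kotest.assertions.throwables.shouldThrow
import io.kotest.matchers.shouldBe
import org.junit.jupiter.api.Test

class UserIdValidatorTest {

    // The validator this references is implemented only after the tests are approved.
    private val validator = UserIdValidator()

    @Test
    fun `should throw exception when UserId is empty`() {
        // Given
        val payout = PayoutMother.of(userId = "")

        // When & Then
        val exception = shouldThrow<InvalidPayoutException> {
            validator.validate(payout)
        }
        exception.code shouldBe PayoutErrorCode.EMPTY_USER_ID
    }

    @Test
    fun `should accept payout when UserId is not empty`() {
        shouldNotThrowAny {
            validator.validate(PayoutMother.of(userId = "user-1"))
        }
    }
}
```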
Discovery 3: AI Needs Constant Direction
Even with clear boundaries, Claude would occasionally ask:
"Should we proceed to implement the validation logic?"
I learned to be ruthlessly specific:
❌ Vague: "Continue"
❌ Scope creep: "Implement validation"
✅ Precise: "Implement only the UserIdValidator.validate() method"
The Architecture Emerges
The Clean Validator Pattern
Following proper TDD, we built individual validators:
```kotlin
interface PayoutValidator {
    fun validate(payout: Payout)
}

class UserIdValidator : PayoutValidator {
    override fun validate(payout: Payout) {
        if (payout.userId.isEmpty()) {
            throw InvalidPayoutException(
                PayoutErrorCode.EMPTY_USER_ID,
                "UserId cannot be empty"
            )
        }
    }
}
```
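The typed exception and error codes used above are part of the error-handling convention. Their definitions aren't shown in this post, but from the way the validators and tests use them, the shape is roughly this (the full set of enum entries is an assumption):

```kotlin
// Error codes referenced throughout this post; the user-total entry is a guess.
enum class PayoutErrorCode {
    EMPTY_USER_ID,
    INVALID_CURRENCY,
    INVALID_CURRENCY_FORMAT,
    INVALID_AMOUNT,
    USER_TOTAL_LIMIT_EXCEEDED
}

// Typed exception carrying a machine-readable code plus a human-readable message.
class InvalidPayoutException(
    val code: PayoutErrorCode,
    message: String
) : RuntimeException(message)
```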
The Scalability Challenge
When Claude first suggested the service architecture, it proposed this approach:
```kotlin
class PayoutService(
    private val payoutStorage: PayoutStorage,
    private val userIdValidator: PayoutValidator,
    private val currencyValidator: PayoutValidator,
    private val amountValidator: PayoutValidator,
    private val userTotalLimitValidator: PayoutValidator
) {
    fun process(payout: Payout) {
        userIdValidator.validate(payout)
        currencyValidator.validate(payout)
        amountValidator.validate(payout)
        userTotalLimitValidator.validate(payout)
        payoutStorage.store(payout)
    }
}
```
I immediately pushed back:
"This doesn't scale. Imagine we have 100 validators."
Claude offered better alternatives:
- List of validators (most practical)
- Composite validator pattern (more complex)
- Chain of responsibility (over-engineering)
We chose the list approach:
```kotlin
class PayoutService(
    private val payoutStorage: PayoutStorage,
    private val validators: List<PayoutValidator>
) {
    fun process(payout: Payout) {
        validators.forEach { it.validate(payout) }
        payoutStorage.store(payout)
    }
}
```
This demonstrated that AI can suggest good architectural patterns when questioned rather than defaulting to the simplest approach.
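As a usage note, the payoff shows up at the composition root: adding validator number 101 means one more entry in a list, not another constructor parameter. A hypothetical wiring, using classes that appear later in this post (InMemoryPayoutStorage is an assumed name):

```kotlin
// Hypothetical composition root for the list-based service.
val config: PayoutConfiguration = DefaultPayoutConfiguration()
val storage: PayoutStorage = InMemoryPayoutStorage() // assumed implementation name

val payoutService = PayoutService(
    payoutStorage = storage,
    validators = listOf(
        UserIdValidator(),
        CurrencyValidator(config),
        AmountValidator(config),
        UserTotalLimitValidator(config, storage)
    )
)
```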
The Lost Navigator Problem
When I Hit the Strategic Wall
About halfway through the implementation, something unexpected happened: I got lost.
Despite leading the process, I found myself unsure what to implement next. The test-by-test approach meant I was deep in the details without a clear roadmap.
My solution was simple but revealing:
"What are the next steps we need to cover?"
Claude immediately provided a clear breakdown:
- Implement storage layer with tests
- Create configuration interface and default implementation
- Write integration tests with real components
The Insight: Even when humans lead TDD, AI can serve as an excellent project navigator by maintaining awareness of the overall scope while humans focus on individual components. However, this worked well because our task was simple and well-defined. For complex, real-world projects, I suspect AI would get lost just as easily as humans do.
The Deeper Learning: We need to define clear project plans alongside conventions before starting any significant development work.
The Convention Misunderstandings
The Comment Rebellion
When I asked Claude to remove "useless comments," it went overboard and removed all comments, including helpful ones like Given-When-Then markers in tests.
```kotlin
// Before (what I wanted to keep):
@Test
fun `should throw exception when UserId is empty`() {
    // Given
    val payout = PayoutMother.of(userId = "")

    // When & Then
    val exception = shouldThrow<InvalidPayoutException> {
        validator.validate(payout)
    }
    exception.code shouldBe PayoutErrorCode.EMPTY_USER_ID
}

// After (AI's interpretation):
@Test
fun `should throw exception when UserId is empty`() {
    val payout = PayoutMother.of(userId = "")
    val exception = shouldThrow<InvalidPayoutException> {
        validator.validate(payout)
    }
    exception.code shouldBe PayoutErrorCode.EMPTY_USER_ID
}
```
The Learning: AI interprets instructions literally. When you say "remove useless comments," be specific about what's useful vs. useless.
The Edge Case Blindness
Claude implemented currency validation correctly for the main case (invalid currency codes like "JPY") but missed a subtle edge case: malformed currency codes.
The validator should handle:
"JPY"
→INVALID_CURRENCY
(not in allowed list)"INVALID"
→INVALID_CURRENCY_FORMAT
(not a valid currency code)
The Issue: Our specification wasn't clear enough about this distinction. AI implemented exactly what was specified, nothing more.
The Learning: Specifications need to be more detailed when working with AI, as it won't infer edge cases that humans might naturally consider.
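Spelled out, the distinction could be implemented like this. The format rule (three uppercase letters) is my assumption about what "valid currency code" means; the real point is that the specification has to state it explicitly:

```kotlin
// Sketch of the two-code behaviour; the ISO-4217-style format check is an assumption.
class CurrencyValidator(private val config: PayoutConfiguration) : PayoutValidator {

    private val currencyFormat = Regex("^[A-Z]{3}$")

    override fun validate(payout: Payout) {
        // Malformed code, e.g. "INVALID"
        if (!currencyFormat.matches(payout.currency)) {
            throw InvalidPayoutException(
                PayoutErrorCode.INVALID_CURRENCY_FORMAT,
                "Currency '${payout.currency}' is not a valid currency code"
            )
        }
        // Well-formed but not allowed, e.g. "JPY"
        if (payout.currency !in config.getAllowedCurrencies()) {
            throw InvalidPayoutException(
                PayoutErrorCode.INVALID_CURRENCY,
                "Currency '${payout.currency}' is not allowed"
            )
        }
    }
}
```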
The Integration Reality Check
The Configuration Amnesia
When we reached integration testing, Claude made a critical error: it forgot about the business rule configurations.
```kotlin
// Integration test that should have failed:
@Test
fun `should reject payout when amount exceeds maximum`() {
    val payout = PayoutMother.of(amount = 30.01, currency = "EUR")

    val exception = shouldThrow<InvalidPayoutException> {
        payoutService.process(payout)
    }
    exception.code shouldBe PayoutErrorCode.INVALID_AMOUNT
}
```
This test failed because:
- PayoutMother was generating random amounts that could exceed 30
- The configuration limits weren't being respected in test data
- Integration tests weren't using the actual business rules
The Fix Required:
- Ensure test data respects actual business constraints
- Use real configuration in integration tests
- Verify that validators use configuration correctly
This revealed that AI has trouble connecting different parts of the system even when each part is implemented correctly in isolation.
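A sketch of what the fix looks like: real configuration, real validators, and boundary values derived from that configuration instead of random data (InMemoryPayoutStorage and the exact wiring are assumptions):

```kotlin
import io.kotest.assertions.throwables.shouldThrow
import io.kotest.matchers.shouldBe
import org.junit.jupiter.api.Test

class PayoutServiceIntegrationTest {

    // Real configuration and real validators: no mocks at this level.
    private val config = DefaultPayoutConfiguration()
    private val storage = InMemoryPayoutStorage() // assumed implementation name
    private val payoutService = PayoutService(
        payoutStorage = storage,
        validators = listOf(
            UserIdValidator(),
            CurrencyValidator(config),
            AmountValidator(config),
            UserTotalLimitValidator(config, storage)
        )
    )

    @Test
    fun `should reject payout when amount exceeds maximum`() {
        // Given: one cent above the configured limit, every other field valid
        val payout = PayoutMother.of(
            amount = config.getMaxAmount() + 0.01,
            currency = config.getAllowedCurrencies().first()
        )

        // When & Then
        val exception = shouldThrow<InvalidPayoutException> {
            payoutService.process(payout)
        }
        exception.code shouldBe PayoutErrorCode.INVALID_AMOUNT
    }
}
```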
The Final Architecture
After human-led development with AI assistance, we achieved:
Clean Separation of Concerns
```kotlin
// Single responsibility validators
class AmountValidator(private val config: PayoutConfiguration) : PayoutValidator
class CurrencyValidator(private val config: PayoutConfiguration) : PayoutValidator
class UserTotalLimitValidator(
    private val config: PayoutConfiguration,
    private val storage: PayoutStorage
) : PayoutValidator

// Simple orchestration service
class PayoutService(
    private val storage: PayoutStorage,
    private val validators: List<PayoutValidator>
) {
    fun process(payout: Payout) {
        validators.forEach { it.validate(payout) }
        storage.store(payout)
    }
}
```
Comprehensive Test Coverage
- Unit tests for each validator in isolation
- Service tests with mocked dependencies (sketched below)
- Storage tests for data layer verification
- Integration tests with real components
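For the service-test layer, a MockK-based sketch following the annotation convention from earlier (the test name and the relaxed-mock setup are my choices):

```kotlin
import io.mockk.impl.annotations.MockK
import io.mockk.junit5.MockKExtension
import io.mockk.verifyOrder
import org.junit.jupiter.api.Test
import org.junit.jupiter.api.extension.ExtendWith

@ExtendWith(MockKExtension::class)
class PayoutServiceTest {

    // relaxUnitFun lets the Unit-returning mocks be called without explicit stubbing.
    @MockK(relaxUnitFun = true)
    private lateinit var storage: PayoutStorage

    @MockK(relaxUnitFun = true)
    private lateinit var validator: PayoutValidator

    @Test
    fun `should run validators before storing the payout`() {
        // Given
        val service = PayoutService(storage, listOf(validator))
        val payout = PayoutMother.of()

        // When
        service.process(payout)

        // Then
        verifyOrder {
            validator.validate(payout)
            storage.store(payout)
        }
    }
}
```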
Configurable Business Rules
```kotlin
class DefaultPayoutConfiguration : PayoutConfiguration {
    override fun getAllowedCurrencies(): Set<String> = setOf("EUR", "USD", "GBP")
    override fun getMaxAmount(): Double = 30.0
    override fun getMaxUserTotal(): Double = 100.0
}
```
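The interface behind this implementation is implied by the overrides above:

```kotlin
// Configuration contract the validators depend on.
interface PayoutConfiguration {
    fun getAllowedCurrencies(): Set<String>
    fun getMaxAmount(): Double
    fun getMaxUserTotal(): Double
}
```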
Key Discoveries
✅ What Human-Led TDD + AI Does Well
- Architecture Control: Human test design forces good separation of concerns
- Implementation Speed: AI generates code quickly once boundaries are clear
- Pattern Consistency: AI applies patterns consistently across similar components
- Project Navigation: AI can provide helpful "what's next" guidance on simple, well-defined tasks
- Planning Insight: For complex projects, we need clear plans defined upfront alongside conventions
- Refactoring Assistance: AI handles mechanical code improvements well
⚠️ What Still Needs Human Oversight
- Test Design Strategy: Deciding what to test and how to structure tests
- Integration Coordination: Connecting different system parts coherently
- Edge Case Identification: Recognizing subtle requirements not explicitly stated
- Specification Clarity: Ensuring requirements are detailed enough for AI
- Context Management: Maintaining awareness of the full system scope
❌ What Remains Challenging
- Self-Direction: AI still can't maintain long-term direction without guidance
- Implicit Requirements: Doesn't infer obvious but unstated requirements
- System-Level Thinking: Struggles to connect components in integration scenarios
- Convention Nuance: Takes instructions literally, misses contextual meaning
The Scalability Question
The big question: Does human-led TDD + AI scale to real projects?
Promising Signs:
- Architecture remained clean throughout
- Adding new validators was trivial
- Test coverage was comprehensive and maintainable
- Code quality was enterprise-level
Concerning Signs:
- Required constant micro-management
- Integration issues weren't caught until the end
- Specification gaps caused implementation issues
- Human bottlenecks in decision-making
Lessons for VibeTDD
1. The Control Paradox
Human-led TDD produces better results but requires more management, not less. You gain architectural control but lose the "AI automation" benefit.
2. Conventions + Constraints = Success
Conventions guide quality, but task constraints control scope. Both are essential for effective AI collaboration.
3. AI as Navigator, Not Driver
AI excels at answering "what's next?" but can't decide "what should we build?" The strategic thinking must remain human.
4. Integration is the Hardest Part
Individual components work well with human-led TDD, but connecting them coherently still requires significant human oversight.
5. Specification Precision Matters More
When AI implements exactly what you specify, your specifications need to be more precise than traditional development.
The Verdict
Human-led TDD with AI assistance is promising but not transformative. It produces higher quality code than AI-led approaches, but requires more upfront investment in:
- Detailed conventions
- Precise specifications
- Constant task management
- Integration oversight
The approach feels like having a very fast, very literal junior developer who needs clear instructions but executes them perfectly.
Next: The Real-World Test
Phase 3 proved that human-led TDD can work with AI assistance, but it was still a controlled experiment. For Phase 4, I'm going to apply these learnings to a more complex scenario:
Building a Notification Service that:
- Subscribes to message queues for order status events
- Retrieves data from multiple external services (Users, Orders)
- Constructs and sends notifications via multiple channels (email, SMS, push)
- Handles user preferences for notification types
- Manages template-based message construction
The questions for Phase 4:
- Do the patterns hold as complexity increases?
- How does AI handle testing across different layers (unit, integration, contract)?
- Can this approach maintain code quality with external dependencies?
- Where are the breaking points with real-world service interactions?
The journey to discover sustainable AI-assisted development continues.
Phase 3 showed that human control can harness AI's implementation speed while maintaining quality, but it's not the effortless automation I had hoped for. The real test comes with scaling to complex, real-world scenarios. Follow the VibeTDD roadmap to see how this approach handles production-level challenges.
Code Repository
The complete code from this experiment is available at: VibeTDD Phase 3 Repository