VibeTDD Experiment 3: When Human Takes the Lead
This is Phase 3 of my VibeTDD series. After watching AI struggle with leading TDD in Phase 2 and witnessing the disaster of test-after development in Phase 2.1, it was time to flip the script.
The Setup: Human-Led TDD
After the previous experiments revealed that AI can't maintain TDD discipline on complex projects, I decided to test the inverse approach:
The New Rules:
- I lead the TDD process and architectural decisions
- Claude assists with implementation only
- I provide comprehensive conventions upfront
- I control the scope of each task
- Tests first - always write tests before any implementation
The Hypothesis: If humans provide the discipline and architecture through test design, AI might be an excellent implementation partner while maintaining the quality benefits of TDD.
The Challenge: The same payout service as in the previous experiments, but this time following proper TDD conventions.
The Foundation: Conventions Born from Failure
Learning from Phase 2's chaos, I collaborated with Claude to create detailed TDD conventions based on what went wrong. Together, we documented:
- SOLID principles with specific examples from our mistakes
- Object Mother pattern to fix the test data disaster
- Validator separation to avoid monolithic services
- MockK usage with proper annotations
- Error handling with typed exceptions and error codes
- Method naming conventions for cleaner APIs
These weren't theoretical guidelines - they were battle-tested solutions to problems we'd actually encountered. The goal: Give AI a comprehensive framework born from real experience.
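To make one of these concrete, here's a minimal sketch of the Object Mother convention. The Payout shape and the defaults are simplified for illustration (in the project the random values most likely came from the shared Rand utility); only the pattern matters here.

```kotlin
import java.util.UUID

// Simplified Payout shape, inferred from the validators and tests shown later in this post.
data class Payout(val userId: String, val amount: Double, val currency: String)

// Object Mother: one place to build valid test data, with named overrides per test.
object PayoutMother {
    fun of(
        userId: String = UUID.randomUUID().toString(),
        amount: Double = 10.0,
        currency: String = "EUR"
    ): Payout = Payout(userId = userId, amount = amount, currency = currency)
}
```

A test then overrides only the field it cares about, e.g. PayoutMother.of(userId = ""), which is exactly how the validator tests below read.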
Early Wins: The Power of Constraints
Discovery 1: Even Clear Instructions Trigger Implementation Mode
My approach was careful and explicit. I provided the task description and conventions, then said:
"Don't write the code at this step."
Then I shared the Rand utility object, thinking Claude would simply acknowledge it for future use.
Instead, Claude immediately responded with complete implementations:
- All domain classes with full logic
- Every validator with business rules implemented
- Complete service architecture
- Comprehensive test suites for everything
The Problem: Even with explicit "don't code" instructions, sharing any code artifact triggered Claude's implementation mode. It saw the Rand utility and assumed it was time to build everything.
The Learning: AI interprets code sharing as "start building" regardless of explicit instructions to the contrary.
Discovery 2: Micro-Management is Essential
I adapted my approach to give extremely specific, limited instructions:
❌ What doesn't work:
"Implement the UserIdValidator following the conventions"
✅ What works:
"Write only the test cases for UserIdValidator. Don't implement anything yet."
This granular control kept Claude focused and allowed me to validate the test design before any implementation.
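As an illustration, the result of a prompt like that is just a test class, with no UserIdValidator implementation behind it yet. This is a sketch rather than the exact file from the experiment, and the happy-path case is my own assumption:

```kotlin
import io.kotest.assertions.throwables.shouldNotThrowAny
import io.kotest.assertions.throwables.shouldThrow
import io.kotest.matchers.shouldBe
import org.junit.jupiter.api.Test

class UserIdValidatorTest {

    // The validator this references is implemented only after the tests are approved.
    private val validator = UserIdValidator()

    @Test
    fun `should throw exception when UserId is empty`() {
        // Given
        val payout = PayoutMother.of(userId = "")

        // When & Then
        val exception = shouldThrow<InvalidPayoutException> {
            validator.validate(payout)
        }
        exception.code shouldBe PayoutErrorCode.EMPTY_USER_ID
    }

    @Test
    fun `should accept payout when UserId is not empty`() {
        shouldNotThrowAny {
            validator.validate(PayoutMother.of(userId = "user-1"))
        }
    }
}
```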
Discovery 3: AI Needs Constant Direction
Even with clear boundaries, Claude would occasionally ask:
"Should we proceed to implement the validation logic?"
I learned to be ruthlessly specific:
❌ Vague: "Continue"
❌ Scope creep: "Implement validation"
✅ Precise: "Implement only the UserIdValidator.validate() method"
The Architecture Emerges
The Clean Validator Pattern
Following proper TDD, we built individual validators:
```kotlin
interface PayoutValidator {
    fun validate(payout: Payout)
}

class UserIdValidator : PayoutValidator {
    override fun validate(payout: Payout) {
        if (payout.userId.isEmpty()) {
            throw InvalidPayoutException(
                PayoutErrorCode.EMPTY_USER_ID,
                "UserId cannot be empty"
            )
        }
    }
}
```
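The typed exception and error codes used above are part of the error-handling convention. Their definitions aren't shown in this post, but from the way the validators and tests use them, the shape is roughly this (the full set of enum entries is an assumption):

```kotlin
// Error codes referenced throughout this post; the user-total entry is a guess.
enum class PayoutErrorCode {
    EMPTY_USER_ID,
    INVALID_CURRENCY,
    INVALID_CURRENCY_FORMAT,
    INVALID_AMOUNT,
    USER_TOTAL_LIMIT_EXCEEDED
}

// Typed exception carrying a machine-readable code plus a human-readable message.
class InvalidPayoutException(
    val code: PayoutErrorCode,
    message: String
) : RuntimeException(message)
```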
The Scalability Challenge
When Claude first suggested the service architecture, it proposed this approach:
```kotlin
class PayoutService(
    private val payoutStorage: PayoutStorage,
    private val userIdValidator: PayoutValidator,
    private val currencyValidator: PayoutValidator,
    private val amountValidator: PayoutValidator,
    private val userTotalLimitValidator: PayoutValidator
) {
    fun process(payout: Payout) {
        userIdValidator.validate(payout)
        currencyValidator.validate(payout)
        amountValidator.validate(payout)
        userTotalLimitValidator.validate(payout)
        payoutStorage.store(payout)
    }
}
```
I immediately pushed back:
"This doesn't scale. Imagine we have 100 validators."
Claude offered better alternatives:
- List of validators (most practical)
- Composite validator pattern (more complex)
- Chain of responsibility (over-engineering)
We chose the list approach:
```kotlin
class PayoutService(
    private val payoutStorage: PayoutStorage,
    private val validators: List<PayoutValidator>
) {
    fun process(payout: Payout) {
        validators.forEach { it.validate(payout) }
        payoutStorage.store(payout)
    }
}
```
This demonstrated that AI can suggest good architectural patterns when questioned rather than defaulting to the simplest approach.
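As a usage note, the payoff shows up at the composition root: adding validator number 101 means one more entry in a list, not another constructor parameter. A hypothetical wiring, using classes that appear later in this post (InMemoryPayoutStorage is an assumed name):

```kotlin
// Hypothetical composition root for the list-based service.
val config: PayoutConfiguration = DefaultPayoutConfiguration()
val storage: PayoutStorage = InMemoryPayoutStorage() // assumed implementation name

val payoutService = PayoutService(
    payoutStorage = storage,
    validators = listOf(
        UserIdValidator(),
        CurrencyValidator(config),
        AmountValidator(config),
        UserTotalLimitValidator(config, storage)
    )
)
```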
The Lost Navigator Problem
When I Hit the Strategic Wall
About halfway through the implementation, something unexpected happened: I got lost.
Despite leading the process, I found myself unsure what to implement next. The test-by-test approach meant I was deep in the details without a clear roadmap.
My solution was simple but revealing:
"What are the next steps we need to cover?"
Claude immediately provided a clear breakdown:
- Implement storage layer with tests
- Create configuration interface and default implementation
- Write integration tests with real components
The Insight: Even when humans lead TDD, AI can serve as an excellent project navigator by maintaining awareness of the overall scope while humans focus on individual components. However, this worked well because our task was simple and well-defined. For complex, real-world projects, I suspect AI would get lost just as easily as humans do.
The Deeper Learning: We need to define clear project plans alongside conventions before starting any significant development work.
The Convention Misunderstandings
The Comment Rebellion
When I asked Claude to remove "useless comments," it went overboard and removed all comments, including helpful ones like Given-When-Then markers in tests.
```kotlin
// Before (what I wanted to keep):
@Test
fun `should throw exception when UserId is empty`() {
    // Given
    val payout = PayoutMother.of(userId = "")

    // When & Then
    val exception = shouldThrow<InvalidPayoutException> {
        validator.validate(payout)
    }
    exception.code shouldBe PayoutErrorCode.EMPTY_USER_ID
}

// After (AI's interpretation):
@Test
fun `should throw exception when UserId is empty`() {
    val payout = PayoutMother.of(userId = "")
    val exception = shouldThrow<InvalidPayoutException> {
        validator.validate(payout)
    }
    exception.code shouldBe PayoutErrorCode.EMPTY_USER_ID
}
```
The Learning: AI interprets instructions literally. When you say "remove useless comments," be specific about what's useful vs. useless.
The Edge Case Blindness
Claude implemented currency validation correctly for the main case (invalid currency codes like "JPY") but missed a subtle edge case: malformed currency codes.
The validator should handle:
"JPY"
→INVALID_CURRENCY
(not in allowed list)"INVALID"
→INVALID_CURRENCY_FORMAT
(not a valid currency code)
The Issue: Our specification wasn't clear enough about this distinction. AI implemented exactly what was specified, nothing more.
The Learning: Specifications need to be more detailed when working with AI, as it won't infer edge cases that humans might naturally consider.
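Spelled out, the distinction could be implemented like this. The format rule (three uppercase letters) is my assumption about what "valid currency code" means; the real point is that the specification has to state it explicitly:

```kotlin
// Sketch of the two-code behaviour; the ISO-4217-style format check is an assumption.
class CurrencyValidator(private val config: PayoutConfiguration) : PayoutValidator {

    private val currencyFormat = Regex("^[A-Z]{3}$")

    override fun validate(payout: Payout) {
        // Malformed code, e.g. "INVALID"
        if (!currencyFormat.matches(payout.currency)) {
            throw InvalidPayoutException(
                PayoutErrorCode.INVALID_CURRENCY_FORMAT,
                "Currency '${payout.currency}' is not a valid currency code"
            )
        }
        // Well-formed but not allowed, e.g. "JPY"
        if (payout.currency !in config.getAllowedCurrencies()) {
            throw InvalidPayoutException(
                PayoutErrorCode.INVALID_CURRENCY,
                "Currency '${payout.currency}' is not allowed"
            )
        }
    }
}
```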
The Integration Reality Check
The Configuration Amnesia
When we reached integration testing, Claude made a critical error: it forgot about the business rule configurations.
```kotlin
// Integration test that should have failed:
@Test
fun `should reject payout when amount exceeds maximum`() {
    val payout = PayoutMother.of(amount = 30.01, currency = "EUR")

    val exception = shouldThrow<InvalidPayoutException> {
        payoutService.process(payout)
    }
    exception.code shouldBe PayoutErrorCode.INVALID_AMOUNT
}
```
This test failed because:
- PayoutMother was generating random amounts that could exceed 30
- The configuration limits weren't being respected in test data
- Integration tests weren't using the actual business rules
The Fix Required:
- Ensure test data respects actual business constraints
- Use real configuration in integration tests
- Verify that validators use configuration correctly
This revealed that AI has trouble connecting different parts of the system even when each part is implemented correctly in isolation.
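A sketch of what the fix looks like: real configuration, real validators, and boundary values derived from that configuration instead of random data (InMemoryPayoutStorage and the exact wiring are assumptions):

```kotlin
import io.kotest.assertions.throwables.shouldThrow
import io.kotest.matchers.shouldBe
import org.junit.jupiter.api.Test

class PayoutServiceIntegrationTest {

    // Real configuration and real validators: no mocks at this level.
    private val config = DefaultPayoutConfiguration()
    private val storage = InMemoryPayoutStorage() // assumed implementation name
    private val payoutService = PayoutService(
        payoutStorage = storage,
        validators = listOf(
            UserIdValidator(),
            CurrencyValidator(config),
            AmountValidator(config),
            UserTotalLimitValidator(config, storage)
        )
    )

    @Test
    fun `should reject payout when amount exceeds maximum`() {
        // Given: one cent above the configured limit, every other field valid
        val payout = PayoutMother.of(
            amount = config.getMaxAmount() + 0.01,
            currency = config.getAllowedCurrencies().first()
        )

        // When & Then
        val exception = shouldThrow<InvalidPayoutException> {
            payoutService.process(payout)
        }
        exception.code shouldBe PayoutErrorCode.INVALID_AMOUNT
    }
}
```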
The Final Architecture
After human-led development with AI assistance, we achieved:
Clean Separation of Concerns
```kotlin
// Single responsibility validators
class AmountValidator(private val config: PayoutConfiguration) : PayoutValidator
class CurrencyValidator(private val config: PayoutConfiguration) : PayoutValidator
class UserTotalLimitValidator(
    private val config: PayoutConfiguration,
    private val storage: PayoutStorage
) : PayoutValidator

// Simple orchestration service
class PayoutService(
    private val storage: PayoutStorage,
    private val validators: List<PayoutValidator>
) {
    fun process(payout: Payout) {
        validators.forEach { it.validate(payout) }
        storage.store(payout)
    }
}
```
Comprehensive Test Coverage
- Unit tests for each validator in isolation
- Service tests with mocked dependencies (sketched below)
- Storage tests for data layer verification
- Integration tests with real components
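For the service-test layer, a MockK-based sketch following the annotation convention from earlier (the test name and the relaxed-mock setup are my choices):

```kotlin
import io.mockk.impl.annotations.MockK
import io.mockk.junit5.MockKExtension
import io.mockk.verifyOrder
import org.junit.jupiter.api.Test
import org.junit.jupiter.api.extension.ExtendWith

@ExtendWith(MockKExtension::class)
class PayoutServiceTest {

    // relaxUnitFun lets the Unit-returning mocks be called without explicit stubbing.
    @MockK(relaxUnitFun = true)
    private lateinit var storage: PayoutStorage

    @MockK(relaxUnitFun = true)
    private lateinit var validator: PayoutValidator

    @Test
    fun `should run validators before storing the payout`() {
        // Given
        val service = PayoutService(storage, listOf(validator))
        val payout = PayoutMother.of()

        // When
        service.process(payout)

        // Then
        verifyOrder {
            validator.validate(payout)
            storage.store(payout)
        }
    }
}
```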
Configurable Business Rules
```kotlin
class DefaultPayoutConfiguration : PayoutConfiguration {
    override fun getAllowedCurrencies(): Set<String> = setOf("EUR", "USD", "GBP")
    override fun getMaxAmount(): Double = 30.0
    override fun getMaxUserTotal(): Double = 100.0
}
```
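The interface behind this implementation is implied by the overrides above:

```kotlin
// Configuration contract the validators depend on.
interface PayoutConfiguration {
    fun getAllowedCurrencies(): Set<String>
    fun getMaxAmount(): Double
    fun getMaxUserTotal(): Double
}
```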
Key Discoveries
✅ What Human-Led TDD + AI Does Well
- Architecture Control: Human test design forces good separation of concerns
- Implementation Speed: AI generates code quickly once boundaries are clear
- Pattern Consistency: AI applies patterns consistently across similar components
- Project Navigation: AI can provide helpful "what's next" guidance on simple, well-defined tasks
- Planning Insight: For complex projects, we need clear plans defined upfront alongside conventions
- Refactoring Assistance: AI handles mechanical code improvements well
⚠️ What Still Needs Human Oversight
- Test Design Strategy: Deciding what to test and how to structure tests
- Integration Coordination: Connecting different system parts coherently
- Edge Case Identification: Recognizing subtle requirements not explicitly stated
- Specification Clarity: Ensuring requirements are detailed enough for AI
- Context Management: Maintaining awareness of the full system scope
❌ What Remains Challenging
- Self-Direction: AI still can't maintain long-term direction without guidance
- Implicit Requirements: Doesn't infer obvious but unstated requirements
- System-Level Thinking: Struggles to connect components in integration scenarios
- Convention Nuance: Takes instructions literally, misses contextual meaning
The Scalability Question
The big question: Does human-led TDD + AI scale to real projects?
Promising Signs:
- Architecture remained clean throughout
- Adding new validators was trivial
- Test coverage was comprehensive and maintainable
- Code quality was enterprise-level
Concerning Signs:
- Required constant micro-management
- Integration issues weren't caught until the end
- Specification gaps caused implementation issues
- Human bottlenecks in decision-making
Lessons for VibeTDD
1. The Control Paradox
Human-led TDD produces better results but requires more management, not less. You gain architectural control but lose the "AI automation" benefit.
2. Conventions + Constraints = Success
Conventions guide quality, but task constraints control scope. Both are essential for effective AI collaboration.
3. AI as Navigator, Not Driver
AI excels at answering "what's next?" but can't decide "what should we build?" The strategic thinking must remain human.
4. Integration is the Hardest Part
Individual components work well with human-led TDD, but connecting them coherently still requires significant human oversight.
5. Specification Precision Matters More
When AI implements exactly what you specify, your specifications need to be more precise than traditional development.
The Verdict
Human-led TDD with AI assistance is promising but not transformative. It produces higher quality code than AI-led approaches, but requires more upfront investment in:
- Detailed conventions
- Precise specifications
- Constant task management
- Integration oversight
The approach feels like having a very fast, very literal junior developer who needs clear instructions but executes them perfectly.
Next: The Real-World Test
Phase 3 proved that human-led TDD can work with AI assistance, but it was still a controlled experiment. For Phase 4, I'm going to apply these learnings to a more complex scenario:
Building a Notification Service that:
- Subscribes to message queues for order status events
- Retrieves data from multiple external services (Users, Orders)
- Constructs and sends notifications via multiple channels (email, SMS, push)
- Handles user preferences for notification types
- Manages template-based message construction
The questions for Phase 4:
- Do the patterns hold as complexity increases?
- How does AI handle testing across different layers (unit, integration, contract)?
- Can this approach maintain code quality with external dependencies?
- Where are the breaking points with real-world service interactions?
The journey to discover sustainable AI-assisted development continues.
Phase 3 showed that human control can harness AI's implementation speed while maintaining quality, but it's not the effortless automation I had hoped for. The real test comes with scaling to complex, real-world scenarios. Follow the VibeTDD roadmap to see how this approach handles production-level challenges.
Code Repository
The complete code from this experiment is available at: VibeTDD Phase 3 Repository