
VibeTDD Experiment 2.1: The Test-After Trap - When AI 'Covers' Existing Code

This is Phase 2.1 of my VibeTDD series - an unplanned experiment that emerged from a common claim I keep hearing in the AI development community.

Before moving to Phase 3 where I'd take control of the TDD process, I kept encountering this argument:

"Why do TDD when AI can generate both code AND tests? We write the logic first, then ask AI to create comprehensive test coverage. It works great!"

I've heard this from multiple developers who swear by the approach. They claim AI generates thorough tests that catch bugs and provide good coverage. But something felt off about this.

My doubt: Are we sure AI covers everything properly, or does it just adapt tests to whatever code exists? How do you validate that generated tests are actually testing the right things? What happens when you need to modify the logic?

Time for an experiment.

The Setup: Testing the Test-After Approach

I decided to implement the enhanced payout service from Phase 2 using the "code-first, tests-after" methodology that many developers advocate.

The Rules:

  • Implement the complete feature first (no tests)
  • Ask Claude to generate comprehensive test coverage
  • See what kind of tests AI creates for existing code
  • Add a new requirement and observe how it handles changes

The Requirements (same as Phase 2):

  • Validate payout data (UserId, Amount, Currency)
  • Amount must not exceed 30
  • Only EUR, USD, GBP currencies allowed
  • Sum of all user payouts must not exceed 100
  • Store valid payouts in memory

What Claude Built: The Monolithic Approach

The "Complete" Implementation

Claude confidently created this service:

```kotlin
class PayoutService(private val storage: PayoutStorage) {

    companion object {
        private val MAX_PAYOUT_AMOUNT = BigDecimal("30")
        private val MAX_USER_TOTAL = BigDecimal("100")
        private val ALLOWED_CURRENCIES = setOf(Currency.EUR, Currency.USD, Currency.GBP)
    }

    fun processPayout(userId: String?, amount: BigDecimal?, currency: Currency?): PayoutResult {
        return try {
            val validatedPayout = validateAndCreatePayout(userId, amount, currency)
            storage.store(validatedPayout)
            PayoutResult.Success(validatedPayout)
        } catch (e: PayoutValidationException) {
            PayoutResult.Failure(e.error)
        }
    }

    private fun validateAndCreatePayout(userId: String?, amount: BigDecimal?, currency: Currency?): Payout {
        // Validate required fields
        if (userId.isNullOrBlank()) {
            throw PayoutValidationException(PayoutError.MissingUserId)
        }

        if (amount == null) {
            throw PayoutValidationException(PayoutError.MissingAmount)
        }

        if (currency == null) {
            throw PayoutValidationException(PayoutError.MissingCurrency)
        }

        // Validate amount constraints
        if (amount <= BigDecimal.ZERO || amount > MAX_PAYOUT_AMOUNT) {
            throw PayoutValidationException(PayoutError.InvalidAmount)
        }

        // Validate currency
        if (currency !in ALLOWED_CURRENCIES) {
            throw PayoutValidationException(PayoutError.InvalidCurrency)
        }

        // Check user total limit
        val currentUserTotal = storage.getTotalPayoutByUserId(userId)
        if (currentUserTotal + amount > MAX_USER_TOTAL) {
            throw PayoutValidationException(
                PayoutError.UserLimitExceeded(currentUserTotal, MAX_USER_TOTAL)
            )
        }

        return Payout(userId = userId, amount = amount, currency = currency)
    }
}
```

Red Flags Immediately Obvious:

  • Hardcoded business rules (MAX_PAYOUT_AMOUNT, ALLOWED_CURRENCIES)
  • Multiple responsibilities in one method (validation + business logic)
  • Impossible to test in isolation - every test needs real storage
  • No separation of concerns - adding new validation rules means modifying core logic

But I wanted to see what tests AI would generate for this mess.

The Test Generation Disaster

Problem 1: Configuration Chaos

My first attempt to run the tests failed immediately:

```bash
[ERROR] TestEngine with ID 'junit-jupiter' failed to discover tests
[ERROR] There was an error in the forked process
```

When I provided this error to Claude, its response was shocking:

"This is likely due to missing dependencies or configuration issues. Let me implement tests in Java + add a manual runner so you can run it if tests still won't work using JUnit."

Wait, what? Instead of fixing the Maven configuration, Claude:

  1. Switched from Kotlin to Java for tests (defeating the purpose)
  2. Created a manual test runner using main() methods
  3. Suggested bypassing the testing framework entirely

This immediately revealed a fundamental problem: AI doesn't understand that broken infrastructure needs to be fixed, not worked around.
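For context on what the real fix looks like: JUnit 5 discovery failures under Maven are very often caused by an outdated `maven-surefire-plugin`, since versions before 2.22.0 cannot run JUnit Jupiter tests. A minimal sketch of the usual `pom.xml` correction (the version number is illustrative; any 2.22.0+ release works):

```xml
<!-- pom.xml: surefire 2.22.0+ understands the JUnit Platform,
     so 'junit-jupiter' tests are discovered instead of silently failing -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <version>2.22.2</version>
</plugin>
```

A two-line build fix like this is exactly the kind of infrastructure repair Claude skipped in favor of workarounds.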

Problem 2: Shotgun Testing

Once I forced Claude to fix the configuration properly, it generated this test class:

```kotlin
class PayoutServiceTest {

    @Test
    fun `should process valid payout successfully`() { /* basic test */ }

    @Test
    fun `should fail when userId is null`() { /* null test */ }

    @Test
    fun `should fail when amount exceeds 30`() { /* boundary test */ }

    @Test
    fun `should allow all supported currencies`() {
        // Test EUR
        var result = payoutService.processPayout("user1", BigDecimal("10"), Currency.EUR)
        assertTrue(result is PayoutResult.Success)

        // Test USD
        result = payoutService.processPayout("user2", BigDecimal("10"), Currency.USD)
        assertTrue(result is PayoutResult.Success)

        // Test GBP
        result = payoutService.processPayout("user3", BigDecimal("10"), Currency.GBP)
        assertTrue(result is PayoutResult.Success)
    }

    @Test
    fun `should track payouts separately for different users`() { /* ... */ }

    @Test
    fun `should fail when user total would exceed 100`() { /* ... */ }

    // ... 15 more similar tests
}
```

Problems with this approach:

  • Shotgun testing: One massive test class trying to cover everything
  • Inefficient coverage: Tests like "allow all supported currencies" would need massive changes if we added more currencies
  • No isolation: Every test depends on the monolithic service
  • Impossible to maintain: Adding validation rules requires updating dozens of tests

Problem 3: The False Confidence

The most dangerous part was Claude's confidence:

"These tests provide comprehensive coverage of all validation scenarios and edge cases. The test suite ensures the service behaves correctly across all supported operations."

But when I looked closer:

  • Tests were testing the implementation, not behavior
  • No separation between different types of validation
  • Impossible to test individual business rules in isolation
  • Changes to any validation rule would break multiple tests

The Change Request: Adding Currency Restrictions

Now came the real test. I added a new requirement:

"Restrict specific users to use only certain currencies (e.g., User A can only use EUR)"

Claude's "Solution"

As expected, Claude made changes throughout the existing codebase:

Updated Service (now even messier):

```kotlin
class PayoutService(
    private val storage: PayoutStorage,
    private val currencyRestrictions: CurrencyRestrictions? = null
) {

    private fun validateAndCreatePayout(userId: String?, amount: BigDecimal?, currency: Currency?): Payout {
        // ... existing validation logic ...

        // NEW: Validate user-specific currency restrictions
        currencyRestrictions?.let { restrictions ->
            if (!restrictions.isCurrencyAllowed(userId, currency)) {
                val allowedCurrencies = restrictions.getAllowedCurrencies(userId) ?: ALLOWED_CURRENCIES
                throw PayoutValidationException(
                    PayoutError.CurrencyNotAllowedForUser(userId, currency, allowedCurrencies)
                )
            }
        }

        // ... rest of validation ...
    }
}
```

The Problems Multiplied:

  • Even more responsibilities in the same method
  • Optional dependencies making testing complex
  • Validation order matters but isn't explicit
  • Configuration scattered across multiple places

The Test Impact Explosion

Adding this single feature required changes to:

  • 8 existing test methods (had to mock new dependency)
  • 12 new test methods for currency restrictions
  • Complex test setup with multiple mocks
  • Parameterized tests that became unwieldy

Example of the resulting test complexity:

```kotlin
@ExtendWith(MockKExtension::class)
class PayoutServiceTest {

    @InjectMockKs
    private lateinit var payoutService: PayoutService

    @MockK
    private lateinit var storage: PayoutStorage

    @MockK
    private lateinit var currencyRestrictions: CurrencyRestrictions

    @Test
    fun `should reject payout when currency is not in user's allowed list`() {
        // Given
        every { currencyRestrictions.isCurrencyAllowed("user123", Currency.USD) } returns false
        every { currencyRestrictions.getAllowedCurrencies("user123") } returns setOf(Currency.EUR)

        // When
        val result = payoutService.processPayout("user123", BigDecimal("10"), Currency.USD)

        // Then
        assertTrue(result is PayoutResult.Failure)
        val error = (result as PayoutResult.Failure).error
        assertTrue(error is PayoutError.CurrencyNotAllowedForUser)
    }

    // ... 30 more tests, each with complex mock setup
}
```

The Damning Discoveries

Discovery 1: AI Doesn't Test Behavior, It Tests Implementation

The generated tests were tightly coupled to the implementation details. They tested:

  • How validation was implemented
  • What order validations ran in
  • Which exceptions were thrown where

Instead of testing:

  • What business rules should be enforced
  • When those rules should apply
  • Why certain inputs should be valid/invalid
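The distinction is easiest to see in miniature: a behavior-focused test pins down the observable contract ("amounts over 30 are rejected, with this error") and survives any internal refactoring, because it never mentions how the rule is checked. A minimal self-contained sketch, with an illustrative `process` function standing in for the service:

```kotlin
import java.math.BigDecimal

// Illustrative stand-ins for the result types in the article.
sealed class Outcome {
    object Success : Outcome()
    data class Failure(val reason: String) : Outcome()
}

// However this rule is implemented internally (inline check, validator
// class, lookup table), the observable contract stays the same.
fun process(amount: BigDecimal): Outcome =
    if (amount > BigDecimal("30")) Outcome.Failure("AMOUNT_EXCEEDED")
    else Outcome.Success

fun main() {
    // Behavior-focused assertion: input -> observable result, nothing else.
    check(process(BigDecimal("35")) == Outcome.Failure("AMOUNT_EXCEEDED"))
    check(process(BigDecimal("10")) == Outcome.Success)
    println("contract holds")
}
```

An implementation-coupled test would instead mock internals and assert on which private checks ran in which order — assertions that break on every refactor even when the contract is untouched.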

Discovery 2: Test Maintenance Becomes a Nightmare

Every change to business logic required:

  1. Updating multiple test methods (no clear separation)
  2. Modifying mock setups across dozens of tests
  3. Reorganizing test data to match new implementation
  4. Debugging test failures caused by implementation changes, not requirement changes

Discovery 3: False Coverage Confidence

The test coverage metrics looked great:

  • 95% line coverage
  • All branches tested
  • Comprehensive edge case scenarios

But the tests provided zero confidence for refactoring or changing business rules because they were testing implementation, not behavior.

Discovery 4: AI Creates Tests That Look Right

This was the most insidious problem. The generated tests looked professional:

  • Good naming conventions
  • Proper test structure
  • Comprehensive scenarios
  • Clean assertions

But they were fundamentally flawed from an architecture perspective.

The Comparison: What TDD Would Have Produced

If I had followed proper TDD (like the conventions from my Phase 2 learnings), I would have:

Separate validators:

```kotlin
interface PayoutValidator {
    fun validate(payout: Payout)
}

class AmountValidator(private val config: PayoutConfiguration) : PayoutValidator
class CurrencyValidator(private val config: PayoutConfiguration) : PayoutValidator
class UserLimitValidator(private val storage: PayoutStorage, private val config: PayoutConfiguration) : PayoutValidator
class CurrencyRestrictionValidator(private val restrictions: CurrencyRestrictions) : PayoutValidator
```

Clean service orchestration:

```kotlin
class PayoutService(
    private val storage: PayoutStorage,
    private val validators: List<PayoutValidator>
) {
    fun process(payout: Payout) {
        validators.forEach { it.validate(payout) }
        storage.store(payout)
    }
}
```

Focused, maintainable tests:

```kotlin
class AmountValidatorTest {
    @Test
    fun `should throw exception when amount exceeds configured limit`() {
        every { config.getMaxAmount() } returns 30.0
        val payout = PayoutMother.of(amount = 35.0)

        val exception = shouldThrow<ValidationException> {
            validator.validate(payout)
        }
        exception.code shouldBe AMOUNT_EXCEEDED
    }
}
```

Adding currency restrictions would have required:

  • One new validator class
  • One new test class
  • Zero changes to existing code
  • Zero changes to existing tests
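The zero-change claim can be made concrete: with the validator list as the only seam, the restriction rule is one new class plus one extra element in the wiring. A self-contained sketch, where the simplified types and stub bodies are illustrative stand-ins for the interfaces above:

```kotlin
// Illustrative stand-ins for the types above.
data class Payout(val userId: String, val currency: String)
class ValidationException(val code: String) : RuntimeException(code)

interface PayoutValidator {
    fun validate(payout: Payout)
}

// Existing rule: untouched when the new requirement arrives.
class CurrencyValidator : PayoutValidator {
    private val allowed = setOf("EUR", "USD", "GBP")
    override fun validate(payout: Payout) {
        if (payout.currency !in allowed) throw ValidationException("INVALID_CURRENCY")
    }
}

// NEW rule: one class; no existing validator or test changes.
class CurrencyRestrictionValidator(
    private val restrictions: Map<String, Set<String>>
) : PayoutValidator {
    override fun validate(payout: Payout) {
        val allowedForUser = restrictions[payout.userId] ?: return
        if (payout.currency !in allowedForUser) {
            throw ValidationException("CURRENCY_NOT_ALLOWED_FOR_USER")
        }
    }
}

fun main() {
    // Wiring the new rule in is one added element in the list.
    val validators = listOf(
        CurrencyValidator(),
        CurrencyRestrictionValidator(mapOf("userA" to setOf("EUR")))
    )
    val failure = runCatching {
        validators.forEach { it.validate(Payout("userA", "USD")) }
    }.exceptionOrNull()
    check((failure as? ValidationException)?.code == "CURRENCY_NOT_ALLOWED_FOR_USER")
    println("restriction enforced without touching existing code")
}
```

This is the open/closed principle in practice: the service and the existing validators are closed to modification, while behavior is extended by adding a class.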

The Verdict: Test-After is an Anti-Pattern

The "generate tests for existing code" approach is fundamentally flawed because:

❌ It Encourages Poor Design

  • Code written without tests tends toward monolithic structures
  • No pressure to create testable, modular components
  • Business logic gets mixed with infrastructure concerns

❌ Tests Become Implementation-Dependent

  • Generated tests lock in current implementation
  • Refactoring becomes impossible without rewriting tests
  • Changes cascade through multiple test methods

❌ False Confidence in Coverage

  • High coverage metrics don't mean good tests
  • Tests pass but don't prevent regressions
  • Missing edge cases aren't obvious

❌ Maintenance Nightmare

  • Every feature addition requires updating multiple tests
  • Test failures don't indicate requirement violations
  • Debugging test issues becomes as complex as debugging production code

❌ AI Amplifies Anti-Patterns

  • AI creates tests that look comprehensive but aren't
  • No architectural pressure to write better code
  • Quick feedback loop creates false sense of quality

Key Insights for VibeTDD

This experiment reinforced why test-first is crucial when working with AI:

  1. Tests as Design Pressure: Writing tests first forces you to think about interfaces and separation of concerns
  2. Behavior Over Implementation: TDD focuses on what the code should do, not how it does it
  3. Incremental Validation: Each test validates one specific behavior in isolation
  4. Refactoring Safety: Well-designed tests enable confident refactoring
  5. AI Needs Constraints: Without test-driven constraints, AI defaults to expedient but unmaintainable solutions

The Pattern Recognition

I'm starting to see a clear pattern across all VibeTDD experiments:

  • Phase 1 (Calculator): Simple problem → AI TDD works well
  • Phase 2 (Complex TDD): Complex problem → AI TDD breaks down
  • Phase 2.1 (Test-After): Any complexity + test-after → Disaster

The conclusion is becoming clear: AI needs the discipline that TDD provides, but can't provide that discipline itself.

Next: Taking Control

Phase 2.1 confirmed my suspicions about the test-after approach. It's time for Phase 3: Human-led TDD with AI as implementation assistant.

The hypothesis: If I provide the architectural discipline through test-first design, can AI serve as an effective code generation tool while maintaining quality?

Let's find out if the test-first approach can harness AI's speed while avoiding the architectural disasters I've witnessed so far.


This experiment was eye-opening about how dangerous the "AI generates tests for existing code" approach really is. The code looks good, the tests pass, but the foundation is rotten. Next up: testing whether human-led TDD can keep AI on the right path. Follow the VibeTDD roadmap for the complete journey.

Code Repository

The complete code from this experiment is available at: VibeTDD Phase 2.1 Repository

Built by a software engineer, for engineers )))