
VibeTDD Lessons After 3 Phases: What Actually Works with AI

Some time ago, I started the VibeTDD experiment with a simple question: Can AI and Test-Driven Development work together effectively?

After testing three different approaches (AI-led development, human-led TDD, and the popular "test-after" pattern), I've discovered some surprising truths about the real boundaries and opportunities of AI-assisted coding.

Here's what I learned about what actually works, what definitely doesn't, and why the future of AI-assisted development looks very different from what most people expect.

The Three-Phase Journey

Phase 1: Can AI Teach TDD?

The Test: Let Claude lead a simple calculator project using TDD principles.
The Result: Surprisingly successful for simple problems.

Phase 2: When AI Leads Complex TDD

The Test: Let Claude handle a real-world payout service with business rules and validation.
The Result: Architectural disaster requiring constant human intervention.

Phase 2.1: The Test-After Trap

The Test: The popular "generate tests for existing code" approach.
The Result: False confidence leading to unmaintainable code.

Phase 3: Human-Led TDD with AI Assistance

The Test: I control the TDD process, AI helps with implementation.
The Result: Promising, but requiring significant management overhead.

The Big Lessons: What Actually Works

1. AI Can Suggest Architecture But Won't Implement It by Default

The Discovery: AI knows good patterns but defaults to expedient solutions.

In Phase 2, when Claude first suggested service architecture, it proposed:

```kotlin
class PayoutService(
    private val userIdValidator: PayoutValidator,
    private val currencyValidator: PayoutValidator,
    private val amountValidator: PayoutValidator,
    private val userTotalLimitValidator: PayoutValidator
) {
    // Individual validator dependencies don't scale
}
```

But when I pushed back with "This doesn't scale. Imagine we have 100 validators," Claude immediately offered better alternatives:

  • List of validators (most practical; see the sketch below)
  • Composite validator pattern
  • Chain of responsibility
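
To make the first bullet concrete, here is a rough sketch of what the "list of validators" refactor might look like; the Payout and PayoutValidator shapes are assumed from the snippet above, and the exact signatures are illustrative:

```kotlin
import java.math.BigDecimal

// Shapes assumed from the original snippet; exact signatures are illustrative.
data class Payout(val userId: String, val currency: String, val amount: BigDecimal)

interface PayoutValidator {
    fun validate(payout: Payout)   // assumed to throw on a violated rule
}

// The "list of validators" option: one collection instead of one constructor
// parameter per rule, so adding a 100th validator only changes the wiring.
class PayoutService(private val validators: List<PayoutValidator>) {

    fun create(payout: Payout) {
        validators.forEach { it.validate(payout) }
        // ... persist the payout, publish events, etc.
    }
}
```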

The Lesson: AI can suggest good architectural patterns when questioned, but you have to ask. It won't volunteer the best solution; it will give you a working solution that covers only the concrete case in front of it.

Action: Capture AI's architectural suggestions and add them to your conventions. Once documented, AI should follow them consistently.

2. Classic TDD is Impossible with AI (But VibeTDD Works)

The Logical Discovery: The traditional red-green-refactor cycle is fundamentally incompatible with AI collaboration.

Why Classic TDD Fails with AI:

  • Context Explosion: Each cycle adds conversation history
  • Memory Consumption: AI sessions blow up quickly
  • Time Overhead: Constant switching becomes prohibitively slow
  • Session Limits: You hit token limits before completing features

The VibeTDD Solution: Instead of one-test-at-a-time, write small, focused sets of related tests first, then implement them together.

```
❌ Classic: Write test → implement → write test → implement
✅ VibeTDD: Write focused test batch → verify they fail → implement together
```

Why This Works:

  • Reduces context switching overhead
  • Keeps AI sessions manageable
  • Maintains test-first discipline
  • Allows better validation of test completeness

The Impact: This single insight changed everything. VibeTDD batching makes AI collaboration practical for real projects.
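
As a minimal sketch of what one such batch can look like: UserIdValidator, ValidationException, and the specific rules are hypothetical, and the stub exists only so the batch compiles and runs "red" before implementation starts.

```kotlin
import org.junit.jupiter.api.Test
import org.junit.jupiter.api.assertDoesNotThrow
import org.junit.jupiter.api.assertThrows

// Hypothetical stubs so the batch compiles and fails before any implementation.
class ValidationException(message: String) : RuntimeException(message)

class UserIdValidator {
    fun validate(userId: String): Unit = TODO("implemented after the batch is reviewed")
}

// One focused batch of related cases, written together and confirmed failing,
// then implemented together in the next step.
class UserIdValidatorTest {

    private val validator = UserIdValidator()

    @Test
    fun `accepts a well-formed user id`() {
        assertDoesNotThrow { validator.validate("user-123") }
    }

    @Test
    fun `rejects a blank user id`() {
        assertThrows<ValidationException> { validator.validate("") }
    }

    @Test
    fun `rejects a user id longer than 64 characters`() {
        assertThrows<ValidationException> { validator.validate("x".repeat(65)) }
    }
}
```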

3. Strict Prompts and Templates Are Essential

The Problem: AI interprets vague instructions creatively, leading to scope creep and over-engineering.

What Doesn't Work:

  • ❌ "Continue with the next step"
  • ❌ "Implement the validation"
  • ❌ "Proceed"

What Works:

  • ✅ "Write only the test cases for UserIdValidator. Don't implement anything yet."
  • ✅ "Implement only the UserIdValidator.validate() method"
  • ✅ "Show me the next single test case"

The Discovery: We need prompt templates for every common action:

  • Writing unit tests
  • Writing integration tests
  • Implementation tasks
  • Debugging issues
  • Code reviews
  • Architecture analysis

That means I still need to work out how to prepare these prompt templates efficiently.
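
For example, a prompt template for writing unit tests could look roughly like this; the structure below is just one possible starting point, not a proven format:

```
Context: [paste or reference the relevant convention modules]
Task: Write only the test cases for [ClassName]. Do not implement anything yet.
Scope: Cover [the specific behaviors and edge cases for this step].
Output: A single test class that follows the Kotlin Testing Conventions.
Stop: Wait for my review before proceeding to implementation.
```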

4. Conventions Must Be Modular, Not Monolithic

The Problem: Our initial approach created one massive file covering everything:

  • Basic TDD conventions
  • Language-specific (Kotlin/Java/JavaScript)
  • Domain-specific (Backend/Frontend/Mobile)
  • Patterns library
  • Error handling standards
  • Framework-specific (Spring Boot/Vue/React)

Why This Failed:

  • Language confusion: AI mixed Kotlin patterns with JavaScript syntax
  • Context overload: Irrelevant sections confused AI
  • Maintenance nightmare: Updating one concept required changing multiple sections
  • Team overwhelm: 2,000+ lines were impossible to navigate

The Solution: Modular convention structure:

```
- Basic TDD Conventions        # Language-agnostic principles
- Kotlin Testing Conventions   # Language-specific practices
- Backend Conventions          # Domain-specific patterns
- Patterns Library             # Reusable solutions
- Error Handling               # Error management strategies
- Framework Conventions        # How to use a specific framework
```

Benefits:

  • Teams compose exactly what they need
  • AI gets precise, relevant context
  • Updates don't cascade across unrelated areas
  • Clear ownership boundaries

5. Project Plans Are as Important as Conventions

The Insight: Even with human-led TDD, I got lost halfway through implementation.

Despite leading the process, I found myself unsure what to implement next. The test-by-test approach meant I was deep in details without a clear roadmap.

When I asked Claude "What are the next steps we need to cover?" it immediately provided:

  1. Implement storage layer with tests
  2. Create configuration interface
  3. Write integration tests with real components

The Learning: AI can serve as an excellent project navigator for well-defined tasks, but we need clear project plans alongside conventions before starting significant development.

What This Means:

  • Define scope and architecture upfront
  • Break work into clear phases
  • Maintain task lists and progress tracking
  • Use AI for navigation, not strategy

The Hard Truths: What Doesn't Work

1. AI-Led Development Doesn't Scale Past Toy Problems

  • Phase 1 (Calculator): Excellent TDD discipline, clean implementation
  • Phase 2 (Payout Service): Required constant human intervention, architectural chaos

The Pattern: AI has a complexity threshold beyond which its behavior changes fundamentally. Simple, well-understood problems work great; with real-world complexity, it breaks down quickly.

2. The Test-After Approach Is an Anti-Pattern

The Popular Myth: "Write code first, then ask AI to generate comprehensive tests."

The Reality: This creates:

  • Tests that lock in current implementation
  • False confidence from high coverage metrics
  • Unmaintainable code that's afraid to change
  • Tests that pass but don't prevent regressions

Phase 2.1 proved this conclusively: AI generates tests that look professional but are fundamentally flawed from an architecture perspective.
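
A hedged illustration of what such a lock-in test can look like; the AmountValidator, its message format, and the error code are all hypothetical:

```kotlin
import org.junit.jupiter.api.Assertions.assertEquals
import org.junit.jupiter.api.Test
import org.junit.jupiter.api.assertThrows
import java.math.BigDecimal

// Hypothetical production code that already existed before the tests were generated.
class ValidationException(message: String) : RuntimeException(message)

class AmountValidator {
    fun validate(amount: BigDecimal) {
        if (amount <= BigDecimal.ZERO) {
            throw ValidationException("Validation failed: amount $amount is not allowed (code=A-17)")
        }
    }
}

// A typical test-after result: it restates whatever the code happens to do today
// (the exact message text, the magic error code) instead of the rule "amount must
// be positive", so harmless refactoring breaks it while real regressions slip through.
class AmountValidatorTest {

    private val validator = AmountValidator()

    @Test
    fun `re-asserts the current error message verbatim`() {
        val error = assertThrows<ValidationException> { validator.validate(BigDecimal("-1")) }
        assertEquals("Validation failed: amount -1 is not allowed (code=A-17)", error.message)
    }
}
```

Coverage looks great, but the test is coupled to incidental details rather than to the business rule, which is exactly the kind of false confidence this phase surfaced.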

3. AI Doesn't Self-Optimize or Learn

Unlike human developers who learn from mistakes, AI repeats patterns without improvement. Every session starts fresh, and without explicit constraints, AI will:

  • Over-engineer simple solutions
  • Mix responsibilities in service classes
  • Create unnecessary abstractions
  • Generate verbose, defensive code

The Implication: Quality comes from human-designed constraints, not AI intelligence.

4. Integration Is the Hardest Part

Individual components work well with AI assistance, but connecting them coherently requires significant human oversight. AI struggles with:

  • System-level thinking
  • Component interaction patterns
  • Configuration flow between layers
  • End-to-end workflow design

The Scalability Question

Does any of this scale to real projects?

Based on three phases of experimentation:

✅ What Scales:

  • VibeTDD batching for individual components
  • Human-led architecture with AI implementation assistance
  • Modular conventions that provide focused guidance
  • Prompt templates that reduce micro-management

⚠️ What Needs More Testing:

  • Complex multi-service applications
  • Legacy system refactoring
  • Performance-critical implementations
  • Real-time system development

❌ What Doesn't Scale:

  • AI-led architectural decisions
  • Test-after development approaches
  • Monolithic convention systems
  • Unstructured AI collaboration

The Emerging Pattern: AI as Intelligent Assistant

The Insight: The most effective approach isn't "AI replaces developers" or "AI is just autocomplete." It's AI as intelligent assistant with clear boundaries.

Human Responsibilities:

  • Strategic thinking and architecture design
  • Test design and validation strategy
  • Integration oversight and system thinking
  • Quality control and pattern refinement
  • Project planning and scope management

AI Responsibilities:

  • Implementation within defined constraints
  • Pattern application across similar components
  • Mechanical refactoring and code generation
  • Project navigation for well-defined tasks
  • Comprehensive test case generation

Shared Artifacts:

  • Modular convention libraries
  • Prompt templates for common tasks
  • Architectural pattern collections
  • Quality assessment frameworks

What This Means for Your Team

If You're Just Starting with AI-Assisted Development:

  1. Start with VibeTDD batching - forget classic red-green-refactor
  2. Create focused conventions - don't try to document everything at once
  3. Use prompt templates - structure your AI interactions
  4. Maintain human control - AI assists, humans lead

If You're Already Using AI for Coding:

  1. Audit your approach - are you falling into the test-after trap?
  2. Check your conventions - are they modular and focused?
  3. Review your prompts - are they specific enough?
  4. Assess your architecture - is AI making design decisions?

If You're Leading a Development Team:

  1. Invest in convention development - this is infrastructure, not overhead
  2. Train on VibeTDD patterns - classic TDD training won't work
  3. Create prompt libraries - reduce the learning curve
  4. Establish quality gates - AI needs guardrails

The Meta-Learning

Perhaps the most important insight from three phases: VibeTDD isn't just about combining AI with TDD - it's about creating a new development methodology that acknowledges AI's strengths and constraints while maintaining software quality.

We're not just learning to code with AI. We're learning to think systematically about sustainable software development in an AI-augmented world.


These lessons form the foundation for Phase 4 and the upcoming VibeTDD Knowledge Base. The goal isn't to prove that AI can replace developers, but to discover how AI can make good developers more effective while maintaining the discipline and quality that sustainable software requires.

Next up: How I solved the monolithic convention problem and built a modular knowledge base that actually scales.

Built by a software engineer for engineers )))