
VibeTDD Lessons After 3 Phases: What Actually Works with AI

Some time ago, I started the VibeTDD experiment with a simple question: Can AI and Test-Driven Development work together effectively?

After testing three different approaches (AI-led development, human-led TDD, and the popular "test-after" pattern), I've discovered some surprising truths about the real boundaries and opportunities of AI-assisted coding.

Here's what I learned about what actually works, what definitely doesn't, and why the future of AI-assisted development looks very different from what most people expect.

The Three-Phase Journey

Phase 1: Can AI Teach TDD?

The Test: Let Claude lead a simple calculator project using TDD principles.
The Result: Surprisingly successful for simple problems.

Phase 2: When AI Leads Complex TDD

The Test: Let Claude handle a real-world payout service with business rules and validation.
The Result: Architectural disaster requiring constant human intervention.

Phase 2.1: The Test-After Trap

The Test: The popular "generate tests for existing code" approach.
The Result: False confidence leading to unmaintainable code.

Phase 3: Human-Led TDD with AI Assistance

The Test: I control the TDD process, AI helps with implementation.
The Result: Promising, but requiring significant management overhead.

The Big Lessons: What Actually Works

1. AI Can Suggest Architecture But Won't Implement It by Default

The Discovery: AI knows good patterns but defaults to expedient solutions.

In Phase 2, when Claude first suggested service architecture, it proposed:

```kotlin
class PayoutService(
    private val userIdValidator: PayoutValidator,
    private val currencyValidator: PayoutValidator,
    private val amountValidator: PayoutValidator,
    private val userTotalLimitValidator: PayoutValidator
) {
    // Individual validator dependencies don't scale
}
```

But when I pushed back with "This doesn't scale. Imagine we have 100 validators," Claude immediately offered better alternatives:

  • List of validators (most practical; see the sketch below)
  • Composite validator pattern
  • Chain of responsibility
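
To make the first bullet concrete, here is a rough sketch of what the "list of validators" refactor might look like; the Payout and PayoutValidator shapes are assumed from the snippet above, and the exact signatures are illustrative:

```kotlin
import java.math.BigDecimal

// Shapes assumed from the original snippet; exact signatures are illustrative.
data class Payout(val userId: String, val currency: String, val amount: BigDecimal)

interface PayoutValidator {
    fun validate(payout: Payout)   // assumed to throw on a violated rule
}

// The "list of validators" option: one collection instead of one constructor
// parameter per rule, so adding a 100th validator only changes the wiring.
class PayoutService(private val validators: List<PayoutValidator>) {

    fun create(payout: Payout) {
        validators.forEach { it.validate(payout) }
        // ... persist the payout, publish events, etc.
    }
}
```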

The Lesson: AI can suggest good architectural patterns when questioned, but you have to ask. It won't volunteer the best solution; it will give you a working solution that covers only the concrete case in front of it.

Action: Capture AI's architectural suggestions and add them to your conventions. Once documented, AI should follow them consistently.

2. Classic TDD is Impossible with AI (But VibeTDD Works)

The Logical Discovery: The traditional red-green-refactor cycle is fundamentally incompatible with AI collaboration.

Why Classic TDD Fails with AI:

  • Context Explosion: Each cycle adds conversation history
  • Memory Consumption: AI sessions blow up quickly
  • Time Overhead: Constant switching becomes prohibitively slow
  • Session Limits: You hit token limits before completing features

The VibeTDD Solution: Instead of one-test-at-a-time, write small, focused sets of related tests first, then implement them together.

```
❌ Classic: Write test → implement → write test → implement
✅ VibeTDD: Write focused test batch → verify they fail → implement together
```

Why This Works:

  • Reduces context switching overhead
  • Keeps AI sessions manageable
  • Maintains test-first discipline
  • Allows better validation of test completeness

The Impact: This single insight changed everything. VibeTDD batching makes AI collaboration practical for real projects.
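
As a minimal sketch of what one such batch can look like: UserIdValidator, ValidationException, and the specific rules are hypothetical, and the stub exists only so the batch compiles and runs "red" before implementation starts.

```kotlin
import org.junit.jupiter.api.Test
import org.junit.jupiter.api.assertDoesNotThrow
import org.junit.jupiter.api.assertThrows

// Hypothetical stubs so the batch compiles and fails before any implementation.
class ValidationException(message: String) : RuntimeException(message)

class UserIdValidator {
    fun validate(userId: String): Unit = TODO("implemented after the batch is reviewed")
}

// One focused batch of related cases, written together and confirmed failing,
// then implemented together in the next step.
class UserIdValidatorTest {

    private val validator = UserIdValidator()

    @Test
    fun `accepts a well-formed user id`() {
        assertDoesNotThrow { validator.validate("user-123") }
    }

    @Test
    fun `rejects a blank user id`() {
        assertThrows<ValidationException> { validator.validate("") }
    }

    @Test
    fun `rejects a user id longer than 64 characters`() {
        assertThrows<ValidationException> { validator.validate("x".repeat(65)) }
    }
}
```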

3. Strict Prompts and Templates Are Essential

The Problem: AI interprets vague instructions creatively, leading to scope creep and over-engineering.

What Doesn't Work:

  • ❌ "Continue with the next step"
  • ❌ "Implement the validation"
  • ❌ "Proceed"

What Works:

  • ✅ "Write only the test cases for UserIdValidator. Don't implement anything yet."
  • ✅ "Implement only the UserIdValidator.validate() method"
  • ✅ "Show me the next single test case"

The Discovery: We need prompt templates for every common action:

  • Writing unit tests
  • Writing integration tests
  • Implementation tasks
  • Debugging issues
  • Code reviews
  • Architecture analysis

That means I still need to work out how to prepare these prompt templates efficiently.
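
For example, a prompt template for writing unit tests could look roughly like this; the structure below is just one possible starting point, not a proven format:

```
Context: [paste or reference the relevant convention modules]
Task: Write only the test cases for [ClassName]. Do not implement anything yet.
Scope: Cover [the specific behaviors and edge cases for this step].
Output: A single test class that follows the Kotlin Testing Conventions.
Stop: Wait for my review before proceeding to implementation.
```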

4. Conventions Must Be Modular, Not Monolithic

The Problem: Our initial approach created one massive file covering everything:

  • Basic TDD conventions
  • Language-specific (Kotlin/Java/JavaScript)
  • Domain-specific (Backend/Frontend/Mobile)
  • Patterns library
  • Error handling standards
  • Framework-specific (Spring Boot/Vue/React)

Why This Failed:

  • Language confusion: AI mixed Kotlin patterns with JavaScript syntax
  • Context overload: Irrelevant sections confused AI
  • Maintenance nightmare: Updating one concept required changing multiple sections
  • Team overwhelm: 2,000+ lines were impossible to navigate

The Solution: Modular convention structure:

```
- Basic TDD Conventions        # Language-agnostic principles
- Kotlin Testing Conventions   # Language-specific practices
- Backend Conventions          # Domain-specific patterns
- Patterns Library             # Reusable solutions
- Error Handling               # Error management strategies
- Framework Conventions        # How to use a specific framework
```

Benefits:

  • Teams compose exactly what they need
  • AI gets precise, relevant context
  • Updates don't cascade across unrelated areas
  • Clear ownership boundaries

5. Project Plans Are as Important as Conventions

The Insight: Even with human-led TDD, I got lost halfway through implementation.

Despite leading the process, I found myself unsure what to implement next. The test-by-test approach meant I was deep in details without a clear roadmap.

When I asked Claude "What are the next steps we need to cover?" it immediately provided:

  1. Implement storage layer with tests
  2. Create configuration interface
  3. Write integration tests with real components

The Learning: AI can serve as an excellent project navigator for well-defined tasks, but we need clear project plans alongside conventions before starting significant development.

What This Means:

  • Define scope and architecture upfront
  • Break work into clear phases
  • Maintain task lists and progress tracking
  • Use AI for navigation, not strategy

The Hard Truths: What Doesn't Work

1. AI-Led Development Doesn't Scale Past Toy Problems

  • Phase 1 (Calculator): Excellent TDD discipline, clean implementation
  • Phase 2 (Payout Service): Required constant human intervention, architectural chaos

The Pattern: AI has a complexity threshold beyond which its behavior changes fundamentally. Simple, well-understood problems work great; with real-world complexity, it breaks down quickly.

2. The Test-After Approach Is an Anti-Pattern

The Popular Myth: "Write code first, then ask AI to generate comprehensive tests."

The Reality: This creates:

  • Tests that lock in current implementation
  • False confidence from high coverage metrics
  • Unmaintainable code that's afraid to change
  • Tests that pass but don't prevent regressions

Phase 2.1 proved this conclusively: AI generates tests that look professional but are fundamentally flawed from an architecture perspective.
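
A hedged illustration of what such a lock-in test can look like; the AmountValidator, its message format, and the error code are all hypothetical:

```kotlin
import org.junit.jupiter.api.Assertions.assertEquals
import org.junit.jupiter.api.Test
import org.junit.jupiter.api.assertThrows
import java.math.BigDecimal

// Hypothetical production code that already existed before the tests were generated.
class ValidationException(message: String) : RuntimeException(message)

class AmountValidator {
    fun validate(amount: BigDecimal) {
        if (amount <= BigDecimal.ZERO) {
            throw ValidationException("Validation failed: amount $amount is not allowed (code=A-17)")
        }
    }
}

// A typical test-after result: it restates whatever the code happens to do today
// (the exact message text, the magic error code) instead of the rule "amount must
// be positive", so harmless refactoring breaks it while real regressions slip through.
class AmountValidatorTest {

    private val validator = AmountValidator()

    @Test
    fun `re-asserts the current error message verbatim`() {
        val error = assertThrows<ValidationException> { validator.validate(BigDecimal("-1")) }
        assertEquals("Validation failed: amount -1 is not allowed (code=A-17)", error.message)
    }
}
```

Coverage looks great, but the test is coupled to incidental details rather than to the business rule, which is exactly the kind of false confidence this phase surfaced.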

3. AI Doesn't Self-Optimize or Learn

Unlike human developers who learn from mistakes, AI repeats patterns without improvement. Every session starts fresh, and without explicit constraints, AI will:

  • Over-engineer simple solutions
  • Mix responsibilities in service classes
  • Create unnecessary abstractions
  • Generate verbose, defensive code

The Implication: Quality comes from human-designed constraints, not AI intelligence.

4. Integration Is the Hardest Part

Individual components work well with AI assistance, but connecting them coherently requires significant human oversight. AI struggles with:

  • System-level thinking
  • Component interaction patterns
  • Configuration flow between layers
  • End-to-end workflow design

The Scalability Question

Does any of this scale to real projects?

Based on three phases of experimentation:

✅ What Scales:

  • VibeTDD batching for individual components
  • Human-led architecture with AI implementation assistance
  • Modular conventions that provide focused guidance
  • Prompt templates that reduce micro-management

⚠️ What Needs More Testing:

  • Complex multi-service applications
  • Legacy system refactoring
  • Performance-critical implementations
  • Real-time system development

❌ What Doesn't Scale:

  • AI-led architectural decisions
  • Test-after development approaches
  • Monolithic convention systems
  • Unstructured AI collaboration

The Emerging Pattern: AI as Intelligent Assistant

The Insight: The most effective approach isn't "AI replaces developers" or "AI is just autocomplete." It's AI as intelligent assistant with clear boundaries.

Human Responsibilities:

  • Strategic thinking and architecture design
  • Test design and validation strategy
  • Integration oversight and system thinking
  • Quality control and pattern refinement
  • Project planning and scope management

AI Responsibilities:

  • Implementation within defined constraints
  • Pattern application across similar components
  • Mechanical refactoring and code generation
  • Project navigation for well-defined tasks
  • Comprehensive test case generation

Shared Artifacts:

  • Modular convention libraries
  • Prompt templates for common tasks
  • Architectural pattern collections
  • Quality assessment frameworks

What This Means for Your Team

If You're Just Starting with AI-Assisted Development:

  1. Start with VibeTDD batching - forget classic red-green-refactor
  2. Create focused conventions - don't try to document everything at once
  3. Use prompt templates - structure your AI interactions
  4. Maintain human control - AI assists, humans lead

If You're Already Using AI for Coding:

  1. Audit your approach - are you falling into the test-after trap?
  2. Check your conventions - are they modular and focused?
  3. Review your prompts - are they specific enough?
  4. Assess your architecture - is AI making design decisions?

If You're Leading a Development Team:

  1. Invest in convention development - this is infrastructure, not overhead
  2. Train on VibeTDD patterns - classic TDD training won't work
  3. Create prompt libraries - reduce the learning curve
  4. Establish quality gates - AI needs guardrails

The Meta-Learning

Perhaps the most important insight from three phases: VibeTDD isn't just about combining AI with TDD - it's about creating a new development methodology that acknowledges AI's strengths and constraints while maintaining software quality.

We're not just learning to code with AI. We're learning to think systematically about sustainable software development in an AI-augmented world.


These lessons form the foundation for Phase 4 and the upcoming VibeTDD Knowledge Base. The goal isn't to prove that AI can replace developers, but to discover how AI can make good developers more effective while maintaining the discipline and quality that sustainable software requires.

Next up: How I solved the monolithic convention problem and built a modular knowledge base that actually scales.

Built by a software engineer for engineers )))