VibeTDD Lessons After 3 Phases: What Actually Works with AI
Some time ago, I started the VibeTDD experiment with a simple question: Can AI and Test-Driven Development work together effectively?
After testing three different approaches (AI-led development, human-led TDD, and the popular "test-after" pattern), I've discovered some surprising truths about the real boundaries and opportunities of AI-assisted coding.
Here's what I learned about what actually works, what definitely doesn't, and why the future of AI-assisted development looks very different from what most people expect.
The Three-Phase Journey
Phase 1: Can AI Teach TDD?
The Test: Let Claude lead a simple calculator project using TDD principles.
The Result: Surprisingly successful for simple problems.
Phase 2: When AI Leads Complex TDD
The Test: Let Claude handle a real-world payout service with business rules and validation.
The Result: Architectural disaster requiring constant human intervention.
Phase 2.1: The Test-After Trap
The Test: The popular "generate tests for existing code" approach.
The Result: False confidence leading to unmaintainable code.
Phase 3: Human-Led TDD with AI Assistance
The Test: I control the TDD process, AI helps with implementation.
The Result: Promising, but with significant management overhead.
The Big Lessons: What Actually Works
1. AI Can Suggest Architecture But Won't Implement It by Default
The Discovery: AI knows good patterns but defaults to expedient solutions.
In Phase 2, when Claude first suggested service architecture, it proposed:
class PayoutService(
    private val userIdValidator: PayoutValidator,
    private val currencyValidator: PayoutValidator,
    private val amountValidator: PayoutValidator,
    private val userTotalLimitValidator: PayoutValidator
) {
    // Individual validator dependencies don't scale
}
But when I pushed back with "This doesn't scale. Imagine we have 100 validators," Claude immediately offered better alternatives:
- List of validators (most practical; see the sketch below)
- Composite validator pattern
- Chain of responsibility
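Here's a minimal sketch of the list-of-validators option (the types and names are illustrative, not the experiment's actual code):

import java.math.BigDecimal

// Illustrative domain type; the real model has more fields.
data class Payout(val userId: String, val currency: String, val amount: BigDecimal)

// One contract that every validation rule implements.
interface PayoutValidator {
    fun validate(payout: Payout) // throws on failure
}

// The service depends on a collection of the abstraction, so adding
// a 100th validator never touches this constructor.
class PayoutService(private val validators: List<PayoutValidator>) {
    fun process(payout: Payout) {
        validators.forEach { it.validate(payout) }
        // ...proceed with the actual payout once validation passes
    }
}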
The Lesson: AI can suggest good architectural patterns when questioned, but you have to ask. It won't volunteer the best solution; it'll give you a working solution that covers only the concrete case in front of it.
Action: Capture AI's architectural suggestions and add them to your conventions. Once documented, AI should follow them consistently.
2. Classic TDD is Impossible with AI (But VibeTDD Works)
The Discovery: The traditional red-green-refactor cycle is fundamentally incompatible with AI collaboration.
Why Classic TDD Fails with AI:
- Context Explosion: Each cycle adds conversation history
- Memory Consumption: AI sessions blow up quickly
- Time Overhead: Constant switching becomes prohibitively slow
- Session Limits: You hit token limits before completing features
The VibeTDD Solution: Instead of one-test-at-a-time, write small, focused sets of related tests first, then implement them together.
❌ Classic: Write test → implement → write test → implement
✅ VibeTDD: Write focused test batch → verify they fail → implement together
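For example, a focused batch for UserIdValidator might look like this, written before any implementation exists (UserIdValidator and ValidationException are assumed names, with JUnit 5):

import org.junit.jupiter.api.Test
import org.junit.jupiter.api.assertThrows

// One small, related batch: all cases for a single validator,
// verified to fail together before implementation starts.
class UserIdValidatorTest {
    private val validator = UserIdValidator()

    @Test
    fun `accepts a well-formed user id`() {
        validator.validate("user-123") // should not throw
    }

    @Test
    fun `rejects a blank user id`() {
        assertThrows<ValidationException> { validator.validate("") }
    }

    @Test
    fun `rejects a user id over the maximum length`() {
        assertThrows<ValidationException> { validator.validate("x".repeat(256)) }
    }
}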
Why This Works:
- Reduces context switching overhead
- Keeps AI sessions manageable
- Maintains test-first discipline
- Allows better validation of test completeness
The Impact: This single insight changed everything. VibeTDD batching makes AI collaboration practical for real projects.
3. Strict Prompts and Templates Are Essential
The Problem: AI interprets vague instructions creatively, leading to scope creep and over-engineering.
What Doesn't Work:
- ❌ "Continue with the next step"
- ❌ "Implement the validation"
- ❌ "Proceed"
What Works:
- ✅ "Write only the test cases for UserIdValidator. Don't implement anything yet."
- ✅ "Implement only the UserIdValidator.validate() method"
- ✅ "Show me the next single test case"
The Discovery: We need prompt templates for every common action:
- Writing unit tests
- Writing integration tests
- Implementation tasks
- Debugging issues
- Code reviews
- Architecture analysis
This means I still need to work out how to prepare prompt templates efficiently.
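As a first sketch, a test-writing template could look something like this (illustrative, not a proven format):

Context: [paste or reference the relevant convention modules]
Task: Write only the test cases for [ClassName]. Don't implement anything yet.
Scope: [list the behaviors to cover]
Constraints: follow the naming conventions; no mocks unless specified
Stop: show the tests and wait for my review before proceeding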
4. Conventions Must Be Modular, Not Monolithic
The Problem: Our initial approach created one massive file covering everything:
- Basic TDD conventions
- Language-specific (Kotlin/Java/JavaScript)
- Domain-specific (Backend/Frontend/Mobile)
- Patterns library
- Error handling standards
- Framework-specific (Spring Boot/Vue/React)
Why This Failed:
- Language confusion: AI mixed Kotlin patterns with JavaScript syntax
- Context overload: Irrelevant sections confused AI
- Maintenance nightmare: Updating one concept required changing multiple sections
- Team overwhelm: 2,000+ lines were impossible to navigate
The Solution: Modular convention structure:
- Basic TDD Conventions # Language-agnostic principles
- Kotlin Testing Conventions # Language-specific practices
- Backend Conventions # Domain-specific patterns
- Patterns Library # Reusable solutions
- Error Handling # Error management strategies
- Framework Conventions # How to use a specific framework
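To make composition concrete, here's a hypothetical sketch of assembling a session context from just the modules a task needs (the file names are illustrative):

import java.io.File

// Build an AI session context from a handful of focused convention
// modules instead of one 2,000+ line monolith.
fun composeConventions(vararg modules: String): String =
    modules.joinToString("\n\n") { File("conventions/$it.md").readText() }

fun main() {
    // A Kotlin backend task pulls in exactly the three relevant modules.
    val context = composeConventions("basic-tdd", "kotlin-testing", "backend")
    println(context)
}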
Benefits:
- Teams compose exactly what they need
- AI gets precise, relevant context
- Updates don't cascade across unrelated areas
- Clear ownership boundaries
5. Project Plans Are as Important as Conventions
The Insight: Even with human-led TDD, I got lost halfway through implementation.
Despite leading the process, I found myself unsure what to implement next. The test-by-test approach meant I was deep in details without a clear roadmap.
When I asked Claude "What are the next steps we need to cover?" it immediately provided:
- Implement storage layer with tests
- Create configuration interface
- Write integration tests with real components
The Learning: AI can serve as an excellent project navigator for well-defined tasks, but we need clear project plans alongside conventions before starting significant development.
What This Means:
- Define scope and architecture upfront
- Break work into clear phases
- Maintain task lists and progress tracking
- Use AI for navigation, not strategy
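In practice, the plan can be a simple phase checklist kept next to the conventions; for the payout service it might have looked like this (illustrative):

Phase 1: Validators with unit tests [done]
Phase 2: PayoutService orchestration with unit tests [done]
Phase 3: Storage layer with tests [next]
Phase 4: Configuration interface
Phase 5: Integration tests with real components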
The Hard Truths: What Doesn't Work
1. AI-Led Development Doesn't Scale Past Toy Problems
Phase 1 (Calculator): Excellent TDD discipline, clean implementation.
Phase 2 (Payout Service): Required constant human intervention, architectural chaos.
The Pattern: AI has a complexity threshold beyond which its behavior changes fundamentally. Simple, well-understood problems work great; on real-world complexity, it breaks down quickly.
2. The Test-After Approach Is an Anti-Pattern
The Popular Myth: "Write code first, then ask AI to generate comprehensive tests."
The Reality: This creates:
- Tests that lock in current implementation
- False confidence from high coverage metrics
- Unmaintainable code that teams are afraid to change
- Tests that pass but don't prevent regressions
Phase 2.1 proved this conclusively: AI generates tests that look professional but are fundamentally flawed from an architectural perspective.
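To illustrate the lock-in problem, here's a simplified, hypothetical example of the kind of test this approach tends to produce (PayoutProcessor and PayoutRepository are made-up shapes, using MockK and the Payout types sketched earlier):

import io.mockk.mockk
import io.mockk.verifyOrder
import org.junit.jupiter.api.Test
import java.math.BigDecimal

// Hypothetical dependency, only for illustration.
interface PayoutRepository {
    fun save(payout: Payout)
}

// Hypothetical service under test: validate, then save.
class PayoutProcessor(
    private val validator: PayoutValidator,
    private val repository: PayoutRepository
) {
    fun process(payout: Payout) {
        validator.validate(payout)
        repository.save(payout)
    }
}

class PayoutProcessorTest {
    private val validator = mockk<PayoutValidator>(relaxed = true)
    private val repository = mockk<PayoutRepository>(relaxed = true)
    private val payout = Payout("user-1", "EUR", BigDecimal.TEN)

    @Test
    fun `validates then saves`() {
        PayoutProcessor(validator, repository).process(payout)

        // Pins the implementation's internal call order: a refactor that
        // merges or reorders these steps fails the test even though the
        // observable behavior is identical. High coverage, no protection.
        verifyOrder {
            validator.validate(payout)
            repository.save(payout)
        }
    }
}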
3. AI Doesn't Self-Optimize or Learn
Unlike human developers who learn from mistakes, AI repeats patterns without improvement. Every session starts fresh, and without explicit constraints, AI will:
- Over-engineer simple solutions
- Mix responsibilities in service classes
- Create unnecessary abstractions
- Generate verbose, defensive code
The Implication: Quality comes from human-designed constraints, not AI intelligence.
4. Integration Is the Hardest Part
Individual components work well with AI assistance, but connecting them coherently requires significant human oversight. AI struggles with:
- System-level thinking
- Component interaction patterns
- Configuration flow between layers
- End-to-end workflow design
The Scalability Question
Does any of this scale to real projects?
Based on three phases of experimentation:
✅ What Scales:
- VibeTDD batching for individual components
- Human-led architecture with AI implementation assistance
- Modular conventions that provide focused guidance
- Prompt templates that reduce micro-management
⚠️ What Needs More Testing:
- Complex multi-service applications
- Legacy system refactoring
- Performance-critical implementations
- Real-time system development
❌ What Doesn't Scale:
- AI-led architectural decisions
- Test-after development approaches
- Monolithic convention systems
- Unstructured AI collaboration
The Emerging Pattern: AI as Intelligent Assistant
The Insight: The most effective approach isn't "AI replaces developers" or "AI is just autocomplete." It's AI as intelligent assistant with clear boundaries.
Human Responsibilities:
- Strategic thinking and architecture design
- Test design and validation strategy
- Integration oversight and system thinking
- Quality control and pattern refinement
- Project planning and scope management
AI Responsibilities:
- Implementation within defined constraints
- Pattern application across similar components
- Mechanical refactoring and code generation
- Project navigation for well-defined tasks
- Comprehensive test case generation
Shared Artifacts:
- Modular convention libraries
- Prompt templates for common tasks
- Architectural pattern collections
- Quality assessment frameworks
What This Means for Your Team
If You're Just Starting with AI-Assisted Development:
- Start with VibeTDD batching - forget classic red-green-refactor
- Create focused conventions - don't try to document everything at once
- Use prompt templates - structure your AI interactions
- Maintain human control - AI assists, humans lead
If You're Already Using AI for Coding:
- Audit your approach - are you falling into the test-after trap?
- Check your conventions - are they modular and focused?
- Review your prompts - are they specific enough?
- Assess your architecture - is AI making design decisions?
If You're Leading a Development Team:
- Invest in convention development - this is infrastructure, not overhead
- Train on VibeTDD patterns - classic TDD training won't work
- Create prompt libraries - reduce the learning curve
- Establish quality gates - AI needs guardrails
The Meta-Learning
Perhaps the most important insight from three phases: VibeTDD isn't just about combining AI with TDD - it's about creating a new development methodology that acknowledges AI's strengths and constraints while maintaining software quality.
We're not just learning to code with AI. We're learning to think systematically about sustainable software development in an AI-augmented world.
These lessons form the foundation for Phase 4 and the upcoming VibeTDD Knowledge Base. The goal isn't to prove that AI can replace developers, but to discover how AI can make good developers more effective while maintaining the discipline and quality that sustainable software requires.
Next up: How I solved the monolithic convention problem and built a modular knowledge base that actually scales.