
The VibeTDD Experiment: A Roadmap for Combining AI and Test-Driven Development

After documenting the reality of AI-led development, the successes, the failures, and the hidden costs, I keep coming back to one question: What if we could harness AI's speed while maintaining the discipline that creates sustainable software?

I believe Test-Driven Development holds the key. But rather than theorize about it, I'm going to experiment systematically. Here's my roadmap for discovering how AI and TDD can work together effectively.

The Core Hypothesis

TDD can serve as quality guardrails for AI-generated code. When you write tests first, you:

  • Define the behavior before implementation
  • Catch regressions immediately
  • Force good design through small, focused methods
  • Create living documentation of what the code should do

When AI writes code to pass your tests, it has far less room to take shortcuts or add unnecessary complexity. The tests become a contract that keeps the AI honest.
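
To make the "contract" idea concrete, here's a minimal sketch in Python (the names and the discount rule are placeholders of my own, not taken from any of the projects mentioned here): the tests are written first and would fail on their own, and the implementation's only job is to satisfy them.

```python
import unittest

# Written first, before any implementation exists.
# This is the contract: whatever code the AI produces has to satisfy it.
class DiscountTest(unittest.TestCase):
    def test_orders_over_100_get_ten_percent_off(self):
        self.assertEqual(apply_discount(200.0), 180.0)

    def test_small_orders_are_not_discounted(self):
        self.assertEqual(apply_discount(50.0), 50.0)


# The implementation comes second and does only what the tests demand.
def apply_discount(total: float) -> float:
    return round(total * 0.9, 2) if total > 100 else total


if __name__ == "__main__":
    unittest.main()
```

If the generated implementation drifts beyond what the tests demand, the drift is visible immediately: either a test fails or the extra code sits there untested.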

But does this actually work in practice? Let's find out.

The Experimental Journey

Phase 1: Can AI Teach TDD?

The Test: Let AI lead everything on a simple calculator project. I won't provide any TDD guidance; I want to see whether AI can teach the basics and follow proper TDD cycles on its own.

What I'm Looking For:

  • Does AI understand the red-green-refactor cycle?
  • Can it write failing tests before implementation?
  • Will it suggest reasonable test cases?
  • Does it naturally create small, focused methods?

My Prediction: AI will probably start well but drift from TDD discipline as complexity grows. It might write tests and implementation simultaneously, or suggest overly complex test scenarios.
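
For reference, the cycle I'll be watching for looks roughly like this sketch (Python, illustrative names): red is one small failing test, green is the smallest implementation that passes it, and refactor is cleanup done only while the test stays green.

```python
import unittest

class CalculatorTest(unittest.TestCase):
    # RED: this test is written first and fails until add() exists.
    def test_add_two_numbers(self):
        self.assertEqual(Calculator().add(2, 3), 5)


# GREEN: the smallest implementation that makes the test pass.
# REFACTOR: rename, extract, or simplify only while the test stays green.
class Calculator:
    def add(self, a: int, b: int) -> int:
        return a + b


if __name__ == "__main__":
    unittest.main()
```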

Phase 2: AI-Led TDD on Real Requirements

The Test: Give AI a real coding challenge - the payout service from Portfo. Let it lead the TDD process completely while I observe.

What I'm Looking For:

  • How does AI handle validation rules and edge cases?
  • Can it maintain test-first discipline with business requirements?
  • Will it create appropriate test structure and organization?
  • Does it understand the difference between unit and integration tests?

My Prediction: This is where AI will likely struggle. Real requirements involve trade-offs, business context, and architectural decisions that require human judgment.
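
To illustrate the kind of test-first discipline I mean here, this is a minimal sketch built around a made-up validation rule, a minimum payout amount; Portfo's real requirements may well differ, and every name below is a placeholder.

```python
import unittest
from decimal import Decimal

class PayoutValidationTest(unittest.TestCase):
    # Hypothetical business rule, written as a failing test first:
    # payouts below the minimum are rejected before anything is processed.
    def test_rejects_payout_below_minimum(self):
        with self.assertRaises(ValueError):
            validate_payout(Decimal("0.50"), minimum=Decimal("1.00"))

    def test_accepts_payout_at_minimum(self):
        validate_payout(Decimal("1.00"), minimum=Decimal("1.00"))


def validate_payout(amount: Decimal, minimum: Decimal) -> None:
    if amount < minimum:
        raise ValueError(f"Payout {amount} is below the minimum of {minimum}")


if __name__ == "__main__":
    unittest.main()
```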

Phase 3: Human-Led TDD with AI Assistance

The Test: Take the same Portfo challenge, but this time I lead the TDD process while using AI as an implementation assistant.

What I'm Looking For:

  • Can AI generate code that passes my human-written tests?
  • Does it respect the boundaries I set through test design?
  • How well does it handle refactoring while keeping tests green?
  • Can I maintain control over architecture while leveraging AI speed?

My Prediction: This should work much better. With human-designed tests as guardrails, AI should be able to provide fast, correct implementations without the architectural drift I've seen before.
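
The division of labour I have in mind looks roughly like this sketch (all names hypothetical): I write the tests and fix the function signature, and the AI is only asked to fill in the implementation behind them.

```python
import unittest

class FeeCalculatorTest(unittest.TestCase):
    # Human-written: the tests pin down the behaviour and the API shape.
    def test_fee_is_two_percent_of_the_amount(self):
        self.assertEqual(calculate_fee(10_00), 20)  # amounts in cents

    def test_zero_amount_has_zero_fee(self):
        self.assertEqual(calculate_fee(0), 0)


def calculate_fee(amount_cents: int) -> int:
    # AI-written: only this body is delegated; the signature and the
    # tests above are the boundary it has to stay inside.
    return round(amount_cents * 0.02)


if __name__ == "__main__":
    unittest.main()
```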

Phase 4: Scaling Complexity

The Test: Apply the best approach from the previous phases to increasingly complex scenarios, for example a more complex task from Portfo: building a notifications service that has to integrate with other services.

What I'm Looking For:

  • Do the patterns hold as complexity increases?
  • How does AI handle testing across different layers (unit, integration, contract)?
  • Can the approach maintain code quality over time?
  • Where are the breaking points?
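
To make the layering question concrete, here's a minimal sketch of what a unit-level test at a service boundary could look like, with the external dependency faked; an integration or contract test would exercise a real or stubbed transport instead. The notifier and email client below are hypothetical, not Portfo's actual design.

```python
import unittest
from unittest.mock import Mock

# Hypothetical notifications service: it depends on an email client
# that, in this phase, would belong to another service.
class Notifier:
    def __init__(self, email_client):
        self._email_client = email_client

    def payout_completed(self, user_email: str, amount: str) -> None:
        self._email_client.send(to=user_email, subject=f"Payout of {amount} sent")


class NotifierUnitTest(unittest.TestCase):
    # Unit layer: the collaborator is faked, only Notifier's own logic is tested.
    def test_sends_payout_email(self):
        email_client = Mock()
        Notifier(email_client).payout_completed("a@b.com", "€10")
        email_client.send.assert_called_once_with(
            to="a@b.com", subject="Payout of €10 sent"
        )


if __name__ == "__main__":
    unittest.main()
```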

Phase 5: Legacy System Refactoring

The Test: Find a simple but messy legacy project and try to rewrite it using the emerging AI+TDD strategy.

What I'm Looking For:

  • Can AI help modernize old code while maintaining behavior through tests?
  • How effective is this approach for brownfield projects?
  • Does TDD help prevent regressions during AI-assisted refactoring?
  • What's the effort comparison vs. traditional refactoring?
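
The main tool I expect to lean on here is characterization tests: pin down what the legacy code does today, quirks included, before letting AI touch it. A minimal sketch, with a made-up legacy function:

```python
import unittest

# Imagine this is the untouched legacy code, quirks and all.
def legacy_price_label(price):
    if price == 0:
        return "FREE"
    return "$" + str(price) + ".00"  # quirk: appends ".00" even to decimals


class PriceLabelCharacterizationTest(unittest.TestCase):
    # These tests describe what the code *does today*, not what it should do.
    # They let AI-assisted refactoring proceed without silently changing behaviour.
    def test_zero_is_labelled_free(self):
        self.assertEqual(legacy_price_label(0), "FREE")

    def test_decimals_currently_get_a_double_suffix(self):
        self.assertEqual(legacy_price_label(9.99), "$9.99.00")


if __name__ == "__main__":
    unittest.main()
```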

Phase 6: Real-World Application

The Test: Take everything I've learned and apply it to Mirameo - refactor parts of the codebase using the documented conventions and strategies discovered in previous phases.

What I'm Looking For:

  • Does the approach work on production code with real users?
  • How does it handle existing technical debt?
  • Can the framework scale to a full development workflow?
  • What documentation and conventions are actually necessary?

What Success Looks Like

By the end of this experimental journey, I expect to have:

A Clear Framework: Documented guidelines for when and how to use AI with TDD effectively.

Proven Conventions: Code standards and practices that both humans and AI can follow consistently.

Realistic Boundaries: Clear understanding of where AI helps, where it hurts, and where human judgment is irreplaceable.

Practical Tools: Templates, examples, and workflows that other developers can use.

Why This Matters

We're at a pivotal moment in software development. AI tools are becoming powerful enough to generate substantial amounts of code, but most developers are using them in ad-hoc ways that create long-term maintenance problems.

By systematically exploring how TDD can provide structure for AI collaboration, we might discover a sustainable approach that combines:

  • AI's speed and knowledge
  • Human architectural wisdom
  • TDD's quality discipline

Following Along

I'll document each experiment in detail - the successes, failures, and surprising discoveries. Some posts will be technical deep-dives showing actual code and test progression. Others will be broader reflections on what works and what doesn't.

The goal isn't to prove that AI+TDD is perfect, but to find the realistic boundaries and best practices for this combination. If you're interested in sustainable AI-assisted development, I'd love to have you follow along with the experiments.


What aspects of AI+TDD collaboration are you most curious about? Are there specific scenarios you'd like me to test as part of this experimental journey?
