VibeTDD Experiment 1: Teaching a Calculator with Test-Driven Development
This is the first experiment in my VibeTDD series, where I systematically explore how AI and Test-Driven Development can work together effectively.
The Setup
I decided to start with the classic TDD exercise: building a calculator. My approach was simple - let Claude lead the entire process while I observed how well AI could teach and follow TDD principles.
The Rules I Set:
- Claude would guide the exercise and make all technical decisions
- I would only copy-paste the code it produced
- When Claude asked what to do next, I'd tell it to decide
- No TDD guidance from me - I wanted to see AI's natural approach
The Tech Stack:
- Kotlin (my favorite language)
- Maven for build management
- JUnit 5 for testing framework
- Kotest for assertions (more interesting than standard JUnit - quick taste below)
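For anyone who hasn't used Kotest: its infix matchers are the draw. An assertion reads almost like a sentence, which is what I mean by "more interesting":

```kotlin
import io.kotest.matchers.shouldBe

fun demo() {
    val result = 2 + 3
    result shouldBe 5 // Kotest's infix style
    // JUnit equivalent: assertEquals(5, result)
}
```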
Phase 1: Project Setup
Claude started correctly by establishing the foundation:
```xml
<!-- pom.xml excerpt -->
<dependencies>
    <dependency>
        <groupId>org.jetbrains.kotlin</groupId>
        <artifactId>kotlin-stdlib</artifactId>
        <version>1.9.20</version>
    </dependency>
    <dependency>
        <groupId>org.junit.jupiter</groupId>
        <artifactId>junit-jupiter-engine</artifactId>
        <version>5.10.0</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>io.kotest</groupId>
        <artifactId>kotest-assertions-core-jvm</artifactId>
        <version>5.7.2</version>
        <scope>test</scope>
    </dependency>
</dependencies>
```
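One caveat: the excerpt shows only dependencies. To actually compile Kotlin and run JUnit 5 tests, the pom also needs the kotlin-maven-plugin and a reasonably recent Surefire. I'm assuming the full pom contained something along these lines:

```xml
<build>
    <sourceDirectory>src/main/kotlin</sourceDirectory>
    <testSourceDirectory>src/test/kotlin</testSourceDirectory>
    <plugins>
        <!-- Compiles Kotlin sources during compile/test-compile -->
        <plugin>
            <groupId>org.jetbrains.kotlin</groupId>
            <artifactId>kotlin-maven-plugin</artifactId>
            <version>1.9.20</version>
            <executions>
                <execution>
                    <id>compile</id>
                    <phase>compile</phase>
                    <goals><goal>compile</goal></goals>
                </execution>
                <execution>
                    <id>test-compile</id>
                    <phase>test-compile</phase>
                    <goals><goal>test-compile</goal></goals>
                </execution>
            </executions>
        </plugin>
        <!-- Surefire 2.22+ is required to discover JUnit 5 tests -->
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-surefire-plugin</artifactId>
            <version>3.1.2</version>
        </plugin>
    </plugins>
</build>
```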
The project structure was clean and followed conventions:
```
src/
├── main/kotlin/com/example/calculator/Calculator.kt
└── test/kotlin/com/example/calculator/CalculatorTest.kt
```
First Impression: Claude understood modern project setup and chose sensible dependencies.
Phase 2: The First Red-Green-Refactor Cycle
RED: The First Failing Test
Claude started with the simplest possible test:
```kotlin
@Test
fun `should add two positive numbers`() {
    val result = calculator.add(2, 3)
    result shouldBe 5
}
```
With an empty Calculator class:
```kotlin
class Calculator {
}
```
Observation: Perfect TDD start - write a test that fails for the right reason.
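One detail the snippets gloss over: the `calculator` the tests call on. The test class presumably declared it as a field, along these lines (my reconstruction of the scaffolding, including the Kotest import):

```kotlin
package com.example.calculator

import io.kotest.matchers.shouldBe
import org.junit.jupiter.api.Test

class CalculatorTest {

    // JUnit 5 creates a fresh test-class instance per test method,
    // so each test gets its own Calculator
    private val calculator = Calculator()

    // ...the @Test methods shown throughout this post...
}
```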
GREEN: Making It Pass (The Hard Way)
Here's where it got interesting. Claude implemented this:
```kotlin
class Calculator {
    fun add(a: Int, b: Int): Int {
        return 5 // Hardcoded!
    }
}
```
This was actually correct TDD! Many developers would jump straight to `return a + b`, but Claude followed the discipline of writing the minimal code to make the test pass.
The Second Test Forces Generalization
Claude then added:
```kotlin
@Test
fun `should add different positive numbers`() {
    val result = calculator.add(4, 7)
    result shouldBe 11
}
```
Now the hardcoded `return 5` couldn't satisfy both tests, forcing the proper implementation:
```kotlin
fun add(a: Int, b: Int): Int {
    return a + b
}
```
Key Learning: AI understood the triangulation principle - you need multiple test cases to drive toward general solutions.
Phase 3: Building Up Operations
Claude continued the pattern for each operation:
Subtraction
```kotlin
// Test first
@Test
fun `should subtract two numbers`() {
    val result = calculator.subtract(7, 3)
    result shouldBe 4
}

// Then implementation
fun subtract(a: Int, b: Int): Int {
    return a - b
}
```
Multiplication with Edge Cases
```kotlin
@Test
fun `should multiply two positive numbers`() {
    val result = calculator.multiply(3, 4)
    result shouldBe 12
}

@Test
fun `should multiply by zero`() {
    val result = calculator.multiply(5, 0)
    result shouldBe 0
}
```
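The green step isn't shown here; mirroring add and subtract, the implementation was presumably the obvious one-liner:

```kotlin
fun multiply(a: Int, b: Int): Int {
    return a * b
}
```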
Division with Error Handling
This is where Claude showed sophisticated understanding:
```kotlin
@Test
fun `should divide two numbers`() {
    val result = calculator.divide(12, 3)
    result shouldBe 4
}

@Test
fun `should throw exception when dividing by zero`() {
    val exception = org.junit.jupiter.api.assertThrows<ArithmeticException> {
        calculator.divide(10, 0)
    }
    exception.message shouldBe "Division by zero is not allowed"
}
```
And the implementation:
```kotlin
fun divide(a: Int, b: Int): Int {
    if (b == 0) {
        throw ArithmeticException("Division by zero is not allowed")
    }
    return a / b
}
```
Impressive: Claude naturally progressed to testing exception scenarios and error messages.
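A side note: since Kotest was already on the classpath, the same assertion could have used Kotest's `shouldThrow` instead of the fully qualified JUnit `assertThrows` - an equivalent alternative, not what Claude actually produced:

```kotlin
import io.kotest.assertions.throwables.shouldThrow
import io.kotest.matchers.shouldBe
import org.junit.jupiter.api.Test

class DivisionErrorTest {

    private val calculator = Calculator()

    @Test
    fun `should throw exception when dividing by zero`() {
        // shouldThrow fails the test when no ArithmeticException is thrown
        // and returns the caught exception for further assertions
        val exception = shouldThrow<ArithmeticException> {
            calculator.divide(10, 0)
        }
        exception.message shouldBe "Division by zero is not allowed"
    }
}
```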
Phase 4: Test Refactoring Lesson
After building the basic functionality, I noticed redundant tests and mentioned it. Claude immediately identified the issue:
```
// These two tests were redundant:
should add different positive numbers // (4, 7) -> 11
should add number with zero           // (5, 0) -> 5
```
But here's where I learned something important. When we removed the "redundant" test, Claude pointed out a crucial flaw:
"Now I can do this and all tests are green:
kotlinfun add(a: Int, b: Int): Int { return 5 }
1
2
3
The Lesson: What seemed redundant was actually providing necessary triangulation. One test allows hardcoding; multiple tests force generalization.
Phase 5: Professional Test Structure
Claude then suggested refactoring to parameterized tests:
```kotlin
@ParameterizedTest
@CsvSource(
    "2, 3, 5",
    "-2, -3, -5",
    "5, 0, 5",
    "0, 0, 0"
)
fun `should add numbers correctly`(a: Int, b: Int, expected: Int) {
    val result = calculator.add(a, b)
    result shouldBe expected
}
```
This elegantly solved the triangulation problem while reducing test maintenance burden.
Final Test Structure:
- 3 parameterized tests for normal operations (sketched below)
- 2 individual tests for exception cases
- Comprehensive coverage with minimal redundancy
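To make that structure concrete, here is a sketch of how the division tests could look in the final suite - a parameterized happy path plus an individual exception case. This is illustrative rather than copied from the repository; note also that `@ParameterizedTest` lives in the separate junit-jupiter-params artifact, which the pom excerpt above doesn't show:

```kotlin
import io.kotest.assertions.throwables.shouldThrow
import io.kotest.matchers.shouldBe
import org.junit.jupiter.api.Test
import org.junit.jupiter.params.ParameterizedTest
import org.junit.jupiter.params.provider.CsvSource

class CalculatorDivideTest {

    private val calculator = Calculator()

    // Normal cases collapse into one parameterized test
    @ParameterizedTest
    @CsvSource(
        "12, 3, 4",
        "10, 2, 5",
        "7, 7, 1"
    )
    fun `should divide numbers correctly`(a: Int, b: Int, expected: Int) {
        calculator.divide(a, b) shouldBe expected
    }

    // The exceptional case stays as an individual test
    @Test
    fun `should throw exception when dividing by zero`() {
        shouldThrow<ArithmeticException> { calculator.divide(10, 0) }
    }
}
```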
What I Discovered
✅ AI Understands TDD Fundamentals
- Wrote tests before implementation consistently
- Followed red-green-refactor cycles
- Used triangulation to drive general solutions
- Recognized when to test exceptions vs. normal cases
✅ AI Taught Good Practices
- Suggested parameterized tests for better maintainability
- Explained the reasoning behind each TDD step
- Identified test redundancy and optimization opportunities
- Showed proper exception testing patterns
- Defined meaningful method names
⚠️ Areas Needing Human Oversight
- Auto-progression: Claude started making decisions without asking
- Context switching: Sometimes lost track of which phase we were in
- Optimization timing: Needed guidance on when to refactor vs. add features
❌ Potential Pitfalls
- Could have over-engineered early if not constrained by simple tests
- Might not naturally consider all edge cases without prompting
- Test refactoring decisions needed human judgment
The Verdict
VibeTDD works surprisingly well for simple, well-defined problems. Claude demonstrated solid understanding of TDD principles and could teach them effectively. The test-first approach kept the AI focused and prevented over-engineering.
However, this was just a calculator - the next challenge will be more telling.
Key Takeaways for VibeTDD
- AI can teach TDD basics effectively but needs human oversight for decisions
- Tests provide excellent guardrails for AI-generated code
- Triangulation is crucial: don't remove "redundant" tests too quickly
- Parameterized tests are a game-changer for maintainable test suites
- Red-green-refactor discipline keeps AI from over-engineering
Next: The Real Challenge
The calculator experiment was encouraging, but it's a toy problem. Next, I'm taking on the Portfo payout service challenge - a real-world problem with business rules, validation logic, and architectural decisions.
Will AI maintain TDD discipline when faced with:
- Complex business requirements?
- Multiple validation rules?
- Integration concerns?
- Architectural trade-offs?
The calculator taught us the basics work. Now let's see if VibeTDD scales to realistic complexity.
Want to follow along with the VibeTDD experiments? I'll be documenting each phase as I explore the boundaries of AI-assisted Test-Driven Development, following the roadmap.
Code Repository
The complete code from this experiment is available at: VibeTDD Phase 1 Repository