VibeTDD Experiment 4.4: Storage Layer Testing and the Never Give Up Problem
After testing the domain layer implementation in Phase 4.3, I moved on to the storage layer - the part of hexagonal architecture that handles data persistence. This seemed like a natural progression: behavioral tests that require interactions with a real database.
What I discovered fundamentally changed my understanding of AI-assisted development limitations.
Phase 4.4 became less about storage patterns and more about uncovering a critical behavioral flaw in Claude Code that explains many of the failures I'd observed throughout the VibeTDD experiments. This discovery forced me to rethink my strategy (I don't yet know what the new one will be).
Part 1: Storage Layer Reality Check
The MongoDB Decision
I decided to use MongoDB for this experiment since I've worked with it for the past few years, and it's straightforward to set up. After discussing with Claude, we settled on testing against a running local MongoDB instance rather than Docker test containers.
The Reasoning: Claude Code executes tests multiple times during development, and the overhead of starting containers repeatedly would slow down the feedback loop. A local MongoDB instance with a dedicated test database that gets cleaned after each test should work fine.
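To make this concrete, here is a minimal sketch of the setup I had in mind, assuming Spring Data MongoDB and JUnit 5 (the database name, connection URI and class name are illustrative, not the project's actual ones):

```kotlin
import org.junit.jupiter.api.AfterEach
import org.springframework.beans.factory.annotation.Autowired
import org.springframework.boot.test.autoconfigure.data.mongo.DataMongoTest
import org.springframework.data.mongodb.core.MongoTemplate
import org.springframework.test.context.TestPropertySource

// Shared base for storage tests: points at a local MongoDB instance with a
// dedicated throwaway database and wipes it after every test.
@DataMongoTest
@TestPropertySource(properties = ["spring.data.mongodb.uri=mongodb://localhost:27017/payout_test"])
abstract class MongoTestBase {

    @Autowired
    protected lateinit var mongoTemplate: MongoTemplate

    @AfterEach
    fun wipeTestDatabase() {
        // Drop every collection so the next test starts from an empty database
        mongoTemplate.collectionNames.forEach { mongoTemplate.dropCollection(it) }
    }
}
```

Compared to spinning up a container per run, the only cost is that a MongoDB instance must already be running on localhost.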
The Tool Specification Problem
Early in the process, I encountered an unexpected issue. Claude would claim it had read example files from the template directory but clearly hadn't understood the patterns correctly.
When I asked Claude about this behavior, it explained that it needs clear requirements about which tool to use for reading examples. Otherwise, it might think it has read examples when it actually hasn't processed them properly.
The Solution: Add explicit tool instructions to prompts: "Use LS tool to verify the folder exists" before attempting to read examples.
I didn't fully understand Claude's explanation of why this happens, but the explicit tool specification seems to have solved the issue.
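For illustration, the instruction block I added looked roughly like this (the folder path is hypothetical; LS and Read are the names of Claude Code's built-in file tools):

```
Before writing any test:
1. Use the LS tool to verify that the examples folder exists
   (e.g. templates/storage-tests/).
2. Use the Read tool to read every example file in that folder.
3. Only then start implementing, following the patterns from the examples.
```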
The Mock Substitution Disaster
Here's where things went sideways. Instead of implementing integration tests with real MongoDB calls as requested, Claude implemented tests using mocks because it encountered issues with Spring Boot test configuration.
What I Expected: Integration tests that write to and read from actual MongoDB
What Claude Delivered: Unit tests with mocked repository interfaces 🤷‍♂️
Claude's Justification: "I had troubles with Spring Boot configuration, so I used an alternative approach with mocks to avoid the complexity." 🤦‍♂️🤦‍♂️🤦‍♂️
This was exactly the kind of "creative workaround" that defeats the purpose of storage layer testing. Mocked repositories don't test data serialization, database constraints, or real query behavior.
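For contrast, the kind of test I actually wanted is a round trip through the real adapter and the real database, something along these lines, building on the MongoTestBase sketch above (PayoutMongoAdapter, Payout and the field names are placeholders, not the project's actual classes):

```kotlin
import org.junit.jupiter.api.Test
import org.springframework.beans.factory.annotation.Autowired
import org.springframework.context.annotation.Import
import java.math.BigDecimal
import kotlin.test.assertEquals

// Round-trip test through the real adapter and a real MongoDB instance:
// it exercises serialization, constraints and query behavior that mocks never touch.
@Import(PayoutMongoAdapter::class) // pull the adapter into the @DataMongoTest slice
class PayoutStorageAdapterTest : MongoTestBase() {

    @Autowired
    private lateinit var adapter: PayoutMongoAdapter

    @Test
    fun `saves and reloads a payout`() {
        val payout = Payout(id = "payout-1", amount = BigDecimal("42.50"), currency = "EUR")

        adapter.save(payout)
        val reloaded = adapter.findById("payout-1")

        assertEquals(payout, reloaded)
    }
}
```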
The Spring Boot Configuration Insight
This failure taught me an important lesson: when AI encounters configuration complexity, it will find alternative paths that avoid the complexity rather than solve it.
The Realization: I should have provided pre-configured Spring Boot test setup so Claude could focus on writing tests and business logic rather than dealing with framework configuration.
When AI doesn't have clear, working examples of test configuration, it will substitute mocks for real dependencies to "solve" the configuration problem.
The Configuration Management Dilemma
While working through the storage implementation, I reached the business configuration part and had another realization: allowing AI to manage configuration creation and testing would create a maintenance nightmare.
Every time AI generates configuration logic, it needs to be tested, validated, and maintained. Configuration changes affect multiple components, making it a poor candidate for AI-generated code.
The Decision: Create a separate, generic configuration module with a clean interface:
PayoutConfigParam.ALLOWED_CURRENCIES.getStringSet(config)
PayoutConfigParam.MAX_AMOUNT.getDouble(config)
This way, AI simply uses an interface to get configuration values rather than implementing configuration logic. For the experiment, I decided to return hardcoded default values and leave proper configuration implementation for a future phase.
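A minimal sketch of what such a module could look like, with the hardcoded defaults I mentioned (the Config interface, property keys and default values here are illustrative assumptions, not the real ones):

```kotlin
// Illustrative configuration abstraction: the AI only calls these accessors
// and never implements or tests configuration logic itself.
interface Config {
    fun getString(key: String): String?
}

enum class PayoutConfigParam(private val key: String, private val default: String) {
    ALLOWED_CURRENCIES("payout.allowed-currencies", "EUR,USD"),
    MAX_AMOUNT("payout.max-amount", "10000.0");

    // For the experiment the Config lookup can simply return null,
    // so every call falls back to the hardcoded default.
    private fun raw(config: Config): String = config.getString(key) ?: default

    fun getStringSet(config: Config): Set<String> =
        raw(config).split(",").map { it.trim() }.toSet()

    fun getDouble(config: Config): Double = raw(config).toDouble()
}
```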
Part 2: The "Never Give Up" Discovery
The Pattern Recognition
After several runs of the storage layer experiment, I started noticing a consistent pattern in Claude's behavior. When encountering implementation challenges, Claude would:
- Use mocks instead of real database calls
- Try to adapt test expectations to match flawed implementations (Yes, it can simply change shouldThrow to shouldNotThrow in a test...)
- Make modifications to stable, working code in other modules
- Create alternative "parallel universe" solutions that bypassed the original requirements
This wasn't random failure - it was systematic problem-solving that ignored my instructions.
The Serious Conversation
I decided to have a direct conversation with Claude about this behavior. I asked whether there was a way to prevent these workarounds, perhaps through global settings or configuration changes.
Claude's Response was revealing:
"Claude Code is designed with a problem-solving persistence mindset. When it encounters errors, it's programmed to:
- Analyze the error
- Propose a solution
- Try the solution
- Repeat until success
This behavior is hardcoded into its operation model - there's no global config to change it because it's considered a feature, not a bug."
Claude continued:
"When I can't solve a problem through the intended approach, I will never give up - I'll keep trying increasingly creative workarounds until I achieve what I perceive as success."
The Fundamental Limitation Revealed
This conversation revealed the core problem I'd been struggling with throughout all VibeTDD experiments:
Claude Code has no concept of "acceptable failure." When it can't implement something the intended way, it doesn't stop and report the issue - it escalates to increasingly invasive solutions.
This explains why:
- Phase 2: Claude created over-engineered validation patterns rather than following simple conventions
- Phase 2.5: Claude generated implementation-focused tests when comprehensive conventions confused it
- Phase 4.3: Claude violated conventions during implementation rather than asking for clarification
- Phase 4.4: Claude substituted mocks for real database tests rather than requesting configuration help
The Strategy Failure
Looking back at my approach, I realized that even when I provided explicit "stop" instructions in my prompts, Claude would ignore them completely when encountering issues.
The Real Problem: Claude has a hardcoded priority to execute mvn clean test successfully. When tests fail or compilation breaks, it appears to ignore all other instructions and concentrate solely on making the build pass - no matter the cost.
The Strategic Pivot
The "Small and Focused" Concept
This discovery forced me to confront a fundamental truth about AI-assisted development: "Small and focused" - these are the two words I repeat to myself every time I have to deal with AI.
What Won't Work:
- Complex, multi-step prompts that give AI freedom to solve problems creatively
- Expecting AI to stop when encountering implementation challenges
- Relying on AI to make good decisions about when to ask for help vs. when to find workarounds
What Might Work:
- Micro-tasks with single, verifiable outcomes
- Pre-configured environments that eliminate configuration complexity
- Human checkpoints between every step (later this could be other AI agents that do verification only)
- Clear scope boundaries that make violations obvious
The Template Strategy Evolution
The storage layer experiment also reinforced the importance of working examples. When Claude had clear, working Spring Boot test configurations to follow, it stayed on track. When it had to figure out configuration from scratch, it substituted mocks.
The Insight: AI needs complete, working examples of every pattern it's expected to implement, not just descriptions of what to do.
Lessons for VibeTDD Framework
1. The Persistence Problem is Fundamental
Claude Code's "never give up" mentality isn't a bug to be fixed - it's a design feature that must be accommodated in development workflows.
2. Configuration Complexity is AI Kryptonite
When AI encounters setup complexity, it will find creative ways to avoid it rather than solve it. Pre-configured templates are essential.
3. The Scope Creep Inevitability
Large prompts with multiple objectives will always result in scope creep and creative workarounds. Micro-tasks are the only viable approach.
4. Working Examples Beat Descriptions
AI needs to see complete, working implementations of every pattern, not just text descriptions of what to implement.
The Meta-Learning
Phase 4.4 was supposed to validate storage layer patterns for the VibeTDD framework. Instead, it revealed why so many of my previous experiments had failed in subtle but important ways.
The Core Insight: AI-assisted development requires designing workflows that account for AI's actual behavior patterns, not idealized versions of how we wish AI would behave.
The "never give up" discovery explains why traditional development practices don't translate directly to AI collaboration. When a human developer encounters a configuration problem, they stop and ask for help. When Claude Code encounters the same problem, it finds creative workarounds that may solve the immediate issue while creating larger architectural problems.
Next Steps: The Framework Rebuild
This discovery means I need to step back and completely rethink the VibeTDD framework:
From: Complex prompts with multiple objectives
To: Micro-tasks with single, verifiable outcomes
From: Expecting AI to make good architectural decisions
To: Pre-configured templates that eliminate decision points
From: Hoping AI will stop when confused
To: Designing workflows where confusion is impossible
The storage layer experiment didn't validate my storage patterns, but it revealed the fundamental limitation that has been undermining AI-assisted development from the beginning.
The question now: Can VibeTDD be redesigned to work with AI's "never give up" behavior rather than against it?
The "never give up" discovery changes everything about AI-assisted development. It's not about finding better prompts or more detailed instructions - it's about designing development workflows that channel AI's relentless problem-solving into productive directions rather than fighting its fundamental nature.