VibeTDD Experiment 4.2: From Specs to Stories to Tasks - AI as Business Analyst
After Phase 4.1 showed that AI shouldn't handle mechanical project setup, I moved to something more promising: Can AI help translate business requirements into actionable development work?
This experiment tested whether AI could serve as an intelligent business analyst, breaking down technical specifications into user stories and then into specific development tasks with test criteria.
The results revealed important patterns about AI collaboration and led to a breakthrough in prompt organization that I've been applying across all VibeTDD work.
The Workflow: Spec → Stories → Tasks
The Goal
Transform a simple requirements document into structured development work:
- Technical Specification → User-focused stories
- User Stories → Layer-specific development tasks
- Development Tasks → Objectives + test scenarios in plain English
The Test Subject: Payout Service
I used the same payout domain from previous experiments, but this time with complete API requirements:
Original Spec (simplified):
## Payout Object
* UserId - UUID
* Amount - e.g. 20.00, 14.56
* Currency - ISO Code
## Validation Rules
* UserId - must not be null and be valid UUID
* Amount - must be positive and not exceed allowed limit
* Currency - must be valid ISO code and one of allowed currencies
* Sum of user payouts must not exceed allowed limit
## API Endpoints
* Create new payout
* Retrieve payout by ID
* Retrieve all payouts for user
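As a rough illustration of the data structure this spec describes, here is how the payout object might look as a Kotlin data class (my own sketch for readability; the class and field names are assumptions, not part of the experiment's input or output):

```kotlin
import java.math.BigDecimal
import java.util.UUID

// Hypothetical shape of the payout object described by the spec above.
data class Payout(
    val userId: UUID,        // must not be null and be a valid UUID
    val amount: BigDecimal,  // e.g. 20.00, 14.56 - positive and within the allowed limit
    val currency: String     // ISO code, one of the allowed currencies
)
```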
Step 1: Spec to Stories
AI generated 3 complete user stories from the specification:
- Create New Payout (POST endpoint)
- Retrieve Specific Payout by ID (GET endpoint)
- Retrieve All Payouts for User (GET with pagination)
Example - Story 1 Output:
### Story 1: Create New Payout
**As a** user
**I want to** create a new payout request via API
**So that** I can request funds to be transferred to my account
**API Contract:**
- **Endpoint**: POST /v1/payouts
- **Request**:
{
"userId": "123e4567-e89b-12d3-a456-426614174000",
"amount": 25.50,
"currency": "USD"
}
**Acceptance Criteria:**
- Given valid userId, amount, and currency, when creating payout, then returns 201 with payout details
- Given multiple valid requests under total limit, when creating payouts, then all are accepted
**Validation & Error Scenarios:**
- Given null userId → returns 422 with NotNull validation error
- Given invalid UUID format → returns 422 with Pattern validation error
- Given amount exceeding individual limit → returns 422 with DecimalMax validation error
- Given total user payouts would exceed limit → returns 422 with business rule violation
What Worked: AI created user-focused stories with proper API contracts and comprehensive error scenarios. I don't agree with mapping "exceeding individual limit" to a DecimalMax validation error, but that can be tuned later.
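For readers unfamiliar with the error names: NotNull, Pattern, and DecimalMax refer to Jakarta Bean Validation annotations, so the story implies a request DTO roughly like the sketch below (my own hedged reconstruction, not generated output; as noted, I'd rather express the individual limit as a configurable business rule than a DecimalMax annotation):

```kotlin
import jakarta.validation.constraints.DecimalMax
import jakarta.validation.constraints.DecimalMin
import jakarta.validation.constraints.NotNull
import jakarta.validation.constraints.Pattern
import java.math.BigDecimal

// Hypothetical request body for POST /v1/payouts implied by the story's validation scenarios.
data class CreatePayoutRequest(
    @field:NotNull
    @field:Pattern(regexp = "^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$")
    val userId: String?,

    @field:NotNull
    @field:DecimalMin("0.01")
    @field:DecimalMax("30.00") // the annotation I'd replace with a configurable domain rule
    val amount: BigDecimal?,

    @field:NotNull
    @field:Pattern(regexp = "^[A-Z]{3}$")
    val currency: String?
)
```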
Step 2: Stories to Development Tasks
For each story, AI generated tasks separated by hexagonal architecture layers, including both objectives (what the layer accomplishes) and test scenarios (behavioral requirements):
Domain Layer Task:
## Task 1: Domain Layer - Payout Business Rules
### Objective
Design and test business logic for payout creation, focusing on business
constraints, validation rules, and user limits.
### Test Cases to Design
**Amount Business Validation Batch**:
- should accept payout when amount is positive and within limits
- should reject payout when amount is not positive
- should reject payout when amount exceeds individual payout limit
- should reject payout when amount would cause user total to exceed limit
**Currency Business Rules Batch**:
- should accept payout when currency is supported
- should reject payout when currency is not supported
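To show where such scenarios are headed, here is one of them written out as a test (a minimal, self-contained sketch assuming Kotlin with JUnit 5; the inline rule function stands in for the real domain service, which would take its limit from configuration):

```kotlin
import org.junit.jupiter.api.Assertions.assertFalse
import org.junit.jupiter.api.Assertions.assertTrue
import org.junit.jupiter.api.Test
import java.math.BigDecimal

class PayoutAmountRulesTest {

    // Stand-in for the domain rule; in the real service the limit would be injected configuration.
    private fun isAcceptableAmount(amount: BigDecimal, individualLimit: BigDecimal): Boolean =
        amount > BigDecimal.ZERO && amount <= individualLimit

    private val individualLimit = BigDecimal("30.00")

    @Test
    fun `should accept payout when amount is positive and within limits`() {
        assertTrue(isAcceptableAmount(BigDecimal("25.50"), individualLimit))
    }

    @Test
    fun `should reject payout when amount exceeds individual payout limit`() {
        assertFalse(isAcceptableAmount(BigDecimal("31.00"), individualLimit))
    }
}
```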
API Layer Task:
## Task 3: API Layer - Payout HTTP Interface
### Test Cases to Design
**Request Format Validation Batch**:
- should reject request when userId is not UUID format
- should reject request when amount is not a number
- should reject request when currency is not 3-letter code format
**Success Response Batch**:
- should return 201 with payout details when request is valid
- should return payout with generated ID and timestamps
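At the API layer the same naming style translates into HTTP-level assertions. A sketch of the first scenario, assuming a Spring Boot MockMvc setup (the 422 response convention comes from the story; PayoutController and everything else here is an assumption):

```kotlin
import org.junit.jupiter.api.Test
import org.springframework.beans.factory.annotation.Autowired
import org.springframework.boot.test.autoconfigure.web.servlet.WebMvcTest
import org.springframework.http.MediaType
import org.springframework.test.web.servlet.MockMvc
import org.springframework.test.web.servlet.request.MockMvcRequestBuilders.post
import org.springframework.test.web.servlet.result.MockMvcResultMatchers.status

@WebMvcTest(PayoutController::class) // hypothetical controller under test
class PayoutApiTest {

    @Autowired
    private lateinit var mockMvc: MockMvc

    @Test
    fun `should reject request when userId is not UUID format`() {
        mockMvc.perform(
            post("/v1/payouts")
                .contentType(MediaType.APPLICATION_JSON)
                .content("""{"userId": "not-a-uuid", "amount": 25.50, "currency": "USD"}""")
        ).andExpect(status().isUnprocessableEntity())
    }
}
```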
Storage Layer Task:
## Task 2: Storage Layer - Payout Data Persistence
### Test Cases to Design
**Payout Storage Batch**:
- should store payout with generated UUID and all required data
- should store payout with correct timestamps
**Payout Retrieval Batch**:
- should find payouts by userId
- should calculate total payout amount for user from stored payouts
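And one of the storage scenarios as a sketch, using a tiny in-memory stand-in for the persistence adapter so the example stays self-contained (the real task would target the actual storage implementation):

```kotlin
import org.junit.jupiter.api.Assertions.assertEquals
import org.junit.jupiter.api.Test
import java.math.BigDecimal
import java.util.UUID

class PayoutStorageTest {

    // Minimal in-memory stand-in for the storage port, just enough for this scenario.
    private class InMemoryPayoutStore {
        private val payouts = mutableListOf<Pair<UUID, BigDecimal>>()
        fun save(userId: UUID, amount: BigDecimal) = payouts.add(userId to amount)
        fun totalAmountFor(userId: UUID): BigDecimal =
            payouts.filter { it.first == userId }
                .fold(BigDecimal.ZERO) { total, (_, amount) -> total + amount }
    }

    private val store = InMemoryPayoutStore()

    @Test
    fun `should calculate total payout amount for user from stored payouts`() {
        val userId = UUID.randomUUID()
        store.save(userId, BigDecimal("10.00"))
        store.save(userId, BigDecimal("14.50"))
        store.save(UUID.randomUUID(), BigDecimal("99.00")) // another user's payout is excluded

        assertEquals(BigDecimal("24.50"), store.totalAmountFor(userId))
    }
}
```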
Key Discoveries
1. Specs Must Focus on Behavior, Not Limits
The Problem: When I included specific limits in requirements:
❌ "Amount must not exceed 30"
❌ "Allowed currencies: EUR, USD, GBP"
AI propagated these concrete values throughout stories, tasks, and tests, creating implementation-focused scenarios rather than behavioral ones.
The Solution: Use behavioral descriptions:
✅ "Amount must not exceed allowed limit"
✅ "Currency must be one of allowed currencies"
Why This Matters: Behavioral specs lead to configurable implementations. Specific limits lead to hardcoded solutions.
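Concretely, a behavioral spec pushes the limits and the currency list into configuration rather than code. A sketch of what that might look like, assuming Spring Boot configuration properties (the property names are mine):

```kotlin
import org.springframework.boot.context.properties.ConfigurationProperties
import java.math.BigDecimal

// Hypothetical configuration holder: the limits and currencies live in config, not in code.
@ConfigurationProperties(prefix = "payout")
data class PayoutProperties(
    val individualLimit: BigDecimal,    // "amount must not exceed allowed limit"
    val totalUserLimit: BigDecimal,     // "sum of user payouts must not exceed allowed limit"
    val allowedCurrencies: Set<String>  // "currency must be one of allowed currencies"
)
```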
2. Validation Rules Need Precision
Vague Requirements Fail:
❌ "All fields are required"
Specific Requirements Work:
✅ "UserId - must not be empty and have valid UUID format"
✅ "Amount - must be positive and not exceed allowed limit"
✅ "Currency - must be valid ISO code and one of allowed currencies"
The Insight: AI needs explicit guidance about validation types to generate proper test scenarios.
The Prompt + Guide Pattern Discovery
The Story-to-Tasks Context Problem
When converting stories to development tasks, I initially provided comprehensive convention files (validation patterns, testing strategies). The results included non-behavioral tests like:
❌ "should use configuration to determine supported currencies"
This happened because AI read configuration conventions and treated them as things to test, not implementation details.
The Breakthrough: I discovered this was a context organization problem specific to the story-to-tasks conversion step.
The Solution: Minimal Prompt + Focused Guide
For the story-to-tasks conversion, I created:
Streamlined Prompt:
Convert the story to development tasks following the attached guide.
Structure each task with:
- Objective (what this layer accomplishes)
- Test Cases to Design (behavioral scenarios in plain English)
Focus on WHAT to test, not HOW to implement.
Follow the attached testing guide for conventions.
Single Testing Guide:
- VibeTDD batching principles
- Layer responsibilities in hexagonal architecture
- Test naming conventions ("should [action] when [condition]")
- Behavioral vs implementation testing
- Anti-patterns to avoid
Results After Reorganization
- Clear behavioral focus: Tests like "should reject payout when amount exceeds limit"
- Proper layer separation: API tests format validation, Domain tests business rules
- Consistent naming: All test scenarios follow behavioral patterns
- No implementation leakage: Configuration usage doesn't become test scenarios
Note: For the spec-to-stories conversion, I kept the original approach with multiple API convention files since specific examples of request/response structure are needed for proper story generation.
Lessons for VibeTDD
1. AI as Business Analyst Works
AI excels at breaking down requirements into structured work when given proper context about:
- User story format and language
- API design patterns
- Testing layer responsibilities
- Behavioral vs implementation focus
2. Context Organization Trumps Context Quantity
The key insight: AI collaboration quality depends more on how you organize context than how much context you provide.
3. The Prompt + Guide Pattern is Task-Specific
This pattern worked well for story-to-tasks conversion and can potentially be applied to other AI tasks:
- Code review: Minimal prompt + comprehensive review guide
- Architecture documentation: Simple instruction + detailed documentation patterns
- Requirements analysis: Basic task + thorough analysis framework
However, each task type may need different context organization strategies - spec-to-stories still benefits from multiple specific API convention examples.
Next up: Testing whether AI can take these behavioral test scenarios and generate actual tests and then working code. If the spec-to-stories-to-tasks workflow creates proper behavioral requirements, can AI implement them without falling into the architectural traps I've seen before?