VibeTDD Experiment 4.2: From Specs to Stories to Tasks - AI as Business Analyst
After Phase 4.1 showed that AI shouldn't handle mechanical project setup, I moved to something more promising: Can AI help translate business requirements into actionable development work?
This experiment tested whether AI could serve as an intelligent business analyst, breaking down technical specifications into user stories and then into specific development tasks with test criteria.
The results revealed important patterns about AI collaboration and led to a breakthrough in prompt organization that I've been applying across all VibeTDD work.
The Workflow: Spec → Stories → Tasks
The Goal
Transform a simple requirements document into structured development work:
- Technical Specification → User-focused stories
- User Stories → Layer-specific development tasks
- Development Tasks → Objectives + test scenarios in plain English
The Test Subject: Payout Service
I used the same payout domain from previous experiments, but this time with complete API requirements:
Original Spec (simplified):
## Payout Object
* UserId - UUID
* Amount - e.g. 20.00, 14.56
* Currency - ISO Code
## Validation Rules
* UserId - must not be null and be valid UUID
* Amount - must be positive and not exceed allowed limit
* Currency - must be valid ISO code and one of allowed currencies
* Sum of user payouts must not exceed allowed limit
## API Endpoints
* Create new payout
* Retrieve payout by ID
* Retrieve all payouts for user
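As a rough illustration of the data structure this spec describes, here is how the payout object might look as a Kotlin data class (my own sketch for readability; the class and field names are assumptions, not part of the experiment's input or output):

```kotlin
import java.math.BigDecimal
import java.util.UUID

// Hypothetical shape of the payout object described by the spec above.
data class Payout(
    val userId: UUID,        // must not be null and be a valid UUID
    val amount: BigDecimal,  // e.g. 20.00, 14.56 - positive and within the allowed limit
    val currency: String     // ISO code, one of the allowed currencies
)
```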
Step 1: Spec to Stories
AI generated 3 complete user stories from the specification:
- Create New Payout (POST endpoint)
- Retrieve Specific Payout by ID (GET endpoint)
- Retrieve All Payouts for User (GET with pagination)
Example - Story 1 Output:
### Story 1: Create New Payout
**As a** user
**I want to** create a new payout request via API
**So that** I can request funds to be transferred to my account
**API Contract:**
- **Endpoint**: POST /v1/payouts
- **Request**:
{
"userId": "123e4567-e89b-12d3-a456-426614174000",
"amount": 25.50,
"currency": "USD"
}
**Acceptance Criteria:**
- Given valid userId, amount, and currency, when creating payout, then returns 201 with payout details
- Given multiple valid requests under total limit, when creating payouts, then all are accepted
**Validation & Error Scenarios:**
- Given null userId → returns 422 with NotNull validation error
- Given invalid UUID format → returns 422 with Pattern validation error
- Given amount exceeding individual limit → returns 422 with DecimalMax validation error
- Given total user payouts would exceed limit → returns 422 with business rule violation
What Worked: AI created user-focused stories with proper API contracts and comprehensive error scenarios. I don't agree with mapping "exceeding individual limit" to a DecimalMax validation error, but that can be tuned later.
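For readers unfamiliar with the error names: NotNull, Pattern, and DecimalMax refer to Jakarta Bean Validation annotations, so the story implies a request DTO roughly like the sketch below (my own hedged reconstruction, not generated output; as noted, I'd rather express the individual limit as a configurable business rule than a DecimalMax annotation):

```kotlin
import jakarta.validation.constraints.DecimalMax
import jakarta.validation.constraints.DecimalMin
import jakarta.validation.constraints.NotNull
import jakarta.validation.constraints.Pattern
import java.math.BigDecimal

// Hypothetical request body for POST /v1/payouts implied by the story's validation scenarios.
data class CreatePayoutRequest(
    @field:NotNull
    @field:Pattern(regexp = "^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$")
    val userId: String?,

    @field:NotNull
    @field:DecimalMin("0.01")
    @field:DecimalMax("30.00") // the annotation I'd replace with a configurable domain rule
    val amount: BigDecimal?,

    @field:NotNull
    @field:Pattern(regexp = "^[A-Z]{3}$")
    val currency: String?
)
```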
Step 2: Stories to Development Tasks
For each story, AI generated tasks separated by hexagonal architecture layers, including both objectives (what the layer accomplishes) and test scenarios (behavioral requirements):
Domain Layer Task:
## Task 1: Domain Layer - Payout Business Rules
### Objective
Design and test business logic for payout creation, focusing on business
constraints, validation rules, and user limits.
### Test Cases to Design
**Amount Business Validation Batch**:
- should accept payout when amount is positive and within limits
- should reject payout when amount is not positive
- should reject payout when amount exceeds individual payout limit
- should reject payout when amount would cause user total to exceed limit
**Currency Business Rules Batch**:
- should accept payout when currency is supported
- should reject payout when currency is not supported
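To show where such scenarios are headed, here is one of them written out as a test (a minimal, self-contained sketch assuming Kotlin with JUnit 5; the inline rule function stands in for the real domain service, which would take its limit from configuration):

```kotlin
import org.junit.jupiter.api.Assertions.assertFalse
import org.junit.jupiter.api.Assertions.assertTrue
import org.junit.jupiter.api.Test
import java.math.BigDecimal

class PayoutAmountRulesTest {

    // Stand-in for the domain rule; in the real service the limit would be injected configuration.
    private fun isAcceptableAmount(amount: BigDecimal, individualLimit: BigDecimal): Boolean =
        amount > BigDecimal.ZERO && amount <= individualLimit

    private val individualLimit = BigDecimal("30.00")

    @Test
    fun `should accept payout when amount is positive and within limits`() {
        assertTrue(isAcceptableAmount(BigDecimal("25.50"), individualLimit))
    }

    @Test
    fun `should reject payout when amount exceeds individual payout limit`() {
        assertFalse(isAcceptableAmount(BigDecimal("31.00"), individualLimit))
    }
}
```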
API Layer Task:
## Task 3: API Layer - Payout HTTP Interface
### Test Cases to Design
**Request Format Validation Batch**:
- should reject request when userId is not UUID format
- should reject request when amount is not a number
- should reject request when currency is not 3-letter code format
**Success Response Batch**:
- should return 201 with payout details when request is valid
- should return payout with generated ID and timestamps
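At the API layer the same naming style translates into HTTP-level assertions. A sketch of the first scenario, assuming a Spring Boot MockMvc setup (the 422 response convention comes from the story; PayoutController and everything else here is an assumption):

```kotlin
import org.junit.jupiter.api.Test
import org.springframework.beans.factory.annotation.Autowired
import org.springframework.boot.test.autoconfigure.web.servlet.WebMvcTest
import org.springframework.http.MediaType
import org.springframework.test.web.servlet.MockMvc
import org.springframework.test.web.servlet.request.MockMvcRequestBuilders.post
import org.springframework.test.web.servlet.result.MockMvcResultMatchers.status

@WebMvcTest(PayoutController::class) // hypothetical controller under test
class PayoutApiTest {

    @Autowired
    private lateinit var mockMvc: MockMvc

    @Test
    fun `should reject request when userId is not UUID format`() {
        mockMvc.perform(
            post("/v1/payouts")
                .contentType(MediaType.APPLICATION_JSON)
                .content("""{"userId": "not-a-uuid", "amount": 25.50, "currency": "USD"}""")
        ).andExpect(status().isUnprocessableEntity())
    }
}
```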
Storage Layer Task:
## Task 2: Storage Layer - Payout Data Persistence
### Test Cases to Design
**Payout Storage Batch**:
- should store payout with generated UUID and all required data
- should store payout with correct timestamps
**Payout Retrieval Batch**:
- should find payouts by userId
- should calculate total payout amount for user from stored payouts
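And one of the storage scenarios as a sketch, using a tiny in-memory stand-in for the persistence adapter so the example stays self-contained (the real task would target the actual storage implementation):

```kotlin
import org.junit.jupiter.api.Assertions.assertEquals
import org.junit.jupiter.api.Test
import java.math.BigDecimal
import java.util.UUID

class PayoutStorageTest {

    // Minimal in-memory stand-in for the storage port, just enough for this scenario.
    private class InMemoryPayoutStore {
        private val payouts = mutableListOf<Pair<UUID, BigDecimal>>()
        fun save(userId: UUID, amount: BigDecimal) = payouts.add(userId to amount)
        fun totalAmountFor(userId: UUID): BigDecimal =
            payouts.filter { it.first == userId }
                .fold(BigDecimal.ZERO) { total, (_, amount) -> total + amount }
    }

    private val store = InMemoryPayoutStore()

    @Test
    fun `should calculate total payout amount for user from stored payouts`() {
        val userId = UUID.randomUUID()
        store.save(userId, BigDecimal("10.00"))
        store.save(userId, BigDecimal("14.50"))
        store.save(UUID.randomUUID(), BigDecimal("99.00")) // another user's payout is excluded

        assertEquals(BigDecimal("24.50"), store.totalAmountFor(userId))
    }
}
```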
Key Discoveries
1. Specs Must Focus on Behavior, Not Limits
The Problem: When I included specific limits in requirements:
❌ "Amount must not exceed 30"
❌ "Allowed currencies: EUR, USD, GBP"
AI propagated these concrete values throughout stories, tasks, and tests, creating implementation-focused scenarios rather than behavioral ones.
The Solution: Use behavioral descriptions:
✅ "Amount must not exceed allowed limit"
✅ "Currency must be one of allowed currencies"
Why This Matters: Behavioral specs lead to configurable implementations. Specific limits lead to hardcoded solutions.
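Concretely, a behavioral spec pushes the limits and the currency list into configuration rather than code. A sketch of what that might look like, assuming Spring Boot configuration properties (the property names are mine):

```kotlin
import org.springframework.boot.context.properties.ConfigurationProperties
import java.math.BigDecimal

// Hypothetical configuration holder: the limits and currencies live in config, not in code.
@ConfigurationProperties(prefix = "payout")
data class PayoutProperties(
    val individualLimit: BigDecimal,    // "amount must not exceed allowed limit"
    val totalUserLimit: BigDecimal,     // "sum of user payouts must not exceed allowed limit"
    val allowedCurrencies: Set<String>  // "currency must be one of allowed currencies"
)
```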
2. Validation Rules Need Precision
Vague Requirements Fail:
❌ "All fields are required"
Specific Requirements Work:
✅ "UserId - must not be empty and have valid UUID format"
✅ "Amount - must be positive and not exceed allowed limit"
✅ "Currency - must be valid ISO code and one of allowed currencies"
The Insight: AI needs explicit guidance about validation types to generate proper test scenarios.
The Prompt + Guide Pattern Discovery
The Story-to-Tasks Context Problem
When converting stories to development tasks, I initially provided comprehensive convention files (validation patterns, testing strategies). The results included non-behavioral tests like:
❌ "should use configuration to determine supported currencies"
This happened because AI read configuration conventions and treated them as things to test, not implementation details.
The Breakthrough: I discovered this was a context organization problem specific to the story-to-tasks conversion step.
The Solution: Minimal Prompt + Focused Guide
For the story-to-tasks conversion, I created:
Streamlined Prompt:
Convert the story to development tasks following the attached guide.
Structure each task with:
- Objective (what this layer accomplishes)
- Test Cases to Design (behavioral scenarios in plain English)
Focus on WHAT to test, not HOW to implement.
Follow the attached testing guide for conventions.
Single Testing Guide:
- VibeTDD batching principles
- Layer responsibilities in hexagonal architecture
- Test naming conventions ("should [action] when [condition]")
- Behavioral vs implementation testing
- Anti-patterns to avoid
Results After Reorganization
- Clear behavioral focus: Tests like "should reject payout when amount exceeds limit"
- Proper layer separation: API tests format validation, Domain tests business rules
- Consistent naming: All test scenarios follow behavioral patterns
- No implementation leakage: Configuration usage doesn't become test scenarios
Note: For the spec-to-stories conversion, I kept the original approach with multiple API convention files since specific examples of request/response structure are needed for proper story generation.
Lessons for VibeTDD
1. AI as Business Analyst Works
AI excels at breaking down requirements into structured work when given proper context about:
- User story format and language
- API design patterns
- Testing layer responsibilities
- Behavioral vs implementation focus
2. Context Organization Trumps Context Quantity
The key insight: AI collaboration quality depends more on how you organize context than how much context you provide.
3. The Prompt + Guide Pattern is Task-Specific
This pattern worked well for story-to-tasks conversion and can potentially be applied to other AI tasks:
- Code review: Minimal prompt + comprehensive review guide
- Architecture documentation: Simple instruction + detailed documentation patterns
- Requirements analysis: Basic task + thorough analysis framework
However, each task type may need different context organization strategies - spec-to-stories still benefits from multiple specific API convention examples.
Next up: Testing whether AI can take these behavioral test scenarios and generate actual tests and then working code. If the spec-to-stories-to-tasks workflow creates proper behavioral requirements, can AI implement them without falling into the architectural traps I've seen before?