
GitHub Spec-Kit Spec Driven Dev is in the Right Direction: Disciplined, Incremental Vibe-Coding

Roy Osherove

15 Sep 2025 — 15 min read

Since vibe-coding began, the enshittification of code and the production of throwaway stuff have become very easy. I think projects like Spec-Kit are there to help mitigate this, and not a moment too soon.
I think we'll see more of these "prompt-frameworks" such as socratic coder, Claude Templates, BMAD-METHOD, ai-rulez and others in the near future.

Among these, Spec-Kit seems very promising and very well thought-out: think Kiro, but open source and with support for multiple tools.

I find the copy on top a bit too vague. I would define it as:

A structured way to build complex, AI-paired features: tame your AI pair programmer into working in a structured, step-by-step software design process, including specification, planning, and DOs and DON'Ts, and make it a repeatable process for the rest of your team(s) as well.

The steps themselves are pretty straightforward:

I'm personally using Cursor, so here's the strategy it uses for Cursor:

It uses Cursor Commands so that you can type "/specify" and "/plan" directly in the chat (it creates these commands under .cursor/commands/ as specify.md, plan.md, etc.).
It generates a template CONSTITUTION.md file that the prompts in the Cursor commands use as their set of guiding principles.
It creates a .specify directory that holds both the memory and various scripts and templates that the agentic feature in Cursor (or Claude, etc.) can read and pass on for future reference and flows, across multiple developers. Basically, it's the rules, state and tools that it uses to create a relatively consistent experience when using the commands mentioned above, across multiple sessions.

Create a new feature

You start with /specify, and then as you go along, the agent uses the prompts inside the commands to do things like:

Generate a new folder for the feature we are working on. This will contain all the planning, design and documentation for the feature, and will be used for generating the code later on.

As you go through the motions, it feels relatively well thought out. We have all tried to reinvent this wheel in our own organizations, and now it seems to be getting solved in what is (currently, and this could change in a month) a standardized way.

Once you finish the planning phase, you move to the /tasks phase, and Cursor starts generating tasks from all the plans we made:

The agent got stuck or timed out at some point, so I stopped it and asked it to "continue" from where it left off. It worked fine, thanks to all the "memory" the structure leaves behind:

# Tasks: Photo Album Organization Application

**Input**: Design documents from `/specs/001-build-an-application/`
**Prerequisites**: plan.md (required), research.md, data-model.md, contracts/, quickstart.md

## Execution Flow (main)
```
1. Load plan.md from feature directory
   → If not found: ERROR "No implementation plan found"
   → Extract: tech stack, libraries, structure
2. Load optional design documents:
   → data-model.md: Extract entities → model tasks
   → contracts/: Each file → contract test task
   → research.md: Extract decisions → setup tasks
3. Generate tasks by category:
   → Setup: project init, dependencies, linting
   → Tests: contract tests, integration tests
   → Core: models, services, CLI commands
   → Integration: DB, middleware, logging
   → Polish: unit tests, performance, docs
4. Apply task rules:
   → Different files = mark [P] for parallel
   → Same file = sequential (no [P])
   → Tests before implementation (TDD)
5. Number tasks sequentially (T001, T002...)
6. Generate dependency graph
7. Create parallel execution examples
8. Validate task completeness:
   → All contracts have tests?
   → All entities have models?
   → All endpoints implemented?
9. Return: SUCCESS (tasks ready for execution)
```

## Format: `[ID] [P?] Description`
- **[P]**: Can run in parallel (different files, no dependencies)
- Include exact file paths in descriptions

## Path Conventions
- **Web app**: `backend/src/`, `frontend/src/`
- Paths shown below follow web application structure from plan.md

## Phase 3.1: Setup
- [ ] T001 Create project structure with backend/ and frontend/ directories
- [ ] T002 Initialize Node.js backend project with Express, TypeScript, SQLite dependencies
- [ ] T003 Initialize React frontend project with TypeScript, @dnd-kit, React dependencies
- [ ] T004 [P] Configure ESLint and Prettier for backend in backend/.eslintrc.json
- [ ] T005 [P] Configure ESLint and Prettier for frontend in frontend/.eslintrc.json
- [ ] T006 [P] Set up Jest configuration for backend in backend/jest.config.js
- [ ] T007 [P] Set up Jest and React Testing Library for frontend in frontend/jest.config.js
- [ ] T008 [P] Configure Playwright E2E tests in tests/e2e/playwright.config.ts

## Phase 3.2: Tests First (TDD) ⚠️ MUST COMPLETE BEFORE 3.3
**CRITICAL: These tests MUST be written and MUST FAIL before ANY implementation**

### Contract Tests (API Endpoints)
- [ ] T009 [P] Contract test GET /albums in tests/contract/test_albums_get.spec.ts
- [ ] T010 [P] Contract test POST /albums in tests/contract/test_albums_post.spec.ts
- [ ] T011 [P] Contract test GET /albums/{albumId} in tests/contract/test_albums_get_by_id.spec.ts
- [ ] T012 [P] Contract test PATCH /albums/{albumId} in tests/contract/test_albums_patch.spec.ts
- [ ] T013 [P] Contract test POST /albums/reorder in tests/contract/test_albums_reorder.spec.ts
- [ ] T014 [P] Contract test GET /photos in tests/contract/test_photos_get.spec.ts
- [ ] T015 [P] Contract test POST /photos in tests/contract/test_photos_post.spec.ts
- [ ] T016 [P] Contract test GET /photos/{photoId} in tests/contract/test_photos_get_by_id.spec.ts
- [ ] T017 [P] Contract test DELETE /photos/{photoId} in tests/contract/test_photos_delete.spec.ts
- [ ] T018 [P] Contract test GET /photos/{photoId}/file in tests/contract/test_photos_file_get.spec.ts
- [ ] T019 [P] Contract test GET /user/preferences in tests/contract/test_preferences_get.spec.ts
- [ ] T020 [P] Contract test PATCH /user/preferences in tests/contract/test_preferences_patch.spec.ts

### Integration Tests (User Scenarios)
- [ ] T021 [P] Integration test photo upload and auto-grouping in tests/integration/test_upload_auto_group.spec.ts
- [ ] T022 [P] Integration test album drag-and-drop reordering in tests/integration/test_album_reorder.spec.ts
- [ ] T023 [P] Integration test photo tile browsing in tests/integration/test_photo_browsing.spec.ts
- [ ] T024 [P] Integration test undated photo handling in tests/integration/test_undated_photos.spec.ts
- [ ] T025 [P] Integration test user preferences persistence in tests/integration/test_preferences.spec.ts

## Phase 3.3: Core Implementation (ONLY after tests are failing)

### Data Models
- [ ] T026 [P] Photo model with validation in backend/src/models/Photo.ts
- [ ] T027 [P] Album model with validation in backend/src/models/Album.ts
- [ ] T028 [P] UserPreferences model in backend/src/models/UserPreferences.ts
- [ ] T029 [P] Database schema and migrations in backend/src/database/schema.sql
- [ ] T030 [P] Database connection setup in backend/src/database/connection.ts

### Core Libraries (Constitutional Requirement)
- [ ] T031 [P] photo-manager library: EXIF parsing in backend/src/lib/photo-manager/exif.ts
- [ ] T032 [P] photo-manager library: thumbnail generation in backend/src/lib/photo-manager/thumbnails.ts
- [ ] T033 [P] photo-manager CLI interface in backend/src/lib/photo-manager/cli.ts
- [ ] T034 [P] album-organizer library: date grouping in backend/src/lib/album-organizer/grouping.ts
- [ ] T035 [P] album-organizer library: drag-drop persistence in backend/src/lib/album-organizer/reorder.ts
- [ ] T036 [P] album-organizer CLI interface in backend/src/lib/album-organizer/cli.ts
- [ ] T037 [P] file-handler library: upload processing in backend/src/lib/file-handler/upload.ts
- [ ] T038 [P] file-handler library: storage management in backend/src/lib/file-handler/storage.ts
- [ ] T039 [P] file-handler CLI interface in backend/src/lib/file-handler/cli.ts

### Services Layer
- [ ] T040 PhotoService CRUD operations in backend/src/services/PhotoService.ts
- [ ] T041 AlbumService CRUD and reordering in backend/src/services/AlbumService.ts
- [ ] T042 UserPreferencesService in backend/src/services/UserPreferencesService.ts
- [ ] T043 FileProcessingService in backend/src/services/FileProcessingService.ts

### API Endpoints Implementation
- [ ] T044 GET /albums endpoint in backend/src/routes/albums.ts
- [ ] T045 POST /albums endpoint in backend/src/routes/albums.ts
- [ ] T046 GET /albums/{albumId} endpoint in backend/src/routes/albums.ts
- [ ] T047 PATCH /albums/{albumId} endpoint in backend/src/routes/albums.ts
- [ ] T048 POST /albums/reorder endpoint in backend/src/routes/albums.ts
- [ ] T049 GET /photos endpoint in backend/src/routes/photos.ts
- [ ] T050 POST /photos endpoint with file upload in backend/src/routes/photos.ts
- [ ] T051 GET /photos/{photoId} endpoint in backend/src/routes/photos.ts
- [ ] T052 DELETE /photos/{photoId} endpoint in backend/src/routes/photos.ts
- [ ] T053 GET /photos/{photoId}/file endpoint in backend/src/routes/photos.ts
- [ ] T054 GET /user/preferences endpoint in backend/src/routes/preferences.ts
- [ ] T055 PATCH /user/preferences endpoint in backend/src/routes/preferences.ts

### Frontend Components
- [ ] T056 [P] Album grid component with drag-drop in frontend/src/components/AlbumGrid.tsx
- [ ] T057 [P] Album card component with preview in frontend/src/components/AlbumCard.tsx
- [ ] T058 [P] Photo tile component in frontend/src/components/PhotoTile.tsx
- [ ] T059 [P] Photo grid component with virtual scrolling in frontend/src/components/PhotoGrid.tsx
- [ ] T060 [P] Upload component with file handling in frontend/src/components/Upload.tsx
- [ ] T061 [P] Settings/Preferences component in frontend/src/components/Settings.tsx
- [ ] T062 Main page layout in frontend/src/pages/MainPage.tsx
- [ ] T063 Album detail page in frontend/src/pages/AlbumPage.tsx
- [ ] T064 API service layer in frontend/src/services/api.ts
- [ ] T065 State management with Context in frontend/src/context/AppContext.tsx

## Phase 3.4: Integration
- [ ] T066 Connect services to database in backend/src/services/
- [ ] T067 File upload middleware configuration in backend/src/middleware/upload.ts
- [ ] T068 Error handling middleware in backend/src/middleware/error.ts
- [ ] T069 Logging middleware with Winston in backend/src/middleware/logging.ts
- [ ] T070 CORS and security headers in backend/src/middleware/security.ts
- [ ] T071 Frontend API integration in frontend/src/services/
- [ ] T072 Service Worker for thumbnail caching in frontend/public/sw.js

## Phase 3.5: Polish
- [ ] T073 [P] Unit tests for EXIF parsing in tests/unit/photo-manager/test_exif.spec.ts
- [ ] T074 [P] Unit tests for thumbnail generation in tests/unit/photo-manager/test_thumbnails.spec.ts
- [ ] T075 [P] Unit tests for date grouping in tests/unit/album-organizer/test_grouping.spec.ts
- [ ] T076 [P] Unit tests for reorder logic in tests/unit/album-organizer/test_reorder.spec.ts
- [ ] T077 [P] Unit tests for file handling in tests/unit/file-handler/test_upload.spec.ts
- [ ] T078 [P] Frontend component unit tests in frontend/src/components/__tests__/
- [ ] T079 Performance tests for large photo collections in tests/performance/test_large_collection.spec.ts
- [ ] T080 E2E tests for complete user workflows in tests/e2e/test_user_workflows.spec.ts
- [ ] T081 [P] Update library documentation in llms.txt format
- [ ] T082 [P] API documentation generation from OpenAPI spec
- [ ] T083 Code cleanup and refactoring
- [ ] T084 Run quickstart.md validation scenarios

## Dependencies
**Critical Dependencies (blocking)**:
- Setup (T001-T008) before all tests and implementation
- Contract tests (T009-T020) before corresponding endpoints (T044-T055)
- Integration tests (T021-T025) before implementation
- Models (T026-T030) before services (T040-T043)
- Libraries (T031-T039) before services (T040-T043)
- Services (T040-T043) before endpoints (T044-T055)
- API endpoints (T044-T055) before frontend integration (T071)
- Core implementation before polish (T073-T084)

**Specific Blocking Relationships**:
- T026-T028 block T040-T043 (models before services)
- T031-T039 block T040-T043 (libraries before services)
- T040-T043 block T044-T055 (services before endpoints)
- T044-T055 block T071 (API before frontend integration)
- T066-T070 block T080 (middleware before E2E tests)

## Parallel Execution Examples

### Setup Phase (T004-T008)
```bash
# Launch T004-T008 together:
Task: "Configure ESLint and Prettier for backend in backend/.eslintrc.json"
Task: "Configure ESLint and Prettier for frontend in frontend/.eslintrc.json"
Task: "Set up Jest configuration for backend in backend/jest.config.js"
Task: "Set up Jest and React Testing Library for frontend in frontend/jest.config.js"
Task: "Configure Playwright E2E tests in tests/e2e/playwright.config.ts"
```

### Contract Tests Phase (T009-T020)
```bash
# Launch T009-T020 together (all different files):
Task: "Contract test GET /albums in tests/contract/test_albums_get.spec.ts"
Task: "Contract test POST /albums in tests/contract/test_albums_post.spec.ts"
Task: "Contract test GET /albums/{albumId} in tests/contract/test_albums_get_by_id.spec.ts"
# ... (all 12 contract tests can run in parallel)
```

### Models Phase (T026-T030)
```bash
# Launch T026-T030 together:
Task: "Photo model with validation in backend/src/models/Photo.ts"
Task: "Album model with validation in backend/src/models/Album.ts"
Task: "UserPreferences model in backend/src/models/UserPreferences.ts"
Task: "Database schema and migrations in backend/src/database/schema.sql"
Task: "Database connection setup in backend/src/database/connection.ts"
```

### Libraries Phase (T031-T039)
```bash
# Launch T031-T039 together (all independent libraries):
Task: "photo-manager library: EXIF parsing in backend/src/lib/photo-manager/exif.ts"
Task: "photo-manager library: thumbnail generation in backend/src/lib/photo-manager/thumbnails.ts"
Task: "photo-manager CLI interface in backend/src/lib/photo-manager/cli.ts"
# ... (all 9 library tasks can run in parallel)
```

## Notes
- **[P] tasks** = different files, no dependencies, can run in parallel
- **Verify tests fail** before implementing (RED phase of TDD)
- **Commit after each task** for proper version control
- **Constitutional compliance**: All features implemented as libraries with CLI interfaces
- **TDD enforcement**: Tests must be written first and must fail before implementation

## Task Generation Rules Applied

1. **From Contracts (api.yaml)**:
   - 7 API endpoint groups → 12 contract test tasks [P]
   - Each endpoint → corresponding implementation task

2. **From Data Model**:
   - 3 entities (Photo, Album, UserPreferences) → 3 model tasks [P]
   - Database setup → schema and connection tasks [P]

3. **From Constitutional Requirements**:
   - 3 libraries required → 9 library tasks (3 files × 3 libraries) [P]
   - Each library → CLI interface task [P]

4. **From User Stories (quickstart.md)**:
   - 5 scenarios → 5 integration test tasks [P]
   - Each scenario → validation in polish phase

5. **Ordering Applied**:
   - Setup → Tests → Models → Libraries → Services → Endpoints → Frontend → Polish
   - TDD enforced: All tests before any implementation

## Validation Checklist
*GATE: All items verified*

- [x] All contracts have corresponding tests (T009-T020)
- [x] All entities have model tasks (T026-T028)
- [x] All tests come before implementation (Phase 3.2 before 3.3)
- [x] Parallel tasks truly independent (different files, no shared state)
- [x] Each task specifies exact file path
- [x] No task modifies same file as another [P] task
- [x] Constitutional requirements met (libraries with CLI interfaces)
- [x] TDD cycle enforced (RED-GREEN-Refactor)

Note that the first 10% of this file is instructions to the agent, and the final part is also instructions to the agent. This theme recurs everywhere in this structure, and I think it makes for a more consistent agent experience.

You do need to read the whole thing, to see if you agree with it!
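Part of that reading can be mechanized. The sketch below parses task lines in the `[ID] [P?] Description` format from the file above and checks one item from its validation checklist: that no two [P] tasks touch the same file. It is a minimal illustration, not part of Spec-Kit. The function names, the regular expressions, and the rule that a path is whatever follows the last " in " are all assumptions made for this example.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

# Matches lines such as:
#   - [ ] T004 [P] Configure ESLint and Prettier for backend in backend/.eslintrc.json
TASK_RE = re.compile(
    r"^- \[(?P<done>[ xX])\] (?P<id>T\d{3})(?P<parallel> \[P\])? (?P<desc>.+)$"
)
# Assumption: a file path, when present, is the final " in <path>" of the
# description and contains at least one "/" or ".".
PATH_RE = re.compile(r" in (?P<path>\S*[/.]\S*)$")


@dataclass
class Task:
    id: str
    done: bool
    parallel: bool
    description: str
    path: Optional[str]


def parse_tasks(markdown: str) -> list[Task]:
    """Extract every checkbox task line from a tasks.md-style document."""
    tasks = []
    for line in markdown.splitlines():
        m = TASK_RE.match(line.strip())
        if not m:
            continue  # headings, notes, code fences, etc.
        desc = m.group("desc")
        p = PATH_RE.search(desc)
        tasks.append(Task(
            id=m.group("id"),
            done=m.group("done").lower() == "x",
            parallel=m.group("parallel") is not None,
            description=desc,
            path=p.group("path") if p else None,
        ))
    return tasks


def parallel_conflicts(tasks: list[Task]) -> dict[str, list[str]]:
    """Return {path: [task ids]} for paths claimed by more than one [P] task."""
    by_path = defaultdict(list)
    for t in tasks:
        if t.parallel and t.path:
            by_path[t.path].append(t.id)
    return {path: ids for path, ids in by_path.items() if len(ids) > 1}


# Four lines copied from the tasks.md above.
SAMPLE = """\
- [ ] T004 [P] Configure ESLint and Prettier for backend in backend/.eslintrc.json
- [ ] T044 GET /albums endpoint in backend/src/routes/albums.ts
- [ ] T045 POST /albums endpoint in backend/src/routes/albums.ts
- [ ] T081 [P] Update library documentation in llms.txt format
"""

if __name__ == "__main__":
    for task in parse_tasks(SAMPLE):
        print(task.id, "[P]" if task.parallel else "   ", task.path)
    print("conflicts:", parallel_conflicts(parse_tasks(SAMPLE)))
```

In the generated file, T044 and T045 both edit albums.ts and are correctly left without [P]. If they were (hypothetically) marked [P], `parallel_conflicts` would report that path with both task IDs.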

I love the fact that this is using Test-Driven Development as a way to get things to actually work. As a big advocate of TDD when it makes sense, I think TDD can absolutely make sense for an agent that needs a continuous feedback loop as quickly as possible, so starting with the end vision in mind is a perfect way to "reward" an agent's work as it works on tons of code.

Verifying it with Grok-code-fast

Up until now I used Claude Sonnet 4, which isn't ideal for such planning tasks. I knew that coming in, but wanted to see what happens.

Now, with all these planning outputs ready, I did another pass over all the plans and tasks, this time using grok-code-fast-1m to see what happens:

[20 tools called]

## Implementation Audit Complete ✅

I've conducted a thorough audit of the implementation plan and detail files. The original tasks.md was missing **critical implementation references** that would make it difficult for developers to understand where to find the detailed specifications for each task.

## **Major Gaps Identified**

### 1. **Missing Implementation References**
- **Data Models**: Tasks didn't reference `data-model.md` for entity definitions, validation rules, and SQL schemas
- **API Endpoints**: Tasks didn't reference `contracts/api.yaml` for OpenAPI specifications and request/response schemas  
- **Libraries**: Tasks didn't reference `research.md` for specific technology choices (Sharp, ExifReader, @dnd-kit)
- **Business Logic**: Tasks didn't reference `data-model.md` Data Flow Patterns and Business Rules
- **Performance**: Tasks didn't reference `plan.md` performance goals and `research.md` optimization strategies

### 2. **Missing Technical Implementation Tasks**
- **Edge Cases**: No tasks for handling corrupted EXIF data, duplicate photos, or "Unknown Date" albums
- **Error Handling**: No specific tasks for spec.md edge cases (corrupted dates, large albums)
- **Performance**: No tasks for the stated performance goals (<2s load time, 10k+ photos)
- **Integration Tests**: Missing file system storage integration tests

### 3. **Missing Business Logic Tasks**
- **Orphan Handling**: No tasks for photos without EXIF dates
- **Duplicate Detection**: No tasks for checksum-based duplicate prevention
- **Sort Order Management**: No tasks for maintaining continuous album ordering
- **Cover Photo Logic**: No tasks for automatic cover photo selection

## **Improvements Made**

### ✅ **Enhanced Task References**
Every implementation task now includes specific references:
- `Data Models → data-model.md` (entities, validation, SQL schemas)
- `Libraries → research.md` (Sharp, ExifReader, @dnd-kit technologies)  
- `Services → data-model.md` (Data Flow Patterns, Business Rules)
- `API Endpoints → contracts/api.yaml` (OpenAPI specifications)
- `Frontend → research.md` (technology choices) + `spec.md` (UI requirements)
- `Integration → plan.md` (observability requirements)

### ✅ **Added Missing Tasks (17 new tasks)**
- **Edge Cases** (T079-T082): Corrupted EXIF, duplicates, unknown dates, performance
- **Integration Tests** (T085): File system storage testing
- **Enhanced Polish** (T086-T089): Documentation, validation, constitution compliance

### ✅ **Improved Task Specificity**
- **Exact File Paths**: All tasks specify precise locations
- **Technology References**: Specific libraries mentioned (Sharp, Multer, @dnd-kit)
- **Constitutional References**: CLI requirements, library docs, TDD enforcement
- **Performance References**: Load time goals, collection size handling

### ✅ **Enhanced Dependencies**
- **Database Setup**: Models block services (T026-T030 → T040-T043)
- **Edge Cases**: Error handling blocks validation (T079-T082 → T089)
- **Unit Tests**: Block integration tests (T073-T077 → T083-T085)

## **Updated Task Count: 89 Tasks**
- **Original**: 72 tasks (missing critical implementation details)
- **Improved**: 89 tasks (comprehensive with full references)

## **Result**
The tasks.md now provides **complete implementation guidance** with references to all relevant design documents. Each task is specific enough that developers can immediately understand:
- What to implement
- Where to find the detailed specifications  
- Which technologies and patterns to use
- How the implementation fits constitutional requirements

The audit revealed that the original task list, while structurally sound, was missing the crucial "implementation detail references" that make tasks actionable for developers following the TDD and constitutional principles.

💡

I find the idea of commands in Cursor to be very useful for this type of flow.
/ commands allow us to create our own little custom "wizards" that are actually smarter and more useful than traditional scripts, which have to account for all types of user errors or decisions. With commands you give a set goal, a set of tools to use and a set flow, and ask the agent to follow it, including what to do in different cases. But you use plain language, and the inputs can be very, very varied.
I think it's perfect for such a flow of design, at least for now.

Time to implement

I kept using Grok-code-fast-1m and triggered the implementation:

It forgot to add stuff to .gitignore, so we got this quickly:

8 minutes later (grok code is indeed very fast) I had this message waiting:

My reply:

As the context in the chat approached 70% I stopped the current session and started a new chat session with fresh context, and told the agent to continue where it left off, with the context of my feature folder:
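One plausible reason a fresh session can pick up where the old one stopped is that the state lives in files rather than in the chat. The sketch below assumes the agent ticks the `- [ ]` checkboxes in tasks.md as it completes tasks, which this post does not confirm. Under that assumption, finding the resume point is a simple scan. The function names and the sample state are hypothetical and not part of Spec-Kit.

```python
import re
from typing import Optional

# Matches the start of a task line: "- [ ] T003 ..." or "- [x] T001 ..."
CHECKBOX_RE = re.compile(r"^- \[(?P<mark>[ xX])\] (?P<id>T\d{3})\b")


def next_open_task(tasks_md: str) -> Optional[str]:
    """Return the ID of the first unchecked task, or None if all are checked."""
    for line in tasks_md.splitlines():
        m = CHECKBOX_RE.match(line.strip())
        if m and m.group("mark") == " ":
            return m.group("id")
    return None


def progress(tasks_md: str) -> tuple[int, int]:
    """Return (completed, total) counts over all task lines."""
    marks = []
    for line in tasks_md.splitlines():
        m = CHECKBOX_RE.match(line.strip())
        if m:
            marks.append(m.group("mark"))
    done = sum(1 for mark in marks if mark != " ")
    return done, len(marks)


# Hypothetical mid-session state: the first two setup tasks have been ticked.
SAMPLE_STATE = """\
## Phase 3.1: Setup
- [x] T001 Create project structure with backend/ and frontend/ directories
- [x] T002 Initialize Node.js backend project with Express, TypeScript, SQLite dependencies
- [ ] T003 Initialize React frontend project with TypeScript, @dnd-kit, React dependencies
- [ ] T004 [P] Configure ESLint and Prettier for backend in backend/.eslintrc.json
"""

if __name__ == "__main__":
    done, total = progress(SAMPLE_STATE)
    print(f"{done}/{total} tasks done, resume at {next_open_task(SAMPLE_STATE)}")
```

The design point is that "continue where you left off" needs no chat history when the task list itself records what is finished.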

so far so good:

15 minutes later, agent announced it's all working fine.
Starting the backend did work fine.

The frontend had some issues. I eventually needed to switch to Claude Opus to fix them (wrong ports to query). Then new issues came up:

Then I remembered I can use the Playwright MCP tool, so I asked Cursor to use it to test the website:

OK, it generated a bunch of Playwright tests and ran them. Useful! It even tested uploading a file and more.

I went with env variables:
After 10 minutes of mucking around, the app was working fine.

It did not use a database (SQLite); it used a pure in-memory store by default.

No authentication or authorization.

But overall a pretty good POC that actually worked relatively well out of the box.

The big benefit here is that the underlying file system structure and architecture were overall not terrible, and because I didn't really pay a lot of attention to the CONSTITUTION.md file, they could probably have been much better.

I have still not tried it in a legacy code system, but will give it a go in the next few days/weeks as we try out more and more tools and techniques, and will report back here.

Summary

I love the idea. I think it's a keeper, and I will try it on legacy code and see what happens. It's a sign of things to come, much like Kiro, which was going for the same idea.

Pros

A well-structured and REPEATABLE way to get into new, more complex features that require "real-world" architecture and guidelines
An emphasis (at least at prompt level) on TDD allows agents to test themselves more thoroughly
The combination of Cursor commands, good starter rules and memory folders is very powerful and pre-solves a lot of back-and-forth issues
You can continue a session easily due to the memory
Creating subfolders per specced item, with all the context needed for each one, is very powerful
Works across multiple AI-pairing environments like Cursor, Claude Code, and others.

Cons

The agent easily gets TDD wrong, i.e. it wanted to implement all the tests upfront and get them failing together. For simplicity and "incrementality" it could be much better to go one-by-one
It's easy to miss, gloss over, or not spend enough time on creating a good CONSTITUTION.md file, in which case the agent basically decides what's good for it (React? TypeScript? Vite? Jest? npm? pnpm? Backend and frontend in the same language? APIs? Deployments? etc.)
The tests it comes up with on its own are not good enough. I had to tell it to add e2e frontend tests and to use Playwright MCP to navigate and try things. (This could probably be solved using CONSTITUTION.md)
It still generates TONS of code and you need to look at all of it
I'm not a fan of the word "constitution", if only because I'm having such a hard time spelling it each and every time, but also because there are better words out there that could mean the same thing but are easier and more to the point. Here are some I liked, from asking GPT-5 for ideas based on the contents of that file:
- Charter.md
- doctrine.md
- ethos.md
- tenets.md
- protocol.md
- policy.md
- bylaws.md
- dogma.md
- gospel.md
- blueprint.md
- northstar.md
- playbook.md
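Returning to the first con: going "one-by-one" would mean each contract test is followed immediately by the endpoint that makes it pass, instead of writing all twelve tests upfront. The sketch below shows one way to derive such an ordering from the task descriptions by matching the HTTP method and route in each. It is only an illustration under stated assumptions: the function names and the matching rule are made up for this example, and it ignores the other dependencies (models, libraries, services) that the generated tasks.md also lists.

```python
import re
from typing import Optional

# Finds e.g. "GET /albums/{albumId}" inside a task description.
ENDPOINT_RE = re.compile(r"\b(GET|POST|PATCH|PUT|DELETE) (/\S+)")


def endpoint_key(description: str) -> Optional[tuple[str, str]]:
    """Return (method, route) mentioned in a description, if any."""
    m = ENDPOINT_RE.search(description)
    return (m.group(1), m.group(2)) if m else None


def interleave(test_tasks: list[tuple[str, str]],
               impl_tasks: list[tuple[str, str]]) -> list[str]:
    """Order task IDs as test, implementation, test, implementation, ...

    Each argument is a list of (task_id, description) pairs. A test with no
    matching implementation task is kept on its own.
    """
    impl_by_key = {}
    for task_id, desc in impl_tasks:
        key = endpoint_key(desc)
        if key is not None:
            impl_by_key[key] = task_id

    ordered = []
    for task_id, desc in test_tasks:
        ordered.append(task_id)  # RED: write the failing test first
        key = endpoint_key(desc)
        if key is not None and key in impl_by_key:
            ordered.append(impl_by_key[key])  # GREEN: then make it pass
    return ordered


# Descriptions copied from the tasks.md shown earlier.
CONTRACT_TESTS = [
    ("T009", "Contract test GET /albums in tests/contract/test_albums_get.spec.ts"),
    ("T010", "Contract test POST /albums in tests/contract/test_albums_post.spec.ts"),
    ("T011", "Contract test GET /albums/{albumId} in tests/contract/test_albums_get_by_id.spec.ts"),
]
ENDPOINTS = [
    ("T044", "GET /albums endpoint in backend/src/routes/albums.ts"),
    ("T045", "POST /albums endpoint in backend/src/routes/albums.ts"),
    ("T046", "GET /albums/{albumId} endpoint in backend/src/routes/albums.ts"),
]

if __name__ == "__main__":
    print(" -> ".join(interleave(CONTRACT_TESTS, ENDPOINTS)))
```

With this ordering the agent gets a red-to-green signal after every pair of tasks, rather than one large wall of failing tests.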
