AI Testing

Spec-Driven Testing: Why Requirements-First QA Beats Script-First Testing

Yunhao Jiao

There are two fundamentally different ways to think about automated testing. The first says: write scripts that confirm what the code does. The second says: write specifications for what the code should do, then verify the code against them.

Most test automation is the first kind. Spec-driven testing is the second. The difference is subtle until you need it, and then it's everything.

The Problem With Script-First Testing

Script-first testing is the dominant approach in automated testing. An engineer looks at the code, observes what it does, and writes a test that confirms that behavior. The test passes from the moment it's written — because the behavior it tests is the behavior that currently exists.

This is precisely the problem. A script-first test confirms implementation. It can only catch a regression (when something that was working stops working) — it cannot catch an intent gap (when something is working but doing the wrong thing).

Consider a simple example: your checkout form is supposed to require a billing address. A developer accidentally omits the validation — the form submits without one. A script-first test written against this implementation confirms that the form submits without a billing address. The test passes. The bug ships.

A spec-driven test written from the requirement — "billing address is required to complete checkout" — fails when the form submits without one. The bug is caught before it ships.
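
To make the contrast concrete, here is a minimal sketch of both tests in Playwright syntax. TestSprite generates its own tests; the routes, selectors, and error message below are assumptions for illustration, and a configured baseURL is assumed.

```typescript
import { test, expect } from '@playwright/test';

// Script-first: written by observing the buggy implementation.
// It encodes the current (wrong) behavior, so it passes and the bug ships.
test('checkout form submits (script-first)', async ({ page }) => {
  await page.goto('/checkout'); // assumes baseURL is configured
  await page.click('button[type="submit"]'); // selector is an assumption
  await expect(page).toHaveURL(/\/confirmation/); // confirms the bug
});

// Spec-driven: written from the requirement "billing address is required
// to complete checkout". It fails against the buggy implementation.
test('checkout requires a billing address (spec-driven)', async ({ page }) => {
  await page.goto('/checkout');
  await page.click('button[type="submit"]');
  await expect(page).not.toHaveURL(/\/confirmation/);
  await expect(page.getByText('Billing address is required')).toBeVisible();
});
```

The first test is faithful to the code; the second is faithful to the requirement. Only the second can fail when the two diverge.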

The difference isn't academic. Intent gaps — cases where the code is internally consistent but doesn't match requirements — are exactly the failure mode that AI coding tools introduce most frequently. Script-first testing is structurally incapable of catching them.

What Is Spec-Driven Testing?

Spec-driven testing (also called requirements-driven testing or specification-based testing) is an approach where tests are derived directly from product requirements — user stories, acceptance criteria, functional specifications — rather than from inspection of the existing implementation.

The test is written (or generated) first, from the requirements, before the code exists. Or in the case of agentic testing, the test is generated from the requirements and compared against the implementation that was generated by an AI coding tool. In both cases, the requirements are the source of truth, not the implementation.

Spec-driven testing asks: does the code do what it's supposed to do? Script-first testing asks: does the code still do what it currently does? These are fundamentally different questions.

Why Spec-Driven Testing Is Built for AI Coding Tools

AI coding agents generate code based on prompts. A prompt is an informal specification. The better the specification, the better the output — and the better the ability to verify it.

TestSprite is built around spec-driven testing as a core architectural principle. The agentic testing engine reads your PRD, user stories, or requirements document, builds an internal structured model of what the software should do, and generates tests that verify implementation against that model.

This is distinct from a testing tool that analyzes your code and generates tests based on what the code does. Code analysis produces script-first tests. Requirements analysis produces spec-driven tests.

For AI-native teams, this matters for a specific reason: the most valuable bugs to catch are the ones where AI generated something plausible but wrong. Those bugs only appear in the gap between the specification and the implementation. You can only see that gap if you have both.

How Spec-Driven Testing Works in Practice

Step 1: The Specification Is the Source of Truth

Spec-driven testing starts with a clear requirements artifact. For TestSprite, this can be:

  • A formal PRD (Product Requirements Document)

  • User stories with acceptance criteria

  • A README describing the application's intended behavior

  • API documentation describing expected endpoint behavior

  • Or, for simpler projects, the codebase itself — TestSprite can infer intent from code structure when no formal spec exists

The quality of the tests is directly proportional to the clarity of the specification. Vague requirements produce vague tests. Specific acceptance criteria — "unauthenticated users who attempt to access /dashboard are redirected to /login" — produce specific, meaningful tests.
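
As an illustration, here is how that acceptance criterion might translate into an executable check. This is a hand-written Playwright sketch, not TestSprite output; only the /dashboard and /login routes come from the criterion itself.

```typescript
import { test, expect } from '@playwright/test';

// Derived directly from the acceptance criterion above.
test('unauthenticated access to /dashboard redirects to /login', async ({ page }) => {
  // No login step: the session is deliberately unauthenticated.
  await page.goto('/dashboard');
  await expect(page).toHaveURL(/\/login/);
});
```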

Step 2: Test Generation From Requirements

TestSprite's agentic testing engine parses the specification and generates a test plan. The plan covers:

  • Each user story or acceptance criterion as a testable scenario

  • Positive cases (the feature works as described)

  • Negative cases (the feature correctly handles invalid inputs and unauthorized access)

  • Edge cases derived from the specification (boundary values, empty states, concurrent operations)

This test plan is reviewable and adjustable before execution. Engineers can add, remove, or modify test cases.
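
The sketch below shows one hypothetical shape such a plan could take; TestSprite's actual plan format is not reproduced here. The point is that every case traces back to a named requirement and is classified as positive, negative, or edge.

```typescript
// Illustrative only: a hypothetical shape for a reviewable test plan.
type TestCase = {
  id: string;
  requirement: string; // the acceptance criterion being verified
  kind: 'positive' | 'negative' | 'edge';
  steps: string[];
  expected: string;
};

const plan: TestCase[] = [
  {
    id: 'checkout-001',
    requirement: 'Billing address is required to complete checkout',
    kind: 'negative',
    steps: ['Open /checkout', 'Submit the form with the billing address empty'],
    expected: 'Submission is blocked and a validation error is shown',
  },
  // ...one entry per acceptance criterion, plus derived edge cases
];
```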

Step 3: Execution Against the Implementation

Tests run in isolated cloud sandboxes against the actual application — not mocks, not simulations. The real application behavior is compared against the specification-derived expected behavior.

Failures represent specification violations: the code does something different from what the specification requires. These are always meaningful, because they represent a gap between what was specified and what was built.
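
As a sketch of what a specification-derived check looks like at the API level, here is a hypothetical test using Playwright's request fixture. The endpoint, payload, and status code are assumptions; what matters is that the assertion targets the running application, not a mock.

```typescript
import { test, expect } from '@playwright/test';

// Runs against the deployed application itself, not a mock.
// Endpoint and payload shape are assumptions for illustration.
test('API rejects checkout without a billing address', async ({ request }) => {
  const res = await request.post('/api/checkout', {
    data: { items: [{ sku: 'abc-123', qty: 1 }] }, // no billingAddress field
  });
  // The spec requires a billing address, so a success status here
  // would be a specification violation.
  expect(res.status()).toBe(400);
});
```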

Step 4: Failure Classification and Fix Loop

Spec-driven failures are classified into three categories:

Specification violation — the implementation doesn't match the requirement. This is a real bug. Fix recommendations are generated and sent to your coding agent via MCP.

Specification ambiguity — the test expectation was derived from an ambiguous requirement. This surfaces specification quality issues, prompting clarification rather than a code fix.

Test fragility — the test mechanism failed (locator drift, timing issue) rather than the application. Self-healing resolves this transparently.
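
To make the routing concrete, here is a hypothetical sketch of the three categories as a discriminated union. This illustrates the classification logic; it is not TestSprite's internal representation.

```typescript
// Illustrative only: the three failure categories and where each is routed.
type Failure =
  | { kind: 'spec-violation'; requirement: string; observed: string }
  | { kind: 'spec-ambiguity'; requirement: string; readings: string[] }
  | { kind: 'test-fragility'; mechanism: 'locator-drift' | 'timing' };

function route(failure: Failure): string {
  switch (failure.kind) {
    case 'spec-violation':
      return 'send fix recommendation to the coding agent via MCP';
    case 'spec-ambiguity':
      return 'flag the requirement for clarification';
    case 'test-fragility':
      return 'self-heal the test and re-run';
  }
}
```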

Spec-Driven Testing vs. Test-Driven Development

Spec-driven testing and TDD share a common principle: define desired behavior before, or independently of, the implementation. The practical difference is authorship:

In TDD, engineers write tests manually before writing code. This requires significant time investment and discipline, and doesn't scale to AI code generation velocity.

In spec-driven agentic testing, tests are generated automatically from requirements. The engineer writes the specification (which they'd write anyway to prompt the coding agent); the agentic testing engine generates the tests. The benefits of TDD — requirements-first verification — are achieved without the manual test authoring overhead.

For AI-native teams, spec-driven agentic testing is the practical realization of TDD principles at AI development velocity.

The Specification Quality Investment

Spec-driven testing makes the quality of your specifications directly visible in your test coverage. This is a feature, not a bug.

Teams that invest in clearer requirements before coding sessions discover two things: better AI-generated code (because the coding agent has more context), and better test coverage (because the agentic testing engine has more to derive tests from). The specification quality investment pays dividends in both directions.

The minimum specification for meaningful spec-driven testing:

  • Feature description: what is being built?

  • Acceptance criteria: what does success look like?

  • Edge cases: what inputs or states might cause problems?

  • Invariants: what must always be true?

A one-page requirements doc written in 15 minutes before a coding session is enough to generate comprehensive spec-driven test coverage. The alternative — retrofitting tests to AI-generated code after the fact — takes far longer and produces script-first tests that miss the gaps.
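
As a hypothetical example, a minimal spec covering all four elements might look like this (the feature, criteria, and edge cases are invented for illustration):

```
Feature: Checkout billing address
Description: Customers must provide a billing address to complete checkout.

Acceptance criteria:
- Submitting the checkout form without a billing address shows a
  validation error and does not create an order.
- Submitting with a valid billing address creates an order and
  redirects to /confirmation.

Edge cases:
- Address fields containing only whitespace are treated as empty.
- Address lines over 255 characters are rejected with a clear error.

Invariants:
- No order record is ever created without a billing address attached.
```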

Getting Started

Spec-driven testing with TestSprite starts with your requirements document. Connect TestSprite to your repository via MCP, point it at your PRD or user stories, and get specification-derived test coverage generated automatically.

Start here →