
Engineering

Security Testing in the Age of AI-Generated Code: Why QA Agents Need to Check More Than Functionality


Rui Li

AI coding tools have created an interesting new risk surface: code that works correctly but is insecure by default. An LLM generating a form handler doesn't think about SQL injection. An AI autocomplete filling in an authentication flow doesn't consider privilege escalation paths. The code passes functional tests. It fails security review three weeks later.

This isn't a hypothetical. As AI-generated code becomes a larger fraction of what ships, the gap between "it works" and "it's safe" widens. Security testing needs to move left alongside functional testing — and AI testing agents are the right mechanism to do it.

Why functional testing doesn't catch security issues

Functional tests verify that the application does what it's supposed to do. A user submits a form. The data saves. The confirmation appears. Pass.

Security tests verify that the application doesn't do things it shouldn't. An unauthenticated user requests a protected resource. The application returns 403, not the resource. An input containing a script tag gets sanitized before storage. A session token expires after logout.

These are different verification questions, and the second category is systematically undertested in most codebases. Not because engineers don't care about security — they do — but because writing security test cases is time-consuming, requires specific expertise, and doesn't get prioritized when sprint velocity is under pressure.

AI-generated code compounds this problem because it ships faster: more surface area, produced in less time, with security review as the bottleneck.

What security-aware AI testing agents check

A testing agent like TestSprite can be configured to run security-focused test flows alongside functional coverage. These aren't penetration tests — they're application-layer checks that verify security invariants the same way functional tests verify behavioral ones.

Authentication and authorization boundaries are the most critical. Every protected route should have an automated test that confirms it returns the right response for an unauthenticated request, a request with an expired token, and a request from a user with insufficient permissions. These tests are simple to describe in natural language and should run on every deploy: "A user without admin permissions should receive a 403 when accessing the admin dashboard."
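As a sketch, those three checks translate directly into code. The handler and endpoint below are toys invented for illustration, not TestSprite's API, but the invariants are exactly the ones described above:

```python
# Toy protected route (hypothetical, for illustration only): requires a
# valid, non-expired session belonging to a user with the admin role.
def get_admin_dashboard(request: dict) -> dict:
    user = request.get("user")
    if user is None or request.get("token_expired", False):
        return {"status": 401}          # unauthenticated or expired token
    if "admin" not in user.get("roles", []):
        return {"status": 403}          # authenticated but not authorized
    return {"status": 200, "body": "dashboard"}

# Invariant tests: no request without valid admin rights ever sees the body.
def test_unauthenticated_request_is_rejected():
    assert get_admin_dashboard({})["status"] == 401

def test_expired_token_is_rejected():
    resp = get_admin_dashboard({"user": {"roles": ["admin"]}, "token_expired": True})
    assert resp["status"] == 401

def test_non_admin_gets_403_and_no_data():
    resp = get_admin_dashboard({"user": {"roles": ["member"]}})
    assert resp["status"] == 403
    assert "body" not in resp
```

The same three cases apply mechanically to every protected route, which is what makes them cheap to run on every deploy.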

Input validation is the second priority. AI-generated form handlers are particularly prone to trusting input too much. Testing agents can submit known malicious patterns — script tags, SQL fragments, oversized payloads — and verify that the application handles them correctly. This isn't a comprehensive security audit, but it catches the most common classes of input handling failures before they reach production.
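A minimal sketch of that idea, using a toy form handler and a short list of known-bad payloads (the handler, field limit, and payload list are illustrative assumptions, not a complete audit corpus):

```python
import html

# A few known-bad inputs: stored-XSS probe, SQL fragment, oversized payload.
MALICIOUS_INPUTS = [
    "<script>alert(1)</script>",
    "' OR '1'='1",
    "A" * 1_000_000,
]

MAX_FIELD_LENGTH = 10_000  # hypothetical application limit

def handle_comment(raw: str) -> dict:
    """Toy handler: reject oversized input, escape the rest before storage."""
    if len(raw) > MAX_FIELD_LENGTH:
        return {"status": 413}
    return {"status": 200, "stored": html.escape(raw)}

def test_malicious_inputs_are_neutralized():
    for payload in MALICIOUS_INPUTS:
        resp = handle_comment(payload)
        if resp["status"] == 200:
            # Invariant: nothing executable reaches storage unescaped.
            assert "<script>" not in resp["stored"]
        else:
            # Rejection is also acceptable, as long as it's controlled.
            assert resp["status"] in (400, 413)
```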

Session management deserves its own test coverage: logout should invalidate the session, token refresh should behave correctly under concurrent requests, and session state shouldn't persist across authentication boundaries in unexpected ways.
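The first two of those invariants, sketched against a toy in-memory session store (invented for illustration; a real test would exercise the application's actual session endpoints):

```python
import secrets

class SessionStore:
    """Toy session store: login mints a token, logout must kill it."""
    def __init__(self):
        self._active = {}

    def login(self, user_id: str) -> str:
        token = secrets.token_hex(16)
        self._active[token] = user_id
        return token

    def logout(self, token: str) -> None:
        self._active.pop(token, None)

    def whoami(self, token: str):
        return self._active.get(token)  # None means unauthenticated

def test_logout_invalidates_session():
    store = SessionStore()
    token = store.login("alice")
    assert store.whoami(token) == "alice"
    store.logout(token)
    assert store.whoami(token) is None   # token must be dead after logout

def test_sessions_do_not_leak_across_users():
    store = SessionStore()
    t1, t2 = store.login("alice"), store.login("bob")
    store.logout(t1)
    assert store.whoami(t2) == "bob"     # bob's session is unaffected
```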

Embedding security checks in CI

The right time to run security-aware tests is the same as the right time to run functional tests: on every pull request, before code merges.

Shifting security left means adding authorization checks and input validation tests to the same CI job that runs regression tests. TestSprite supports this workflow natively — security invariant tests are defined the same way as functional tests, described in plain English, and run against the same environments.
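As a sketch of the wiring, the workflow below is hypothetical (job names, paths, and commands are illustrative, not TestSprite's CLI), but it shows the shape: one CI job, both suites, gating every pull request:

```yaml
# Hypothetical GitHub Actions workflow: functional and security invariant
# tests run in the same job, against the same environment, on every PR.
name: tests
on: [pull_request]

jobs:
  regression-and-security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r requirements.txt
      # Same job, same gate to merge:
      - run: pytest tests/functional
      - run: pytest tests/security   # auth boundaries, input handling, sessions
```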

The benefit of this approach over periodic security audits or penetration testing isn't that it's more comprehensive — it isn't. The benefit is cadence. A security check that runs every time code changes catches regressions immediately, when the context is fresh and the fix is cheap. A quarterly security review catches the same issue months after it was introduced, when the fix requires archaeology.

The AI-generated code problem specifically

When developers use Cursor, Copilot, or any AI coding assistant, they're accepting code they didn't fully write and may not fully understand. That's not a criticism — it's the value proposition. But it creates a verification gap.

Functional tests, written by the same AI that generated the code, tend to test the happy path the AI optimized for. Security edge cases aren't in the happy path. A test suite that confirms "the login flow works" doesn't confirm "the login flow doesn't expose session tokens in URL parameters" or "the password reset endpoint can't be used to enumerate valid email addresses."
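The enumeration example is worth spelling out. In the sketch below (toy handler and data, invented for illustration), the invariant is that the reset endpoint's response is indistinguishable for known and unknown addresses:

```python
REGISTERED = {"alice@example.com"}  # toy user database

def request_password_reset(email: str) -> dict:
    """Safe handler: does its work internally, but responds uniformly."""
    if email in REGISTERED:
        pass  # a real handler would queue the reset email here
    return {"status": 200,
            "message": "If that address exists, a reset link was sent."}

def test_reset_does_not_enumerate_accounts():
    known = request_password_reset("alice@example.com")
    unknown = request_password_reset("nobody@example.com")
    assert known == unknown   # responses must be indistinguishable
```

A happy-path test suite would never write this assertion, because the happy path only ever submits a registered address.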

TestSprite's agentic approach helps here because the tests are defined by the human in terms of invariants — what must never be true — rather than generated from the implementation. The agent verifies the contract, not just the behavior the code happens to exhibit.

Starting small and expanding

You don't need a comprehensive security test suite on day one. Start with the three most critical authentication boundaries in your application and write simple invariant tests for each.

Confirm that unauthenticated users can't access protected data. Confirm that users can't access other users' data by modifying IDs in requests. Confirm that admin-only operations fail gracefully for non-admin users.
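The second of those checks, the classic insecure-direct-object-reference case, sketched with toy data and a toy handler (both invented for illustration):

```python
# Toy records: each order belongs to exactly one user.
ORDERS = {
    101: {"owner": "alice", "total": 40},
    102: {"owner": "bob",   "total": 75},
}

def get_order(requesting_user: str, order_id: int) -> dict:
    order = ORDERS.get(order_id)
    if order is None:
        return {"status": 404}
    if order["owner"] != requesting_user:
        # Same response as "not found", so the ID space leaks nothing.
        return {"status": 404}
    return {"status": 200, "order": order}

def test_user_cannot_read_others_order_by_changing_id():
    resp = get_order("alice", 102)   # alice probes bob's order ID
    assert resp["status"] == 404
    assert "order" not in resp
```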

Those tests, running on every PR, will catch a significant fraction of the security regressions that AI-generated code introduces. Build from there.