
Software Testing

Performance Testing for Modern Web Apps: A Practical Guide


Yunhao Jiao

Performance testing is the testing discipline that teams most consistently skip until something goes wrong. The pattern is familiar: the application works correctly in development and testing, ships to production, and then falls over when real traffic arrives or when a customer with a large dataset triggers an expensive operation.

The cost of discovering performance problems in production is high: user impact, incident response time, emergency fixes under pressure. The cost of discovering them earlier — during development — is much lower. This guide covers how to build performance testing into your workflow before production teaches you why it matters.

What is Performance Testing?

Performance testing is the practice of evaluating how an application behaves under load — measuring response times, throughput, resource utilization, and stability under realistic and peak usage conditions.

Performance testing encompasses several specific activities:

Load testing — How does the application perform under expected load? Does response time stay within acceptable limits when 100, 500, or 1000 users are active simultaneously?

Stress testing — How does the application behave when pushed beyond its normal capacity? Where does it break, and does it break gracefully?

Spike testing — How does the application handle a sudden surge of traffic? Does it recover gracefully when the spike ends?

Soak testing — Does the application remain stable under sustained load over a long period? Are there memory leaks, connection pool exhaustion, or other issues that only appear over time?

Baseline testing — Establishing a performance baseline so that future changes can be compared against it. This is the form of performance testing most valuable for CI/CD integration.

Why Performance Regressions Are Hard to Catch

Functional tests verify correctness. Performance tests verify speed and stability. A function can be completely correct and catastrophically slow. Functional test suites don't catch performance regressions.

Common sources of performance regression that slip through functional testing:

  • An N+1 query bug introduced by AI-generated ORM code that works correctly but issues hundreds of database queries instead of one

  • A React component that re-renders unnecessarily on every state change, working correctly but causing poor UX

  • A background job that was optimized for small datasets and degrades superlinearly as data grows

  • An API endpoint that aggregates data from multiple services and introduces latency under concurrent load

  • A database migration that adds an unindexed column to a table that's queried frequently

None of these appear in functional test results. All of them affect real users significantly.

Web Performance Metrics That Matter

For web applications, performance is multidimensional. The metrics most correlated with user experience:

Core Web Vitals (Google's UX metrics):

  • LCP (Largest Contentful Paint): How long until the main content loads? Target < 2.5 seconds.

  • INP (Interaction to Next Paint): How quickly does the page respond to user interactions? Target < 200ms.

  • CLS (Cumulative Layout Shift): Does content shift unexpectedly as it loads? Target < 0.1.

API performance:

  • p50 response time: Median latency for API responses

  • p95/p99 response time: Latency for the slowest 5% / 1% of requests (where user-visible slowness lives)

  • Error rate under load: Does the error rate increase when traffic increases?

  • Throughput: How many requests per second can the system handle before degrading?
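To make the percentile metrics concrete, here is a minimal sketch of how p50 and p95 are derived from raw response-time samples using the nearest-rank method. The sample values are illustrative; load testing tools like k6 compute these statistics for you.

```javascript
// Compute a latency percentile from raw response-time samples (ms)
// using the nearest-rank method.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, rank)];
}

const latencies = [120, 95, 110, 480, 105, 98, 130, 102, 900, 115];
console.log(percentile(latencies, 50)); // → 110 (median: typical experience)
console.log(percentile(latencies, 95)); // → 900 (tail: where slowness hides)
```

Note how a healthy-looking median can coexist with a terrible tail — this is why p95/p99 matter more than averages.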

Integrating Performance Testing Into CI/CD

Performance Baselines

The most practical way to catch performance regressions in CI/CD is baseline comparison: measure your application's performance now, store the baseline, and alert when a PR causes performance to deviate significantly from the baseline.

This doesn't require a full load test on every PR — a targeted performance check on critical endpoints and key page loads is sufficient for regression detection.
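The comparison logic itself is simple. This is a sketch of a regression gate, assuming a stored baseline of per-endpoint p95 values; the metric names and 20% tolerance are illustrative, not any specific tool's format.

```javascript
// Baseline comparison gate: report metrics where the current value
// regresses more than `tolerance` (default 20%) past the baseline.
function checkRegression(baseline, current, tolerance = 0.2) {
  const failures = [];
  for (const [metric, base] of Object.entries(baseline)) {
    const now = current[metric];
    if (now !== undefined && now > base * (1 + tolerance)) {
      failures.push(`${metric}: ${base}ms -> ${now}ms`);
    }
  }
  return failures; // non-empty => fail the PR gate
}

const baseline = { 'GET /api/orders p95': 240, 'GET /api/search p95': 310 };
const current = { 'GET /api/orders p95': 450, 'GET /api/search p95': 320 };
console.log(checkRegression(baseline, current));
// → [ 'GET /api/orders p95: 240ms -> 450ms' ]
```

A tolerance band matters because latency measurements are noisy; a strict equality check would fail PRs on environmental jitter alone.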

Tools for Performance Testing

k6 — Modern, developer-friendly load testing tool. JavaScript-based test scripts, CLI execution, good CI/CD integration. Excellent for API load testing.
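A minimal k6 script looks like this — run with `k6 run load-test.js`, not Node. The URL, virtual-user count, and threshold values are placeholders to adapt to your own endpoints.

```javascript
// load-test.js — a minimal k6 sketch (executed by the k6 CLI).
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  vus: 50,              // 50 concurrent virtual users
  duration: '2m',
  thresholds: {
    http_req_duration: ['p(95)<500'], // fail the run if p95 exceeds 500ms
    http_req_failed: ['rate<0.01'],   // fail if more than 1% of requests error
  },
};

export default function () {
  const res = http.get('https://staging.example.com/api/orders');
  check(res, { 'status is 200': (r) => r.status === 200 });
  sleep(1); // simulate user think time between requests
}
```

The `thresholds` block is what makes this CI-friendly: k6 exits non-zero when a threshold fails, so the same script doubles as a pass/fail gate.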

Playwright performance — Playwright can capture Core Web Vitals and page performance metrics as part of E2E test runs. Useful for tracking frontend performance alongside functional tests.

Lighthouse CI — Runs Google Lighthouse audits in CI/CD and fails builds when performance scores drop below thresholds. Best for frontend performance (Core Web Vitals) rather than backend load.
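A minimal Lighthouse CI configuration might look like the following `lighthouserc.js` sketch; the URL and threshold values are placeholders, and the thresholds mirror the Core Web Vitals targets above.

```javascript
// lighthouserc.js — minimal Lighthouse CI config sketch.
module.exports = {
  ci: {
    collect: {
      url: ['https://staging.example.com/'],
      numberOfRuns: 3, // median of 3 runs reduces noise
    },
    assert: {
      assertions: {
        'categories:performance': ['error', { minScore: 0.9 }],
        'largest-contentful-paint': ['error', { maxNumericValue: 2500 }],
        'cumulative-layout-shift': ['error', { maxNumericValue: 0.1 }],
      },
    },
  },
};
```

Running multiple collections and asserting on the aggregate is important here too — single Lighthouse runs vary enough to produce flaky gates.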

TestSprite — TestSprite's E2E test execution surfaces obvious performance issues as a side effect of functional testing: N+1 query bugs that cause visible timeouts, missing pagination that causes slow list endpoints, and synchronous blocking operations all produce test failures that show up in the functional test report. It's not a dedicated load testing tool, but it catches the most common AI-generated performance bugs as part of standard coverage.

A Practical CI/CD Performance Setup

For most teams, a practical starting point:

  1. Establish baselines: Run Lighthouse CI and a k6 script against your staging environment. Store the results.

  2. Run performance checks on PRs: Lighthouse CI on every PR for frontend performance. k6 for critical API endpoints on PRs that touch backend code.

  3. Alert on regression: Fail the PR gate if response times increase more than 20% or Core Web Vitals scores drop below thresholds.

  4. Full load tests before major releases: Run full load tests before significant releases or infrastructure changes, not on every PR.

Performance Testing for AI-Generated Code

AI coding tools introduce specific performance risks that make performance testing more important for AI-native teams:

ORM query patterns. AI coding agents often generate ORM queries that work correctly but are inefficient: N+1 queries, missing eager loading, unindexed lookups. These work fine with small datasets and degrade badly at scale.
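The N+1 shape is easiest to see with a mock data layer. This framework-agnostic sketch counts queries; real ORMs fix the problem with eager loading (a JOIN or a single IN query).

```javascript
// Mock data layer that counts queries to expose the N+1 pattern.
let queryCount = 0;
const db = {
  getPosts: () => { queryCount++; return [{ id: 1, authorId: 10 }, { id: 2, authorId: 11 }]; },
  getAuthor: (id) => { queryCount++; return { id, name: `user-${id}` }; },
  getAuthorsByIds: (ids) => { queryCount++; return ids.map((id) => ({ id, name: `user-${id}` })); },
};

// N+1: one query for the posts, then one per post for its author.
function listPostsNPlusOne() {
  return db.getPosts().map((p) => ({ ...p, author: db.getAuthor(p.authorId) }));
}

// Batched: one query for posts, one IN-style query for all authors.
function listPostsBatched() {
  const posts = db.getPosts();
  const authors = db.getAuthorsByIds(posts.map((p) => p.authorId));
  const byId = new Map(authors.map((a) => [a.id, a]));
  return posts.map((p) => ({ ...p, author: byId.get(p.authorId) }));
}

queryCount = 0; listPostsNPlusOne(); console.log(queryCount); // → 3 (grows with post count)
queryCount = 0; listPostsBatched();  console.log(queryCount); // → 2 (constant)
```

Both functions return identical data — which is exactly why functional tests pass while production query counts explode.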

Data transformation in React. AI-generated React components sometimes perform expensive computations on every render without memoization. Functionally correct, visually undetectable in development, noticeably slow with real data volumes.
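The fix is memoization: cache the derived value and recompute only when the input changes. This is a framework-agnostic sketch of the idea (in React, `useMemo` plays this role per component instance); the helper name is illustrative.

```javascript
// Memoize a single-argument function on its most recent input,
// recomputing only when the input reference changes.
function memoizeOne(fn) {
  let lastArg, lastResult, called = false;
  return (arg) => {
    if (!called || arg !== lastArg) {
      lastArg = arg;
      lastResult = fn(arg);
      called = true;
    }
    return lastResult;
  };
}

let computations = 0;
const sortRows = memoizeOne((rows) => { computations++; return [...rows].sort((a, b) => a - b); });

const rows = [3, 1, 2];
sortRows(rows); sortRows(rows); sortRows(rows); // same input reference each time
console.log(computations); // → 1 (computed once, not once per call)
```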

Missing pagination. AI coding agents frequently generate list endpoints that return all records rather than paginated results. Correct behavior for small datasets, catastrophic behavior at scale.
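A minimal limit/offset pagination sketch, applied here to an in-memory array for illustration — in a real endpoint the slicing happens in the database query, and the response envelope shape is an assumption, not a standard.

```javascript
// Limit/offset pagination over a collection, returning a response
// envelope with the page of items plus paging metadata.
function paginate(records, page = 1, pageSize = 20) {
  const start = (page - 1) * pageSize;
  return {
    items: records.slice(start, start + pageSize),
    page,
    pageSize,
    total: records.length,
    hasMore: start + pageSize < records.length,
  };
}

const all = Array.from({ length: 45 }, (_, i) => i + 1);
const page3 = paginate(all, 3, 20);
console.log(page3.items.length, page3.hasMore); // → 5 false (last partial page)
```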

Synchronous operations in async contexts. AI-generated code sometimes introduces blocking operations where async operations were intended, particularly in Node.js backends.

Performance baselines in CI/CD catch these regressions before they compound. TestSprite's test execution captures timing data alongside functional results, providing an early signal for performance issues introduced by AI-generated code.

Start performance-aware testing with TestSprite →