
Industry Analysis

The Rise of Production Outages in 2025: What's Really Causing Them


Yunhao Jiao

IsDown.app has been tracking global service outages since 2022. Its data tells an uncomfortable story: outages have increased every year, with 2025 showing the steepest climb yet.

IsDown's founder shared the charts on Reddit, and the community responded with theories: AI-generated code, engineering layoffs, offshore outsourcing, increasing system complexity. The truth is probably all of the above, but the data increasingly points to one dominant factor.

AI-generated code is entering production at unprecedented volume. And the verification infrastructure wasn't built for it.

The Data

ThousandEyes tracked global outage counts rising from 1,382 in January 2025 to 2,110 in March, a 53% increase in two months. The counts stayed volatile through the rest of the year, with persistent upward pressure.
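As a quick check on the 53% figure, percentage change is (new - old) / old. The helper name `pct_change` is purely illustrative.

```python
def pct_change(old: float, new: float) -> float:
    """Return the percentage change from old to new."""
    return (new - old) / old * 100.0


# ThousandEyes global outage counts cited above: January vs March 2025.
jan_2025, mar_2025 = 1_382, 2_110
print(f"{pct_change(jan_2025, mar_2025):.1f}%")  # prints 52.7%
```

That 52.7% rounds to the 53% quoted.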

The Cortex Engineering Benchmark Report found that PRs per author increased 20% year over year (thanks to AI coding tools), while incidents per pull request increased 23.5%. More code, and more incidents per unit of code shipped.
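The two Cortex figures compound rather than add. Assuming both describe the same population of authors, incidents per author equal PRs per author times incidents per PR, so the growth factors multiply. A minimal sketch (the function name is hypothetical):

```python
def compound_growth(*rates: float) -> float:
    """Combine fractional growth rates multiplicatively.

    Each rate is a fraction (0.20 means +20%); returns the combined fraction.
    """
    factor = 1.0
    for rate in rates:
        factor *= 1.0 + rate
    return factor - 1.0


prs_per_author = 0.20      # +20% year over year (Cortex)
incidents_per_pr = 0.235   # +23.5% year over year (Cortex)
combined = compound_growth(prs_per_author, incidents_per_pr)
print(f"incidents per author: +{combined:.1%}")  # prints incidents per author: +48.2%
```

Under that assumption, each author is associated with roughly 48% more incidents, more than either figure alone suggests.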

CodeRabbit's analysis of GitHub PRs found that AI-authored code contains 1.75x more logic and correctness errors, the category most directly responsible for production incidents. Security findings were 1.57x higher. Among performance issues, excessive I/O operations appeared at 8x the rate seen in human-authored code.

A survey of over 1,000 CIOs and network engineers found that 84% of businesses reported rising network outages, with more than half seeing a 10-24% increase over a two-year timeframe.

The Structural Cause

The outage increase isn't caused by any single factor. But the timing is not coincidental.

2025 was the year AI coding tools went mainstream. GitHub Copilot crossed 1.8 million paid subscribers. Cursor became the default IDE for a generation of developers. Claude Code, Windsurf, and others entered the market. Anthropic reported that 70-90% of its own code was AI-generated. Spotify's best developers reportedly haven't written a line of code since December.

This adoption wave increased code output dramatically. But the testing, review, and verification processes at most organizations remained unchanged. The same code review workflows designed for human-speed development were applied to AI-speed output. The same test suites, if they existed, weren't updated to cover the expanded codebase.

The result: more code entering production with less per-line verification. The outage data reflects the predictable consequence.

What the Outage Trend Reveals About Testing

The rising outage rate is fundamentally a testing failure: not a failure of testers, but a failure of testing systems to scale with development speed.

When a team's code output triples and its testing capacity stays flat, roughly two-thirds of everything it ships goes out without adequate verification. That unverified code accumulates risk, and risk eventually becomes incidents.
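The two-thirds figure follows from a simple model. The assumptions here are ours, not the source's: verification capacity and code output are measured in the same units, and the original (1x) output was fully verified. The function name is hypothetical.

```python
def unverified_share(output_multiplier: float, capacity_multiplier: float = 1.0) -> float:
    """Fraction of total shipped code that exceeds verification capacity.

    Both arguments are relative to a baseline where output equals capacity.
    """
    if output_multiplier <= 0:
        raise ValueError("output_multiplier must be positive")
    excess = max(0.0, output_multiplier - capacity_multiplier)
    return excess / output_multiplier


# 3x output with flat testing capacity: about two-thirds of output is unverified.
print(unverified_share(3.0))
```

In this model all of the added code (2x of the 3x) goes unverified, so the unverified share grows with output instead of staying fixed. Raising `capacity_multiplier` to match output drives the share back to zero.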

The fix isn't hiring more QA engineers. The economics don't support it and the timeline is too slow. The fix is automated testing that scales with code output.

This means three things: autonomous test generation that doesn't require human authoring; CI/CD integration that runs tests on every PR without manual triggering; and execution fast enough to match development cadence, finishing in under five minutes rather than thirty.
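One way to enforce a five-minute budget in CI is a small gate that runs the test command and fails if the command exits non-zero or exceeds the time budget. This is a generic sketch using only Python's standard library. `run_test_gate` and `GateResult` are hypothetical names, and this is not how TestSprite or any particular CI provider implements it.

```python
import subprocess
import sys
import time
from dataclasses import dataclass


@dataclass
class GateResult:
    passed: bool      # True only if the command exited 0 within the budget
    timed_out: bool   # True if the budget was exceeded
    elapsed_s: float  # wall-clock seconds spent


def run_test_gate(cmd: list[str], budget_s: float = 300.0) -> GateResult:
    """Run a test command; fail on a non-zero exit or on exceeding budget_s."""
    start = time.monotonic()
    try:
        # subprocess.run kills the child process if the timeout expires.
        completed = subprocess.run(cmd, timeout=budget_s)
    except subprocess.TimeoutExpired:
        return GateResult(passed=False, timed_out=True,
                          elapsed_s=time.monotonic() - start)
    return GateResult(passed=completed.returncode == 0, timed_out=False,
                      elapsed_s=time.monotonic() - start)
```

In a CI job this might be called as `run_test_gate([sys.executable, "-m", "pytest", "-q"])`, with the job exiting non-zero whenever `passed` is False so the PR is blocked.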

TestSprite addresses each of these. Autonomous test generation from codebase and product specs. GitHub integration on every PR. Full-stack test suite in under five minutes. The testing infrastructure scales automatically as code output increases.

The outage trend is a systems problem that requires a systems solution. More human effort won't reverse a trend driven by rapid growth in code volume. Automated, autonomous testing will.
