
Writing a test that passes is easy. Writing a test that fails when the thing it's testing breaks — and only then — is harder than most engineers acknowledge.
Tests that pass when they should fail are worse than no tests. They create false confidence. Engineers merge code assuming the test suite is protecting them. It isn't. The bug ships, the tests keep passing, and the failure surfaces in production where no one immediately understands why the tests didn't catch it.
Such a test lacks failure sensitivity: it cannot fail when the behavior it covers breaks. The problem is common, and it's worth taking seriously.
How tests fail to fail
The most common cause is asserting on the wrong thing. A test for "the user should see a success message after submitting the form" that asserts on the presence of any element with the class message will pass whether the message says "Success" or "Error: something went wrong." The assertion is too broad to catch the relevant failure.
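A minimal sketch of the difference, in Jest-style TypeScript (submitForm and the markup are hypothetical; assume a DOM test environment):

```typescript
import { expect, test } from "@jest/globals";

declare function submitForm(data: { email: string }): Promise<void>; // hypothetical app helper

// Too broad: passes whether the banner says "Success" or "Error: something went wrong"
test("shows a message after submit (too broad)", async () => {
  await submitForm({ email: "a@example.com" });
  expect(document.querySelector(".message")).not.toBeNull();
});

// Specific: fails the moment an error banner appears where the success text should be
test("shows the success message after submit", async () => {
  await submitForm({ email: "a@example.com" });
  expect(document.querySelector(".message")?.textContent).toBe("Success");
});
```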
Silent exception handling is another common cause. A try/catch block in the test itself that swallows exceptions and returns true turns every execution path into a pass. A test helper that catches errors and returns a default value rather than propagating the failure makes the test impossible to fail regardless of application behavior.
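A sketch of the pattern; fetchUser and User are stand-ins for application code:

```typescript
interface User { id: string; name: string }
declare function fetchUser(id: string): Promise<User>; // hypothetical application call

// Impossible to fail: every rejection becomes a quiet null,
// so downstream assertions never see the error.
async function fetchUserSafely(id: string): Promise<User | null> {
  try {
    return await fetchUser(id);
  } catch {
    return null; // swallows exactly the failure the test exists to detect
  }
}

// The fix: let the rejection propagate so the test runner reports it.
async function fetchUserOrFail(id: string): Promise<User> {
  return fetchUser(id);
}
```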
Race conditions in async tests produce intermittent sensitivity. A test that checks for a result before the async operation completes will sometimes catch the failure (when the check runs after the operation should have completed) and sometimes miss it (when the check runs before). The result is a test that fails occasionally rather than reliably — which trains engineers to dismiss the failure as a flake.
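The classic shape of the problem is a fixed sleep standing in for real synchronization; a sketch, with saveDraft and store invented for illustration:

```typescript
import { expect, test } from "@jest/globals";

declare function saveDraft(body: string): Promise<void>;      // hypothetical async operation
declare const store: { getDraft(): { body: string } | null }; // hypothetical state

const sleep = (ms: number) => new Promise((res) => setTimeout(res, ms));

// Racy: 50ms is usually, but not always, enough for the save to land.
// The assertion sometimes runs before the write and sometimes after,
// so real failures read as flakes.
test("persists the draft (racy)", async () => {
  void saveDraft("hello");
  await sleep(50);
  expect(store.getDraft()).toEqual({ body: "hello" });
});

// Deterministic: await the operation itself, then assert.
test("persists the draft", async () => {
  await saveDraft("hello");
  expect(store.getDraft()).toEqual({ body: "hello" });
});
```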
Mutation testing: proving your tests actually work
Mutation testing is the systematic approach to verifying that tests have failure sensitivity. The process introduces deliberate bugs — mutations — into the application code and checks whether any test fails as a result. If no test fails when a mutation is present, the mutation "survives" — which means the test suite has a coverage gap at that code path.
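A minimal sketch of one mutation and the test that kills it (the function is invented for illustration):

```typescript
import { expect, test } from "@jest/globals";

// Application code
function canCheckout(cart: { items: number }): boolean {
  return cart.items > 0;
}

// A typical mutation a tool would try: the comparison flips from > to >=.
// If every test still passes against this version, the mutant survives and
// the empty-cart path is effectively untested.
function canCheckoutMutant(cart: { items: number }): boolean {
  return cart.items >= 0;
}

// This test kills the mutant: it passes against the original
// and fails against the mutated version.
test("an empty cart cannot check out", () => {
  expect(canCheckout({ items: 0 })).toBe(false);
});
```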
A high mutation score (the percentage of mutations killed by at least one test) is strong evidence that the suite can catch real regressions. Line coverage only shows that a line executed during the tests; mutation coverage shows that changing the line's behavior makes some test fail. Teams that track both tend to catch defects that line coverage alone lets through.
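Real tools automate the loop and report the score: StrykerJS for JavaScript and TypeScript, PIT for Java, mutmut for Python. The metric itself is simple arithmetic:

```typescript
// Mutation score: the share of introduced mutants killed by at least one test.
function mutationScore(killed: number, total: number): number {
  return (killed / total) * 100;
}

// A run that generates 200 mutants and kills 170 scores 85%.
// Each of the 30 survivors points at behavior no test pins down.
mutationScore(170, 200); // 85
```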
Writing tests with failure in mind
The most practical habit for improving test failure sensitivity is to ask "how would I know if this test is wrong?" before finalizing any assertion.
If the answer is "I would need to read the test carefully to notice it's asserting on the wrong thing," the assertion is too implicit. Explicit assertions — asserting on the exact expected value, the exact expected error message, the exact expected record count — are easier to reason about and harder to accidentally make vacuously true.
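A sketch of the contrast; submitSignup, usersTable, and the exact error message are invented for illustration:

```typescript
import { expect, test } from "@jest/globals";

declare function submitSignup(data: { email: string }): Promise<{ status: string; errors: string[] }>;
declare const usersTable: { count(): Promise<number> }; // hypothetical count helper

test("rejects a blank email", async () => {
  const result = await submitSignup({ email: "" });

  // Implicit: true for any non-negative length, including the wrong one
  expect(result.errors.length).toBeGreaterThanOrEqual(0);

  // Explicit: exact value, exact message, exact record count
  expect(result.status).toBe("rejected");
  expect(result.errors).toEqual(["email: must not be blank"]);
  expect(await usersTable.count()).toBe(0);
});
```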
TestSprite's AI agents generate assertions based on the intent of the test description, which tends to produce more specific assertions than engineers write manually under time pressure. The agent tests what the description says should happen — the exact message, the exact state change, the exact navigation — rather than proxies that are loosely correlated with the intended behavior.
