Understanding the Code Coverage Metric and Test Correctness

Code coverage is a useful indicator to ensure all of your code is executed by tests. However, this metric on its own is not sufficient to ensure your software is actually defect-free. To see why, have a look at the following examples. Each one has 100% test and branch coverage, and yet each is flawed in some way.

Examples

Adding Machine

Source: AddingMachine.java

Tests: AddingMachineTest.java

Why are these tests flawed?

addsOneAndTwo()

This test calls the add() method and asserts the expected output. But if you look at the implementation of add(), you can see this is flawed because the result is hard-coded and only works for the happy path test. There should be (good) tests covering the unhappy path as well.

addsThreeAndFour()

This test calls the add() method, but has no assertions! Think this is a joke? Read Martin Fowler: Assertion Free Testing .

addsFiveAndSix()

This test calls the add() method and asserts the expected output. But the assertion is wrong! Again, this is not a joke. This is a simple example, but tests can easily assert the wrong thing, particularly when they’re copied and pasted from elsewhere. If the test is particularly long and opaque, with a lot of setup code, this can be very hard to catch during code review.

Transaction Machine

Source: TransactionMachine.java

Tests: TransactionMachineTest.java

Why are these tests flawed?

createTransaction_returnsExpectedResult()

This test calls the createTransaction() method and asserts on all the fields of the output. But if you run this in the debugger, all the fields are actually null! This tests pass, but does not assert on the actual behaviour we want. It is flawed because of circular reasoning: the expected and actual results are both constructed from the same method call (CreateTransactionResult.of(CreateTransactionRequest)), so of course the result will be the same. But the result is incorrect. The test should not dynamically construct the expected results; it should assert on quantities known at compile-time.

Order Number Validator

Source: OrderNumberValidator.java

Tests: OrderNumberValidatorTest.java

Why are these tests flawed?

All tests

These tests call the parseOrder() method with a broad range of inputs and asserts on all the fields of the output. Together, they cover all branches of the class under test. However, there is a subtle flaw in the regular expression which is not caught by the tests. Did you notice that the regular expression doesn’t allow 0 in the order number? In this case, the branch that is not covered exists outside the scope of the test coverage (since it’s part of java.util.regex.Pattern). Test coverage cannot account for branches in libraries.

Conclusion

Code coverage is important to measure so that we a) know that our coverage is trending in the right direction, and b) to indicate where additional coverage may be needed. However, code coverage by itself should not be considered a goal since “any measure that becomes a target ceases to be a good measure” (Goodhart’s Law ). We have explored some of the flaws of the code coverage metric so that developers can be more aware of the ways in which it is flawed. This should help reviewers properly scrutinize tests during code review, and not just rely on the code coverage metric as an indicator of quality.

Understanding the Code Coverage Metric and Test Correctness

Examples

Adding Machine

addsOneAndTwo()

addsThreeAndFour()

addsFiveAndSix()

Transaction Machine

createTransaction_returnsExpectedResult()

Order Number Validator

All tests

Conclusion

Further Reading