I'm looking for real world examples of some bad side effects of code coverage.

I noticed this happening at work recently because of a policy to achieve 100% code coverage. Code quality has been improving for sure but conversely the testers seem to be writing more lax test plans because 'well the code is fully unit tested'. Some logical bugs managed to slip through as a result. They were a REALLY BIG PAIN to debug because 'well the code is fully unit tested'.

I think that was partly because our tool did statement coverage only. Still, the time could have been better spent.

If anyone has seen other negative side effects of having a code coverage policy, please share. I'd like to know what other 'problems' are happening out there in the real world.

Thanks in advance.

EDIT: Thanks for all the really good responses. There are a few which I would mark as the answer but I can only mark one unfortunately.

+4  A: 

Sometimes corner cases are so rare they're not worth testing, yet a strict code-coverage rule requires you to test them anyway.

For example, in Java the MD5 algorithm is built-in, but technically it's possible that an "unsupported algorithm" type exception is thrown. It's never thrown and your test would have to go through significant gyrations to test that path.

It would be a lot of work wasted.
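To make that concrete, here's a minimal sketch (the helper method is hypothetical); the catch block is the path a 100% statement-coverage rule would force you to hit:

import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: MD5 support is mandatory on every Java platform, so
// the catch branch below is effectively dead code in practice, yet a strict
// coverage target still demands a test (e.g. elaborate mocking of
// MessageDigest) that forces it to execute.
public static byte[] md5(byte[] input) {
    try {
        return MessageDigest.getInstance("MD5").digest(input);
    } catch (NoSuchAlgorithmException e) {
        // Unreachable on any conforming JRE.
        throw new AssertionError("MD5 is required by the Java spec", e);
    }
}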

Jason Cohen
I've actually once seen a problem on a dev machine that stemmed from an UnsupportedEncodingException for UTF-8. That's supposed to be built-in too, but that machine had a beta version of Sun's JDK installed where it wasn't...
Michael Borgwardt
Yes it can happen, I admit that. My point is that it's probably not worth the effort of testing when there are so many other, more likely problems to test for. Code coverage can run you down inefficient paths.
Jason Cohen
+2  A: 
  1. Writing overly targeted test cases.
  2. Insufficient testing of input variability in the code.
  3. Executing large numbers of artificial test cases.
  4. Not concentrating on the important test failures because of the noise.
  5. Difficulty in assigning defects, because many conditions from many components must interact for a line to execute.

The worst side effect of having a 100% coverage goal is spending a large share of the test development cycle (75%+) hitting corner cases. Another poor effect of such a policy is concentrating on hitting a particular line of code rather than addressing the range of inputs. I don't really care that the strcpy function ran at least once. I really care that it ran against a wide variety of input. Having a policy is good, but having an extremely draconian policy is bad. The 100% code coverage metric is neither necessary nor sufficient for code to be considered solid.
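As a hedged illustration (the copy helper and test names are made up), both JUnit tests below produce identical line coverage of copyString, but only the second says anything about input variety:

import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class CopyStringTest {

    // Trivial stand-in for something like strcpy.
    static String copyString(String src) {
        return new String(src.toCharArray());
    }

    @Test
    public void runsTheLineOnce() {
        // This alone already yields 100% coverage of copyString.
        assertEquals("abc", copyString("abc"));
    }

    @Test
    public void exercisesARangeOfInputs() {
        // Same coverage figure, far more confidence.
        for (String s : new String[] { "", "a", "abc", "héllo wörld", "a b\tc\nd" }) {
            assertEquals(s, copyString(s));
        }
    }
}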

ojblass
+11  A: 

Just because there's code coverage doesn't mean you're actually testing all paths through the function.

For example, this code has four paths:

if (A) { ... } else { ... }
if (B) { ... } else { ... }

However, just two tests (e.g. one with A and B both true, one with A and B both false) would give "100% code coverage."

This is a problem because the tendency is to stop testing once you've achieved the magic 100% number.
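A fleshed-out sketch of the same trap (the pricing rules are invented): two tests cover every line, but the two mixed paths are never executed:

int price(boolean member, boolean holiday) {
    int total = 100;
    if (member)  { total -= 20; } else { total += 5;  }   // guest surcharge
    if (holiday) { total -= 10; } else { total += 15; }   // peak pricing
    return total;
}

// Test 1: price(true, true)   == 70    - together these two cover every
// Test 2: price(false, false) == 120   - statement, i.e. "100% code coverage"
// Yet price(true, false) and price(false, true), the paths where an
// interaction bug between the two rules would hide, were never run.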

Jason Cohen
+1  A: 

Nothing wrong with code coverage - what I see wrong is the 100% figure. At some point the law of diminishing returns kicks in and it becomes more expensive to test the last 1% than the other 99%. Code coverage is a worthy goal, but common sense goes a long way.

Otávio Décio
+12  A: 

In my experience, the biggest problem with code coverage tools is the risk that somebody will fall victim to the belief that "high code coverage" equals "good testing." Most coverage tools just offer statement coverage metrics, as opposed to condition, data path or decision coverage. That means that it's possible to get 100% coverage on a bit of code like this:

for (int i = 0; i < MAX_RETRIES; ++i) {
    if (someFunction() == MAGIC_NUMBER) {
        break;
    }
}

... without ever testing the termination condition on the for loop.
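For instance, a single test along these lines (a hypothetical, self-contained JUnit version of the loop) reports every statement as covered while the retries-exhausted behaviour goes completely unverified:

import static org.junit.Assert.assertEquals;
import java.util.function.IntSupplier;
import org.junit.Test;

public class RetryCoverageTest {

    static final int MAX_RETRIES = 3;
    static final int MAGIC_NUMBER = 42;

    // Mirrors the loop above, returning how many attempts were made.
    static int attempts(IntSupplier someFunction) {
        int i;
        for (i = 0; i < MAX_RETRIES; ++i) {
            if (someFunction.getAsInt() == MAGIC_NUMBER) {
                break;
            }
        }
        return i;
    }

    @Test
    public void succeedsOnFirstAttempt() {
        // someFunction succeeds immediately: loop header, body, if and break
        // all execute, so statement coverage reads 100%...
        assertEquals(0, attempts(() -> MAGIC_NUMBER));
        // ...but nothing ever drives MAX_RETRIES consecutive failures, so the
        // termination condition is never actually tested.
    }
}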

Worse, it's possible to get very high "coverage" from a test that simply invokes your application, without bothering to validate the output, or validating it incorrectly.

Simply put, low code coverage is certainly an indication of insufficient testing, but high coverage levels are not an indication of sufficient or correct testing.

Eric Melski
+1  A: 

That 100% code coverage means well-tested code is a complete myth. As developers we know the hard/complex/delicate parts of a system, and I would much rather see those areas properly tested and only get 50% coverage, rather than the meaningless figure that every line has been run at least once.

In terms of a real-world example, the only team I was on that had 100% coverage wrote some of the worst code I've ever seen. 100% coverage was used to replace code review - the result was predictably awful, to the extent that most of the code was thrown away, even though it passed the tests.

MrTelly
+2  A: 

I know this isn't a direct answer to your question, but...

Any testing, regardless of what type, is insufficient by itself. Unit testing/code coverage is for developers. QA still needs to test the system as a whole. Business users still need to test the system as a whole as well.

The converse (that QA tests the code completely, so developers shouldn't test) is equally bad. Testing is complementary, and different tests provide different things. Each test type can miss things that another might find.

Just like the rest of development, don't take shortcuts with testing, it'll only let bugs through.

TofuBeer
+1  A: 

We have good tools for measuring code-coverage from unit tests. So it's tempting to rely on code-coverage of 100% to represent that you're "done testing." This is not true.

As other folks have mentioned, 100% code coverage doesn't prove that you have tested adequately, nor does 50% code coverage necessarily mean that you haven't tested adequately.

Measuring lines of code executed by tests is just one metric. You also have to test for a reasonable variety of function inputs, and also how the function or class behaves depending on some other external state. For example, some code functions differently based on the data in a database or in a file.
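As a small, hypothetical example of the external-state point (the file name and format are invented): one test run against a single fixture file executes every statement here, yet says nothing about how the code behaves for other contents, an empty file, or a missing file:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// What this returns depends entirely on the contents of the file, so the
// coverage number reveals nothing about the states that were never set up.
static boolean featureEnabled(Path config) throws IOException {
    String text = Files.readString(config).trim();   // requires Java 11+
    return text.equalsIgnoreCase("enabled");
}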

I've also blogged about this recently: http://karwin.blogspot.com/2009/02/unit-test-coverage.html

Bill Karwin
+28  A: 

In a sentence: Code coverage tells you what you definitely haven't tested, not what you have.

Part of building a valuable unit test suite is finding the most important, high-risk code and asking hard questions of it. You want to make sure the tough stuff works as a priority. Coverage figures have no notion of the 'importance' of code, nor the quality of tests.

In my experience, many of the most important tests you will ever write are the tests that barely add any coverage at all (edge cases that add a few extra % here and there, but find loads of bugs).

The problem with setting hard (and potentially counter-productive) coverage targets is that developers may have to start bending over backwards to test their code. There's making code testable, and then there's just torture. If you hit 100% coverage with great tests then that's fantastic, but in most situations the extra effort is just not worth it.

Furthermore, people start obsessing/fiddling with numbers rather than focussing on the quality of the tests. I've seen badly written tests that have 90+% coverage, just as I've seen excellent tests that only have 60-70% coverage.

Again, I tend to look at coverage as an indicator of what definitely hasn't been tested.

Mark Simpson
+1 Very good explanation of the "problem" with code coverage.
Lieven
+1 I always like it when I up-vote someone else even when I have my own answer. :-)
Jason Cohen
+2  A: 

In my opinion, the greatest danger a team runs from measuring code coverage is that it rewards large tests, and penalizes small ones. If you have the choice between writing a single test that covers a large portion of your application's functionality, and writing ten small tests which test a single method, only measuring code coverage implies that you should write the large test.

However, writing the set of 10 small tests will give you much less brittle tests, and will test your application much more thoroughly than the one large test will. Thus, by measuring code coverage, particularly in an organization with still evolving testing habits, you can often set up the wrong incentives.

Scotty Allen
+1  A: 

100% code coverage doesn't mean you're done with unit tests.

int divide(int a, int b) {
    return a / b;
}

With just 1 unit test, I get 100% code coverage for this function:

assertEquals(2, divide(4, 2));

Now, nobody would argue that this unit test with 100% coverage indicates that the feature works just fine.
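To drive it home, a hypothetical second test (the class name is made up) finds the failure the coverage figure is silent about:

import org.junit.Test;

public class DivideTest {

    // Repeated here only to keep the sketch self-contained.
    static int divide(int a, int b) {
        return a / b;
    }

    @Test(expected = ArithmeticException.class)
    public void divideByZeroWasNeverConsidered() {
        divide(4, 0);   // blows up despite the "100% covered" report above
    }
}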

I think code coverage is a good tool for spotting obvious code paths you've missed, but I would use it carefully.

Julien
+2  A: 

One of the largest pitfalls of code coverage is that people just talk about code coverage without actually specifying what type of code coverage they are talking about. The characteristics of C0, C1, C2 and even higher levels of code coverage are very different, so just talking about "code coverage" doesn't even make sense.

For example, achieving 100% full path coverage is pretty much impossible. If your program has n decision points, you need 2^n tests (and depending on the definition, every single bit in a value is a decision point, so to achieve 100% full path coverage for an extremely simple function that just adds two ints, you need 18446744073709551616 tests). If you only have one loop, you already need infinitely many tests.

OTOH, achieving 100% C0 coverage is trivial.
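A small sketch of the gap between those levels (the function is invented): a single call gives full C0 (statement) coverage, while full path coverage is unattainable because every distinct iteration count is its own path:

// One call such as countSpaces("a b c") executes every statement (C0 done);
// under full path coverage, inputs of length 0, 1, 2, ... are all distinct
// paths, so an unbounded loop already implies infinitely many tests.
static int countSpaces(String s) {
    int count = 0;
    for (int i = 0; i < s.length(); i++) {
        if (s.charAt(i) == ' ') {
            count++;
        }
    }
    return count;
}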

Another important thing to remember is that code coverage does not tell you what code was tested. It only tells you what code was run! You can try it out yourself: take a codebase that has 100% code coverage. Remove all the assertions from the tests. Now the codebase still has 100% coverage, but doesn't test a single thing! So, code coverage does not tell you what's tested, only what's not tested.
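A hedged JUnit illustration of that last point (class and method names are invented): the test below runs the code and earns the same coverage as a real test, but with its assertion gone it can never fail:

import org.junit.Test;

public class CoverageWithoutTestingTest {

    static int clamp(int value, int max) {
        return value > max ? max : value;
    }

    @Test
    public void executesButVerifiesNothing() {
        int result = clamp(7, 5);
        // assertEquals(5, result);   // deleting this leaves coverage untouched
    }
}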

Jörg W Mittag
+1  A: 

There are tools out there, Jumble for one, that go beyond plain coverage by mutating your code to see whether your tests fail for the different permutations.

Directly from their website:

Jumble is a class level mutation testing tool that works in conjunction with JUnit. The purpose of mutation testing is to provide a measure of the effectiveness of test cases. A single mutation is performed on the code to be tested, the corresponding test cases are then executed. If the modified code fails the tests, then this increases confidence in the tests. Conversely, if the modified code passes the tests this indicates a testing deficiency.
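A source-level sketch of the idea (Jumble actually mutates bytecode, and this mutant is invented): a suite that only checks ages well away from the boundary lets the mutant survive, exposing the missing boundary test:

// Original:
boolean isAdult(int age) { return age >= 18; }

// Mutant (">=" flipped to ">"):
boolean isAdult(int age) { return age > 18; }

// Tests asserting isAdult(30) == true and isAdult(10) == false pass against
// both versions, so the mutant survives and reveals that age == 18 was never
// tested.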

Mike
Jester is another similar "unit test tester". http://jester.sourceforge.net/
Quinn Taylor