



Hi, I am in the middle of putting together some guidelines around unit test code coverage and I want to specify a number that really makes sense. It's easy to repeat the 100% mantra that I see all over the internet without considering the cost-benefit analysis or when diminishing returns actually set in.

I solicit comments from persons who have actually reported code coverage on real-life, medium/large-sized projects. What percentages were you seeing? How much is too much? I really want some balance (in figures) that will help developers produce high-quality code. Is 65% coverage too low to expect? Is 80% too high?


It really depends. I know of a lot of software that ships with 0% coverage, and I have a lot of software with single-digit percentages. The main question is what is really needed, and wanted, in financial terms.

+2  A: 

Personally I would go for 80% coverage, but of course this is only relative... I haven't achieved it yet myself, either.

Currently we have very high coverage (99%) on our utility classes, which is good because bugs in there will haunt you through your whole application.

Coverage is mediocre for most of our GUIs, because writing tests for them is hard and time-consuming, so we often settle for opening the GUI in a unit test and closing it automatically if there is no error.


I don't think you can really have too much code coverage. I think you need to determine what code runs the "regular course of business" in the application and have that completely covered. For the remaining code that isn't in the normal course of business, start whittling that down by doing the most critical first. Abnormal business that isn't terribly important has low gain for getting good code coverage on it.

Like I said, I need numbers to put on a document for developers. That's the whole idea of metrics. thanks, though.
Sorry about the confusion. My point is that hard numbers are pointless, IMO. You need to analyze each application separately and make an informed decision based on the needs of the specific case.
@Jay don't feel bad. Most of us have tried to convey this point and we were down-voted. :)
San Jacinto
@san jacinto, thanks, I know what you mean. I just get very frustrated when people reject having to think about the needs of specific cases.
+1  A: 

When you mix code coverage with cyclomatic complexity, you can use the CRAP metric.


Individual Method Interpretation:

Bob Evans and I have looked at a lot of examples (using our code and many open source projects) and listened to a LOT of opinions. After much debate, we decided to INITIALLY use a CRAP score of 30 as the threshold for crappiness. Below is a table that shows the amount of test coverage required to stay below the CRAP threshold based on the complexity of a method:

Method’s Cyclomatic Complexity        % of coverage required to be
                                      below CRAPpy threshold
------------------------------        -------------------------------- 
0 – 5                                   0% 
10                                     42% 
15                                     57% 
20                                     71% 
25                                     80% 
30                                    100% 
31+                                   No amount of testing will keep methods    
                                      this complex out of CRAP territory.
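
For anyone who wants to recompute the table: the CRAP formula as published by its authors is CRAP(m) = comp(m)^2 * (1 - cov(m)/100)^3 + comp(m), where comp is the method's cyclomatic complexity and cov is its test coverage in percent. Below is a minimal Python sketch of it. The function names are mine, and treating a score of exactly 30 as still acceptable is an assumption inferred from the 0 – 5 row of the table.

```python
from typing import Optional


def crap_score(complexity: int, coverage_pct: float) -> float:
    """CRAP score of one method: comp^2 * (1 - cov/100)^3 + comp."""
    uncovered = 1.0 - coverage_pct / 100.0
    return complexity ** 2 * uncovered ** 3 + complexity


def min_coverage_pct(complexity: int, threshold: float = 30.0) -> Optional[float]:
    """Smallest coverage (in percent) that keeps the score at or below
    the threshold, or None if no amount of coverage can do so.

    Assumption: a score equal to the threshold counts as passing.
    """
    if complexity > threshold:
        # Even at 100% coverage the score equals the complexity itself.
        return None
    if complexity == 0:
        return 0.0
    # Solve comp^2 * u^3 + comp <= threshold for the uncovered fraction u.
    allowed = (threshold - complexity) / complexity ** 2
    uncovered = min(1.0, allowed ** (1.0 / 3.0))
    return 100.0 * (1.0 - uncovered)


if __name__ == "__main__":
    for comp in (5, 10, 15, 20, 25, 30, 31):
        needed = min_coverage_pct(comp)
        label = "unreachable" if needed is None else f"{needed:.1f}%"
        print(f"complexity {comp:>2}: {label}")
```

The computed values land on or close to the figures quoted above; a row can differ by a couple of points depending on how the original table was rounded (complexity 15 works out to roughly 59% with this formula).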

No amount of code coverage is going to guarantee "high quality code" by itself alone.

From the comments...

It's definitely too lax to give simple methods a pass on coverage. What you will likely find when implementing this on existing code is that the code coverage will rise as you're refactoring those ugly methods (code coverage should rise; otherwise you're refactoring dangerously).

The 0-5's are essentially low-hanging fruit and the ROI isn't all that great. That being said, those methods are wonderful for learning TDD because they're often very easy to test.

Austin Salonen
You seem to have done a lot of research here. I only have one or two concerns: Isn't it too lax to give cyclomatic 0-5 a pass on test coverage? Second, should we allow cyclomatic more than 10/15 to even make it into production code? Can we say, let's keep 95% of the code under cyclomatic 8 and then demand a coverage of 57 - 71%? Does that sound realistic enough? Please pitch in everyone so we can mature this answer and adopt it.
If you could forbid cyclomatic complexity above 8, there wouldn't be a need for this metric. You can only control the complexity of a program to a certain degree. An operating system is inherently more complex than notepad. You can reduce your scope of study to smaller chunks of the program, but this only delays the problem. At some point, you need to integrate the components and the complexity will rise.
San Jacinto
"At some point, you need to integrate the components and the complexity will rise." @San Jacinto: Are you sure about this? I expect the cyclomatic complexity measure to be limited to the scope of a function, no matter how and where that function is called. Can you verify or disprove this, please? Thanks.
San Jacinto is referencing the complexity of component interaction, which is a whole other beast. It's effectively pulling out your mock objects and using the real ones; ideally there are no side effects, but that's not always the case.
Austin Salonen
@Pita while you can reduce the number of control paths in THIS particular module, you have trouble doing the same thing for functions that this module calls. While you have lowered this module's complexity and raised cohesion, you have increased the complexity of the global flow graph and have introduced a potentially exponential number of control paths by introducing more modules. It's not an easy trade-off to pick. But, in all technicality, you are correct. It is limited to the scope of this module.
San Jacinto
Marked as answer with the caveat that the % of expected coverage should be bumped up. These percentages may be correct for passing the 'crappiness' mark, but I feel that merely passing it still leaves code a long way from great, well-refactored, clean code.

The only correct answer is you test as much as you can afford. Obviously, this is an axiom across every engineering project.

Beyond that, it's all subjective and highly dependent upon the project at hand. For example, the flight control systems Lockheed puts out had better be tested more than 80%, but 80% may suffice for my GUI front-end to an XML viewer.

Typically, you break down the cost of running tests with your team. In theory, the answer to the question "how much testing can we afford?" is customarily expressed in man-hours.

After this, you examine your modules and determine which parts of the code have the most time spent in them. Each critical module should be covered at least once. From there, you give each module a number of tests in proportion to how much of the execution time it accounts for. So in the end, there's no hard number along the lines of "X% is covered".
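
One way to read the allocation described above, as a rough sketch only: give every critical module one test up front, then split the rest of the affordable test count in proportion to how heavily each module is used. The function name, the module names, and the usage figures below are hypothetical, and the largest-remainder rounding is my own choice, not something taken from the answer.

```python
from typing import Dict, Set


def allocate_tests(budget: int, usage: Dict[str, int], critical: Set[str]) -> Dict[str, int]:
    """Split a test budget across modules.

    budget   -- total number of tests the team can afford (hypothetical unit)
    usage    -- relative usage per module, e.g. profiled call counts
    critical -- modules that must receive at least one test
    """
    unknown = critical - set(usage)
    if unknown:
        raise ValueError(f"critical modules missing from usage: {sorted(unknown)}")
    if budget < len(critical):
        raise ValueError("budget too small to cover every critical module once")
    total = sum(usage.values())
    if total <= 0:
        raise ValueError("usage must contain at least one positive value")

    # Step 1: every critical module is covered once.
    plan = {name: (1 if name in critical else 0) for name in usage}
    remaining = budget - len(critical)

    # Step 2: distribute what is left in proportion to usage.
    shares = {name: remaining * count / total for name, count in usage.items()}
    for name, share in shares.items():
        plan[name] += int(share)

    # Step 3: hand out tests lost to rounding, largest fractional part first.
    leftover = remaining - sum(int(share) for share in shares.values())
    by_fraction = sorted(shares, key=lambda n: shares[n] - int(shares[n]), reverse=True)
    for name in by_fraction[:leftover]:
        plan[name] += 1
    return plan


if __name__ == "__main__":
    # Hypothetical modules and call counts for an XML viewer front-end.
    usage = {"parser": 600, "renderer": 300, "export": 100}
    print(allocate_tests(budget=11, usage=usage, critical={"export"}))
```

Note that the output is a count of tests per module, not a coverage percentage, which is the point the answer is making.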

John Musa has a really interesting book on the subject.

San Jacinto
Thanks, but his question "solicit[s] comments from persons who have actually REPORTED CODE COVERAGE on real-life .. projects". It's not subjective if you have that experience.
@Pita of course it is. There are many unconstrained variables in assigning test coverage. It's silly to think that MY numbers work well for YOUR project.
San Jacinto

On the program that I'm on (~500k SLOC), we use 100%. That is a program requirement to proceed to the next phase of testing. Here are the reasons behind it:

  1. The program is used in some safety-critical situations, and you don't want any off-nominal condition to go untested.

  2. If you aren't hitting 100%, then you either wrote code that isn't necessary, and are hence wasting money, or you aren't testing your off-nominal paths completely. See #1.

  3. Your unit test scenarios should naturally get you close to 100%, regardless of the actual program code coverage metric you're using. If someone is at 95% based solely on their off-nominal scenarios, requiring 100% isn't onerous (and, again, you should be asking why you aren't at 100% then. See #2.)

Your mileage will certainly vary. If you aren't working on a mission / safety-critical application, then you probably don't need to be worrying about your code coverage as much - however, I'd have to ask again: why are you writing code that you don't need?


Based on the comments I've received below, I should clarify. The program guideline is 100% code coverage for unit tests. That development process requirement can be waived if, for a technical reason, a branch of code cannot be reached (protected default constructor that is never called, etc.) Approval is usually granted from an external, independent portion of the organization (go go SQA).

From an integration / systems test, code coverage becomes moot, as you start looking at requirements coverage. That's a different ball of yarn altogether.

The original question was looking for real-world situations: I agree that not all (or even most?) real-world situations will warrant 100% code coverage at the unit test level, but there are certainly cases that do, and programs that do. And it is a habit of some developers to write code that they don't need, which then ends up untested. This becomes a maintenance nightmare, as a later developer will call methods that were never "meant" to be used (or were included because someone thought they were a "good" idea). Shooting for 100% coverage forces you to answer the question "why did I write this?"

Matt Jordan
But is comparing unit testing to system testing for test coverage a good analogy? I don't quite think it is.
San Jacinto
The original post didn't specify system versus unit testing. As far as our program is concerned, the coverage is synonymous even if the testing is not - all units are tested to 100% code coverage.
Matt Jordan
@Matt Jordan: Sorry for the confusion about not explicitly mentioning 'unit testing' in the question. I hope it was implied. I agree with @San Jacinto: First, 100% coverage does not mean you've covered all the real-world scenarios. That is an NP-Complete problem (remember?). Second, there's code you have to write that is not tested. Default Constructors are not covered, for instance, where testing frameworks use overloaded constructors. Facade layers are not covered. That's the idea for their existence. Lastly, the answer to your question in point (3) is budget: time and money.
@Pita.O - you asked for real world situations. I gave you one - I guarantee that this program does shoot for 100% coverage. Do we always hit it? Of course not - for many of the reasons you listed. The mindset (and the Software Development Process) is still 100% coverage however, which is what my points were illustrating - if you aren't testing all of your code, you either didn't need to write it, or there must be a valid technical reason why you couldn't hit it. As far as budget - sometimes that is subordinate to safety and external auditing forces.
Matt Jordan
@Matt: I don't think our positions are different on the philosophy. After all, you just said, "Do we always hit it? Of course not ...". The problem is that while I am speaking in the context of a "specifications document" language, you are speaking from the general peer-motivational posture, which is great. But in my context, if you promise it, you must deliver it to the last letter. My quest is to put a number on paper that is truly attainable without breaking a conscientious developer's back.