Code coverage is probably the most controversial code metric. Some say you have to reach 80% code coverage; others say it's superficial and does not say anything about your testing quality. (See Jon Limjap's good answer on "What is a reasonable code coverage % for unit tests (and why)?".)

People tend to measure everything. They need comparisons, benchmarks etc.
Project teams need an indicator of how good their testing is.

So what are alternatives to code coverage? What would be a good metric that says more than "I touched this line of code"?
Are there real alternatives?

+2  A: 

Bug metrics are also important:

  • Number of bugs coming in
  • Number of bugs resolved

Comparing the two lets you detect, for instance, whether bugs are being resolved as fast as new ones come in.
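
A minimal sketch of how the two counts could be compared over time. The weekly numbers and the function name `open_backlog` are made up for illustration, not taken from any tool.

```python
from itertools import accumulate

def open_backlog(incoming, resolved, start=0):
    """Return the number of open bugs at the end of each period.

    incoming[i] and resolved[i] are the bugs reported and fixed in period i.
    """
    net = [new - fixed for new, fixed in zip(incoming, resolved)]
    return [start + total for total in accumulate(net)]

# Hypothetical weekly counts, not taken from a real project.
incoming = [12, 15, 9, 14]
resolved = [10, 11, 9, 8]

backlog = open_backlog(incoming, resolved)
print(backlog)  # [2, 6, 6, 12]
# A backlog that grows means bugs arrive faster than they are resolved.
print(backlog[-1] > backlog[0])  # True
```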

Stefan Steinegger
You might add new bugs introduced as a result of fix :)
shahkalpesh
Proper terminology is defect, see: http://stackoverflow.com/questions/384423/bug-er-defect-terminology
StuperUser
+5  A: 

Crap4j is one fairly good metric that I'm aware of...

It's a Java implementation of the Change Risk Analysis and Predictions software metric, which combines cyclomatic complexity and code coverage from automated tests.
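
For reference, the CRAP score of a method, as far as I know the published formula, is comp^2 * (1 - cov/100)^3 + comp, where comp is the cyclomatic complexity and cov the coverage in percent. A small sketch of that arithmetic (the function name `crap_score` is my own, this is not the Crap4j API):

```python
def crap_score(complexity, coverage_percent):
    """CRAP score of one method.

    complexity: cyclomatic complexity of the method
    coverage_percent: test coverage of the method, 0..100
    """
    uncovered = 1 - coverage_percent / 100
    return complexity ** 2 * uncovered ** 3 + complexity

print(crap_score(10, 100))  # 10.0: fully covered, score equals the complexity
print(crap_score(10, 0))    # 110.0: untested, score is complexity^2 + complexity
print(crap_score(10, 50))   # 22.5
```

So a complex method is penalized heavily only when it is also poorly covered.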

mezoid
Funny name, but very nice tool. I'll try it.
furtelwart
+2  A: 

Code coverage is just an indicator. It helps point out lines that are not executed at all in your tests, which is quite interesting. If you reach 80% code coverage or so, it starts making sense to look at the remaining 20% of lines to identify whether you are missing some use case. If you see "aha, this line gets executed if I pass an empty vector", then you can actually write a test which passes an empty vector.

As an alternative, if you have a specs document with Use Cases (UC) and Functional Requirements (FR), you should map the unit tests (UT) to them and see how many UC are covered by FR (of course it should be 100%) and how many FR are covered by UT (again, it should be 100%).
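
A rough sketch of such a mapping, assuming you keep the links between tests and requirements in some table. All IDs, test names and the helper `uncovered` are hypothetical.

```python
def uncovered(items, links):
    """Return the items that no link points to, in sorted order.

    links maps a covering artifact (e.g. a test) to the items it covers.
    """
    covered = set()
    for targets in links.values():
        covered.update(targets)
    return sorted(set(items) - covered)

# Hypothetical IDs for functional requirements and unit tests.
requirements = ["FR-1", "FR-2", "FR-3"]
tests_to_requirements = {
    "test_login_ok": ["FR-1"],
    "test_login_wrong_password": ["FR-1", "FR-2"],
}

missing = uncovered(requirements, tests_to_requirements)
print(missing)  # ['FR-3']
coverage = 1 - len(missing) / len(requirements)
print(f"{coverage:.0%}")  # 67%
```

The same helper works one level up, with FR as the covering artifacts and UC as the items.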

If you don't have specs, who cares? Anything that happens will be ok :)

Daniel Daranas
If the tests are your spec, then the UC/FR coverage should automatically be 100%.
Esko Luontola
@Esko Luontola: Yes.
Daniel Daranas
+1  A: 

How about (lines of code)/(number of test cases)? Not extremely meaningful (since it depends on LOC), but at least it's easy to calculate.

Another one could be (number of test cases)/(number of methods).
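
A quick way to get that second ratio for Python code, as a sketch only. It assumes tests are functions whose names start with "test", which is a convention and not a rule; the sample module is made up.

```python
import ast

def count_functions(source):
    """Count test functions and other functions in Python source text."""
    tests = others = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if node.name.startswith("test"):
                tests += 1
            else:
                others += 1
    return tests, others

# Hypothetical module with production code and tests side by side.
SAMPLE = '''
def add(a, b):
    return a + b

def sub(a, b):
    return a - b

def test_add():
    assert add(1, 2) == 3
'''

tests, methods = count_functions(SAMPLE)
print(tests, methods)   # 1 2
print(tests / methods)  # 0.5
```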

ammoQ
The 2nd is good, I considered it, too.
furtelwart
(lines of production code)/(lines of test code) would be a similar measure. In a project written with TDD the amount of test code is usually about the same as production code (maybe 20% more or less).
Esko Luontola
+11  A: 

If you are looking for some useful metrics that tell you about the quality (or lack thereof) of your code, you should look into the following metrics:

  1. Cyclomatic Complexity
    • This is a measure of how complex a method is.
    • Usually 10 and lower is good, 11-25 is poor, higher is terrible.
  2. Nesting Depth
    • This is a measure of how many nested scopes are in a method.
    • Usually 4 and lower is good, 5-8 is poor, higher is terrible.
  3. Relational Cohesion
    • This is a measure of how well related the types in a package or assembly are.
    • Relational cohesion is somewhat of a relative metric, but useful nonetheless.
    • Acceptable levels depend on the formula. Given the following:
      • R: number of relationships in package/assembly
      • N: number of types in package/assembly
      • H: Cohesion of relationship between types
    • Formula: H = (R+1)/N
    • Given the above formula, acceptable range is 1.5 - 4.0
  4. Lack of Cohesion of Methods (LCOM)
    • This is a measure of how cohesive a class is.
    • Cohesion of a class is a measure of how many fields each method references.
    • Good indication of whether your class meets the Single Responsibility Principle.
    • Formula: LCOM = 1 - sum(MF)/(M*F)
      • M: number of methods in class
      • F: number of instance fields in class
      • MF: number of methods in class accessing a particular instance field
      • sum(MF): the sum of MF over all instance fields
    • A class that is totally cohesive will have an LCOM of 0.
    • A class that is completely non-cohesive will have an LCOM of 1.
    • The closer to 0 you approach, the more cohesive, and maintainable, your class.
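
To make the two cohesion formulas above concrete, here is a small sketch. It reads the LCOM formula as 1 - sum(MF)/(M*F); the classes, field names and function names are invented, and this is not how NDepend itself is implemented.

```python
def relational_cohesion(relationships, types):
    """H = (R + 1) / N for a package or assembly."""
    return (relationships + 1) / types

def lcom(methods_to_fields, fields):
    """LCOM = 1 - sum(MF) / (M * F).

    methods_to_fields maps each method name to the set of instance
    fields it accesses; fields is the list of all instance fields.
    """
    m = len(methods_to_fields)
    f = len(fields)
    # MF for one field: how many methods access that field.
    sum_mf = sum(
        sum(1 for used in methods_to_fields.values() if field in used)
        for field in fields
    )
    return 1 - sum_mf / (m * f)

# Hypothetical class: every method touches every field.
cohesive = {"deposit": {"balance", "owner"}, "withdraw": {"balance", "owner"}}
print(lcom(cohesive, ["balance", "owner"]))  # 0.0

# Hypothetical class: each method touches only its own field.
split = {"render": {"html"}, "save": {"db"}}
print(lcom(split, ["html", "db"]))  # 0.5

print(relational_cohesion(relationships=8, types=4))  # 2.25
```

Note that with this formula a value of exactly 1 only occurs when no method accesses any field; a class that is really two unrelated halves lands around 0.5.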

These are just some of the key metrics that NDepend, a .NET metrics and dependency mapping utility, can provide for you. I recently did a lot of work with code metrics, and these 4 metrics are the core metrics that we have found to be most useful. NDepend offers several other useful metrics, however, including Efferent & Afferent coupling and Abstractness & Instability, which combined provide a good measure of how maintainable your code will be (and whether or not you're in what NDepend calls the Zone of Pain or the Zone of Uselessness).

Even if you are not working with the .NET platform, I recommend taking a look at the NDepend metrics page. There is a lot of useful information there that you might be able to use to calculate these metrics on whatever platform you develop on.

jrista
+1 I gave my answer just before I left work...and I thought of NDepend on my way home...it's definitely an excellent tool.
mezoid
A: 

SQLite is an extremely well-tested library, and you can extract all kinds of metrics from it.

As of version 3.6.14 (all statistics in the report are against that release of SQLite), the SQLite library consists of approximately 63.2 KSLOC of C code. (KSLOC means thousands of "Source Lines Of Code" or, in other words, lines of code excluding blank lines and comments.) By comparison, the project has 715 times as much test code and test scripts - 45261.5 KSLOC.

In the end, what always strikes me as most significant is that none of those possible metrics seems as important as the simple statement, "it meets all the requirements." (So don't lose sight of that goal in the process of achieving it.)

If you want something to judge a team's progress, then you have to lay down individual requirements. This gives you something to point to and say "this one's done, this one isn't". It's not linear (solving each requirement will require varying work), and the only way you can linearize it is if the problem has already been solved elsewhere (and thus you can quantize work per requirement).

Roger Pate
+1  A: 

As a rule of thumb, defect injection rates proportionally trail code yield and they both typically follow a Rayleigh distribution curve.
At some point your defect detection rate will peak and then start to diminish.
This apex represents 40% of discovered defects.
Moving forward with simple regression analysis you can estimate how many defects remain in your product at any point following the peak.
This is one component of Lawrence Putnam's model.
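
The roughly 40% figure matches a property of the Rayleigh curve itself: the area under the curve up to its peak is 1 - exp(-1/2), about 39%. The sketch below uses only that fraction to scale the count found up to the peak. It is a shortcut, not the regression analysis mentioned above, and the defect counts and function names are made up.

```python
import math

# Fraction of the area under a Rayleigh curve that lies before its peak:
# F(sigma) = 1 - exp(-1/2), roughly 0.39.
FRACTION_AT_PEAK = 1 - math.exp(-0.5)

def estimate_total_defects(found_until_peak):
    """Estimate the total number of defects from the count found up to the peak."""
    return found_until_peak / FRACTION_AT_PEAK

def estimate_remaining(found_until_peak, found_so_far):
    """Estimate how many defects are still undiscovered."""
    return estimate_total_defects(found_until_peak) - found_so_far

print(round(FRACTION_AT_PEAK, 3))  # 0.393
# Hypothetical project: 118 defects found up to the peak, 200 found so far.
print(round(estimate_total_defects(118)))   # 300
print(round(estimate_remaining(118, 200)))  # 100
```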

This might be an interesting answer, but I simply don't understand it. Could you please rephrase it to increase understandability?
furtelwart
StackOverflow only allows 600 characters and I cannot elaborate on this with only 600 characters, so I explained the answer in a blog post. Please see http://redrockresearch.org/?p=58 for a better explanation of this answer.
Thanks for the blog entry, it makes it clearer. But where does the 40% figure come from? By the way: You can edit your answer to add additional information.
furtelwart
+2  A: 

What about watching the trend of code coverage during your project?

As is the case with many other metrics, a single number does not say very much.

For example, it is hard to tell whether there is a problem if "we have a Checkstyle rules compliance of 78.765432%". If yesterday's compliance was 100%, we are definitely in trouble. If it was 50% yesterday, we are probably doing a good job.

I always get nervous when code coverage has gotten lower and lower over time. There are cases when this is okay, so you cannot turn off your head when looking at charts and numbers.
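
If you don't have a tool that charts this for you, a few lines are enough to flag drops between builds. A sketch with invented build labels, numbers and a made-up tolerance:

```python
def coverage_drops(history, tolerance=0.5):
    """Return (build, previous, current) for every drop larger than tolerance.

    history is a list of (build_label, coverage_percent) in chronological order.
    """
    drops = []
    for (_, prev), (label, cur) in zip(history, history[1:]):
        if prev - cur > tolerance:
            drops.append((label, prev, cur))
    return drops

# Hypothetical nightly builds.
history = [("build-1", 78.0), ("build-2", 78.3), ("build-3", 74.1), ("build-4", 74.0)]
print(coverage_drops(history))  # [('build-3', 78.3, 74.1)]
```

A flagged build is only a prompt to go and look; as said above, a drop can be perfectly fine.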

BTW, sonar (http://sonar.codehaus.org/) is a great tool for watching trends.

jens
+2  A: 

Using code coverage on its own is mostly pointless; it only gives you insight if you are looking for unnecessary code.

Using it together with unit-tests and aiming for 100% coverage will tell you that all the 'tested' parts (assuming the tests all passed too) work as specified in the unit-tests.

Writing unit-tests from a technical design/functional design, and having 100% coverage and 100% successful tests, will tell you that the program works as described in the documentation.

Now the only thing you need is good documentation, especially the functional design. A programmer should not write that unless (s)he is an expert in that specific field.

Martin P. Hellwig
100% successful unit testing does not actually tell you the program is working as described. Most unit tests are done at the object or function level. Each can be working as desired and yet the combination of them (which isn't unit tested) can still be wrong.
Steve Rowe
Please reread, 100% coverage with 100% unit-test will tell you that the program works like described with the unit-tests
Martin P. Hellwig
100% coverage with 100% unit-test only means that 100% of the code was executed during testing. Whether the tests are effective at testing the behavior is a different matter altogether.
CoverosGene
A: 

I like revenue, sales numbers, profit. They are pretty good metrics of a code base.

Simeon Pilgrim
+1  A: 

Scenario coverage.

I don't think you really want to have 100% code coverage. Testing, say, simple getters and setters looks like a waste of time.

The code always runs in some context, so you may list as many scenarios as you can (depending on the problem complexity sometimes even all of them) and test them.

Example:

# Parses a line from an .ini configuration file,
# e.g. in the form name=value1,value2
def parse_config(setting):
    # split at the first '=' into the name and the raw value string
    name, values = setting.split('=', 1)
    # split the raw value string at each ','
    values_list = values.split(',')
    return values_list

Now, you have many scenarios to test. Some of them:

  • Passing correct value

  • Passing null

  • Passing empty string

  • Passing ill-formatted parameter

  • Passing string with leading or trailing comma e.g. name=value1, or name=,value2

Running just the first test may give you (depending on the code) 100% code coverage. But you haven't considered all the possibilities, so that metric by itself doesn't tell you much.
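
To see this in practice, here is a sketch that runs the scenarios against a Python rendering of the parser from the example. The function bodies are my own assumption of how such a parser might look; the first scenario alone already executes every line of it.

```python
def parse_config(setting):
    """Hypothetical Python version of the parser from the example."""
    name, values = setting.split("=", 1)
    return values.split(",")

def outcome(setting):
    """Run one scenario and describe what happened."""
    try:
        return parse_config(setting)
    except (AttributeError, ValueError) as error:
        return type(error).__name__

scenarios = {
    "correct value": "name=value1,value2",
    "null": None,
    "empty string": "",
    "ill-formatted": "name value1",
    "trailing comma": "name=value1,",
    "leading comma": "name=,value2",
}

for label, setting in scenarios.items():
    print(f"{label}: {outcome(setting)}")
# correct value: ['value1', 'value2']
# null: AttributeError
# empty string: ValueError
# ill-formatted: ValueError
# trailing comma: ['value1', '']
# leading comma: ['', 'value2']
```

Every scenario after the first adds nothing to line coverage, yet three of them fail and two return empty values that the caller may not expect.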

ya23
I think you mean something like "use case coverage". But that's not a metric that can be calculated unless you define each and every scenario for each function.
furtelwart
+1  A: 

I wrote a blog post about why High Test Coverage Ratio is a Good Thing Anyway.

I agree that when a portion of code is executed by tests, it doesn’t mean that the validity of the results produced by this portion of code is verified by tests.

But still, if you are heavily using contracts to check state validity during test execution, high test coverage will mean a lot of verification anyway.

Patrick Smacchia - NDepend dev
A: 

The value in code coverage is that it gives you some idea of what has been exercised by tests. The phrase "code coverage" is often used to mean statement coverage, e.g., "how much of my code (in lines) has been executed", but in fact there are over a hundred varieties of "coverage". These other versions of coverage try to provide a more sophisticated view of what it means to exercise code.

For example, condition coverage measures how many of the separate elements of conditional expressions have been exercised. This is different than statement coverage. MC/DC ("modified condition/decision coverage") determines whether the elements of all conditional expressions have all been demonstrated to control the outcome of the conditional, and is required by the FAA for aircraft software. Path coverage measures how many of the possible execution paths through your code have been exercised. This is a better measure than statement coverage, in that paths essentially represent different cases in the code. Which of these measures is best to use depends on how concerned you are about the effectiveness of your tests.
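
A small illustration of the gap between statement coverage and the stronger criteria. The tracer below is a toy built on `sys.settrace`, not a real coverage tool, and `discount` is a hypothetical function under test.

```python
import sys

def executed_lines(func, *args):
    """Return the line offsets inside func that run when calling func(*args)."""
    code = func.__code__
    seen = set()

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # do not trace other functions
        if event == "line":
            seen.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)
    return seen

def discount(price, is_member, has_coupon):  # hypothetical function under test
    if is_member or has_coupon:
        price = price * 0.9
    return price

all_statements = {1, 2, 3}  # offsets of the three statements in discount

first = executed_lines(discount, 100, True, False)
print(all_statements <= first)  # True: one test reaches 100% statement coverage

second = executed_lines(discount, 100, False, False)
print(2 in second)  # False: only this call takes the decision's false outcome
```

The first call alone gives full statement coverage, yet it never shows what happens when the decision is false, and it never evaluates `has_coupon` at all because `or` short-circuits. Condition coverage and MC/DC would both ask for more tests here.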

Wikipedia discusses many variations of test coverage reasonably well. http://en.wikipedia.org/wiki/Code_coverage

Ira Baxter
+1  A: 

This hasn't been mentioned, but the amount of change in a given file of code or method (found by looking at version control history) is interesting, particularly when you're building up a test suite for poorly tested code. Focus your testing on the parts of the code you change a lot. Leave the ones you don't for later.
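
With git, for example, the per-file change count can be pulled from the log. A sketch that counts file names in the output of `git log --name-only --pretty=format:`; the sample log and file names are invented, and the function name is my own.

```python
from collections import Counter

def churn_per_file(log_text):
    """Count how often each file appears in `git log --name-only` output."""
    return Counter(line.strip() for line in log_text.splitlines() if line.strip())

# Output as produced by: git log --name-only --pretty=format:
# (hypothetical file names, one block of files per commit)
SAMPLE_LOG = """
src/parser.c
src/lexer.c

src/parser.c

src/parser.c
docs/README
"""

for path, changes in churn_per_file(SAMPLE_LOG).most_common(1):
    print(path, changes)
# src/parser.c 3
```

The files at the top of that list are the candidates to cover with tests first.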

Watch out for a reversal of cause and effect. You might avoid changing untested code and you might tend to change tested code more.

Schwern