While LOC (lines of code) is a problematic measure of code complexity, it is the most popular one, and when used very carefully it can provide a rough estimate of at least the relative complexity of code bases (e.g. if one program is 10 KLOC and another is 100 KLOC, written in the same language by teams of roughly the same competence, the second program is almost certainly much more complex).

When counting lines of code, do you prefer to include comments? What about tests?

I've seen various approaches to this. Tools like cloc and sloccount allow you to either include or exclude comments. Other people consider comments part of the code and its complexity.

The same dilemma exists for unit tests, which can sometimes reach the size of the tested code itself, and even exceed it.

I've seen approaches all over the spectrum, from counting only "operational" non-comment non-blank lines, to "XXX lines of tested, commented code", which is more like running "wc -l on all code files in the project".
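To make the two ends of that spectrum concrete, here is a minimal sketch that classifies physical lines as blank, comment-only, or code. The function name `count_lines` and the single-prefix comment rule are assumptions made for illustration; real counters such as cloc handle block comments and many languages.

```python
def count_lines(source: str, comment_prefix: str = "#") -> dict:
    """Classify each physical line as blank, comment-only, or code."""
    counts = {"physical": 0, "blank": 0, "comment": 0, "code": 0}
    for line in source.splitlines():
        counts["physical"] += 1
        stripped = line.strip()
        if not stripped:
            counts["blank"] += 1
        elif stripped.startswith(comment_prefix):
            counts["comment"] += 1
        else:
            # A line with code plus a trailing comment counts as code.
            counts["code"] += 1
    return counts


sample = "# add two numbers\n\ndef add(a, b):\n    return a + b  # trailing comment\n"
print(count_lines(sample))
# {'physical': 4, 'blank': 1, 'comment': 1, 'code': 2}
```

Here "physical" corresponds to the wc -l style count, and "code" to the non-comment, non-blank count.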

What is your personal preference, and why?

+2  A: 

I personally don't feel that the LOC metric on its own is as useful as some of the other code metrics.

NDepend will give you the LOC metric but will also give you many others, such as cyclomatic complexity. Rather than list them all, here's the link to the list.

There is also a free CodeMetric add-in for Reflector.

Mitch Wheat
+1  A: 

I'm not going to directly answer your question for a simple reason: I hate the lines of code metric. No matter what you're trying to measure, it's very hard to do worse than LOC; pretty much any other metric you care to think of is going to be better.

In particular, you seem to want to measure the complexity of your code. Overall cyclomatic complexity (also called McCabe's complexity) is a much better metric for this.

Routines with a high cyclomatic complexity are the routines you want to focus your attention on. It's these routines that are difficult to test, rotten to the core with bugs and hard to maintain.

There are many tools that measure this sort of complexity. A quick Google search on your favourite language will find dozens of them.
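As a rough illustration of what such a tool computes, here is a simplified sketch that approximates McCabe's complexity for Python source as one plus the number of decision points. The set of node types counted is an assumption; real tools differ in the details (for example in how they treat comprehensions or `match` statements).

```python
import ast

# Node types treated as decision points in this simplified count
# (an assumption for illustration, not a standard definition).
_DECISIONS = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity: 1 + number of decision points."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _DECISIONS):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # 'a and b and c' contributes two extra branches.
            complexity += len(node.values) - 1
    return complexity


sample = '''
def classify(n):
    if n < 0:
        return "negative"
    for d in range(2, n):
        if n % d == 0 and d > 1:
            return "composite"
    return "other"
'''
print(cyclomatic_complexity(sample))  # 5
```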

Simon Johnson
Agree with the sentiment, but IMO he was not asking for that...
Treb
+1  A: 

Lines of Code means exactly that: no comments or empty lines are counted. And in order for it to be comparable to other source code (no matter whether the metric in itself is helpful or not), you need at least similar coding styles:

for (int i = 0; i < list.count; i++)
{
    // do some stuff
}

for (int i = 0; i < list.count; i++){
    // do some stuff
}

The second version does exactly the same thing but has one LOC less. With a lot of nested loops, this can add up quite a bit, which is why metrics like function points were invented.
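One possible way to reduce the brace-style effect, sketched here as an assumption rather than anything the metric itself prescribes, is to skip lines that contain only a brace. The helper `normalized_loc` is hypothetical and only understands `//` comments.

```python
def normalized_loc(source: str) -> int:
    """Count lines that are not blank, not // comments, and not a lone brace."""
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("//") or stripped in ("{", "}"):
            continue
        count += 1
    return count


style_a = "for (int i = 0; i < list.count; i++)\n{\n    // do some stuff\n}\n"
style_b = "for (int i = 0; i < list.count; i++){\n    // do some stuff\n}\n"
print(normalized_loc(style_a), normalized_loc(style_b))  # 1 1
```

With this rule both loop styles count the same, although other style differences (such as wrapping long argument lists) would still skew the result.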

Treb
+9  A: 

A wise man once told me 'you get what you measure' when it comes to managing programmers.

If you rate them on their LOC output, amazingly, you tend to get a lot of lines of code.

If you rate them on the number of bugs they close out, amazingly you get a lot of bugs fixed.

If you rate them on features added, you get a lot of features.

If you rate them on cyclomatic complexity you get ridiculously simple functions.

Since one of the major problems with code bases these days is how quickly they grow and how hard they are to change once they've grown, I tend to shy away from using LOC as a metric at all, because it drives the wrong fundamental behavior.

That said, if you have to use it, count sans comments and tests and require a consistent coding style.

But if you really want a measure of 'code size', just tar.gz the code base. It tends to serve as a better rough estimate of 'content' than counting lines, which is susceptible to differences in programming style.
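A minimal sketch of that idea, building the tar.gz in memory so it can be tried without touching the disk. The function name `targz_size` and the file names are hypothetical; in practice you would simply run tar on the source tree.

```python
import io
import tarfile


def targz_size(files: dict) -> int:
    """Return the size in bytes of a tar.gz archive built from name -> bytes."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return len(buf.getvalue())


# Same logic, but the second version is padded with 1000 blank lines.
terse = {"a.py": b"x = 1\ny = 2\n"}
padded = {"a.py": b"x = 1\n" + b"\n" * 1000 + b"y = 2\n"}
print(targz_size(terse), targz_size(padded))
```

The padded file has roughly 500 times as many physical lines, yet the compressed sizes should differ only slightly, which is the sense in which compression approximates 'content'.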

Edward Kmett
I wonder what the conclusion would be if you added "If you rate them by customer satisfaction, ..." to the list.
Black
If you rate them by customer satisfaction you get very large expense reports. ;)
Edward Kmett
+3  A: 
Bill the Lizard
+1 for maintenance reasoning. I never use LoC as a metric myself, but every month or two management asks for project LoC counts. Including comments and tests also has the advantage of simplifying my count process to a one-line command in PowerShell. :)
Greg D
A: 

Depends on what you are using the LOC for.

As a complexity measure - not so much. Maybe the 100KLOC are mostly code generated from a simple table, and the 10KLOC has 5KLOC of regexps.

However, I see every line of code associated with a running cost. You pay for every line as long as the program lives: it needs to be read when maintained, it might contain an error that needs to be fixed, it increases compile time, get-from-source-control and backup times, before you change or remove it you may need to find out if anyone relies on it etc. The average cost may be nanopennies per line and day, but it's stuff that adds up.

KLOC can be a first-shot indicator of how much infrastructure a project needs. In that case, I would include comments and tests - even though the running cost of a comment line is much lower than that of one of the regexps in the second project.

[edit] [someone with a similar opinion about code size][1]

peterchen
A: 

We only use a lines of code metric for one thing - a function should contain few enough lines of code to be read without scrolling the screen. Functions bigger than that are usually hard to read, even if they have a very low cyclomatic complexity. For this use we do count whitespace and comments.

It can also be nice to see how many lines of code you've removed during a refactor. Here you want to count only actual lines of code, plus whitespace that doesn't aid readability and comments that aren't useful (a judgment that can't be automated).
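A sketch of how such a check might be automated for Python code, counting every physical line of the function, including blank lines and comments inside it. The name `long_functions` and the default limit of 50 lines are assumptions; pick whatever fits your screen.

```python
import ast


def long_functions(source: str, max_lines: int = 50) -> list:
    """Return (name, physical_line_count) for functions longer than max_lines."""
    results = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # end_lineno is available from Python 3.8 onwards.
            length = node.end_lineno - node.lineno + 1
            if length > max_lines:
                results.append((node.name, length))
    return results


sample = (
    "def short():\n"
    "    return 1\n"
    "\n"
    "\n"
    "def longer():\n"
    "    a = 1\n"
    "\n"
    "    # comment\n"
    "    b = 2\n"
    "    return a + b\n"
)
print(long_functions(sample, max_lines=4))  # [('longer', 6)]
```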

Finally a disclaimer - use metrics intelligently. A good use of metrics is to help answer the question 'which part of the code would benefit most from refactoring' or 'how urgent is a code review for the latest checkin?' - a 1000 line function with a cyclomatic complexity of 50 is a flashing neon sign saying 'refactor me now'. A bad use of metrics is 'how productive is programmer X' or 'How complicated is my software'.

Joe Gauterin
A: 

Excerpt from the article "How do you count your number of Lines Of Code (LOC)?", about the tool NDepend, which counts the logical number of lines of code for .NET programs.


How do you count your number of Lines Of Code (LOC)?

Do you count method signature declarations? Do you count lines with only a bracket? Do you count several lines when a single method call is written on several lines because of a high number of parameters? Do you count ‘namespaces’ and ‘using namespace’ declarations? Do you count interface and abstract method declarations? Do you count field assignments when they are declared? Do you count blank lines?

Depending on the coding style of each developer and on the language chosen (C#, VB.NET…), there can be a significant difference in the measured LOC.

Evidently, measuring LOC by parsing source files is a complex subject. Thanks to a trick, there is a simple way to measure exactly what is called the logical LOC. The logical LOC has two significant advantages over the physical LOC (the LOC that is inferred from parsing source files):

  • Coding style doesn’t interfere with logical LOC. For example, the LOC won’t change because a method call spans several lines due to a high number of arguments.
  • Logical LOC is independent of the language. Values obtained from assemblies written in different languages are comparable and can be summed.

In the .NET world, the logical LOC can be computed from the PDB files, the files that are used by the debugger to link the IL code with the source code. The tool NDepend computes the logical LOC for a method this way: it is equal to the number of sequence points found for the method in the PDB file. A sequence point is used to mark a spot in the IL code that corresponds to a specific location in the original source. More info about sequence points here. Notice that sequence points which correspond to the C# braces ‘{’ and ‘}’ are not taken into account.

Obviously, the LOC for a type is the sum of its methods’ LOC, the LOC for a namespace is the sum of its types’ LOC, the LOC for an assembly is the sum of its namespaces’ LOC and the LOC for an application is the sum of its assemblies’ LOC. Here are some observations:

  • Interfaces, abstract methods and enumerations have a LOC equal to 0. Only concrete code that is effectively executed is considered when computing LOC.
  • Namespace, type, field and method declarations are not counted as lines of code because they don’t have corresponding sequence points.
  • When the C# or VB.NET compiler faces an inline instance field initialization, it generates a sequence point for each of the instance constructors (the same remark applies to inline static field initialization and the static constructor).
  • LOC computed from an anonymous method doesn’t interfere with the LOC of its outer declaring methods.
  • The overall ratio between NbILInstructions and LOC (in C# and VB.NET) is usually around 7.
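
The roll-up described above (method to type to namespace to assembly to application) is plain summation. Here is a small sketch with made-up numbers; the function `roll_up`, the assembly names and the per-method counts are all hypothetical and do not come from NDepend.

```python
from collections import defaultdict


def roll_up(method_loc: dict) -> dict:
    """Sum per-method logical LOC up to type, namespace, assembly and application.

    Keys of method_loc are (assembly, namespace, type, method) tuples.
    The application total is stored under the empty tuple ().
    """
    totals = defaultdict(int)
    for (assembly, namespace, type_name, _method), loc in method_loc.items():
        totals[()] += loc
        totals[(assembly,)] += loc
        totals[(assembly, namespace)] += loc
        totals[(assembly, namespace, type_name)] += loc
    return dict(totals)


# Hypothetical data: the interface method contributes 0, as noted in the list.
method_loc = {
    ("Core.dll", "Core.IO", "Reader", "Read"): 12,
    ("Core.dll", "Core.IO", "Reader", "Close"): 3,
    ("Core.dll", "Core.IO", "IReader", "Read"): 0,
    ("Ui.dll", "Ui", "MainForm", "OnLoad"): 20,
}
totals = roll_up(method_loc)
print(totals[()], totals[("Core.dll",)], totals[("Core.dll", "Core.IO", "IReader")])
# 35 15 0
```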
Patrick Smacchia - NDepend dev