I realize there's no definitively "right" answer to this question, but when people talk about lines of code, what do they mean? In C++, for example, do you count blank lines? Comments? Lines with just an open or close brace?

I know some people use LoC as a productivity measure, and I'm wondering if there is a standard convention here. Also, I think there's a way to get various compilers to count lines of code - is there a standard convention there?

+3  A: 

Whatever "wc -l" returns is my number.

Brian Knoblauch
+8  A: 

I'd say

  • comments count
  • blank lines count, because they're important for readability, but not more than one contiguously
  • lines with braces count too, but apply the same rule as for blank lines - i.e. 5 nested braces with no code between them counts as one line.
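The three rules above can be sketched roughly as follows. This is only an illustration: the function name `count_loc` is made up here, and treating a line as "brace-only" when it contains nothing but `{` and `}` is an assumption, not something the answer specifies.

```python
def count_loc(source: str) -> int:
    """Count lines, collapsing each run of blank or brace-only lines to one."""
    count = 0
    previous_kind = None
    for line in source.splitlines():
        stripped = line.strip()
        if stripped == "":
            kind = "blank"
        elif set(stripped) <= set("{}"):
            kind = "brace"  # assumption: only '{' and '}' on the line
        else:
            kind = "code"   # comments fall in here, so they count
        # Code and comment lines always count; a run of blank lines
        # (or of brace-only lines) counts once, at its first line.
        if kind == "code" or kind != previous_kind:
            count += 1
        previous_kind = kind
    return count


sample = "int f() {\n\n\n  // comment\n  if (x) {\n    g();\n  }\n}\n\n"
print(len(sample.splitlines()))  # 9 physical lines
print(count_loc(sample))         # 7 under the rules above
```

The two consecutive blank lines and the two consecutive closing braces each collapse to a single counted line.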

I'd also humbly suggest that any productivity measure which actually relies on a LoC value is bunk :)

Alnitak
A: 
  1. LOCphy: physical lines
  2. LOCbl: blank lines (comment blocks are counted as comment lines)
  3. LOCpro: programming lines (declarations, definitions, directives & code)
  4. LOCcom: lines of comments
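As a rough sketch of how such a breakdown might be computed for C-style source: the function name `classify_lines` is hypothetical, and the rules for mixed lines are assumptions (a line with code followed by a trailing comment is counted as a programming line, and everything inside a `/* ... */` block, including empty lines, is counted as a comment line).

```python
from collections import Counter


def classify_lines(source: str) -> Counter:
    """Tally physical, blank, comment, and programming lines (C-style comments)."""
    counts = Counter()
    in_block_comment = False
    for line in source.splitlines():
        stripped = line.strip()
        counts["LOCphy"] += 1
        if in_block_comment:
            # Everything inside a comment block is a comment line.
            counts["LOCcom"] += 1
            if "*/" in stripped:
                in_block_comment = False
        elif stripped == "":
            counts["LOCbl"] += 1
        elif stripped.startswith("//"):
            counts["LOCcom"] += 1
        elif stripped.startswith("/*"):
            counts["LOCcom"] += 1
            if "*/" not in stripped[2:]:
                in_block_comment = True
        else:
            # Assumption: code with a trailing comment is a programming line.
            counts["LOCpro"] += 1
    return counts


sample = (
    "/* header\n"
    "\n"
    "   comment */\n"
    "#include <cstdio>\n"
    "\n"
    "int main() {  // entry\n"
    "  return 0;\n"
    "}\n"
)
print(classify_lines(sample))
```

For this sample the tally is 8 physical lines: 3 comment, 1 blank, and 4 programming lines.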

Many tools will report figures such as the percentage of non-blank lines.

These numbers are worth looking at, but don't rely on them alone.

LOC grows massively at the start of a project and often decreases after reviews ;)

joki
+17  A: 

No, there is no standard convention, and every tool that counts them will be slightly different.

This may make you ask, "Why then would I ever use LOC as a productivity measure?" The answer is that it doesn't really matter how you count a line of code: as long as you count them consistently, you can get some idea of the general size of a project in relation to others.

Craig H
A: 

I think of it as a single processable statement. For example

(1 line)

Dim obj as Object

(5 lines)

If _amount > 0 Then
  _amount += 5
Else
  _amount -= 5
End If
StingyJack
+6  A: 

Have a look at the Wikipedia Article, especially the "Measuring SLOC" section:

There are two major types of SLOC measures: physical SLOC and logical SLOC. Specific definitions of these two measures vary, but the most common definition of physical SLOC is a count of lines in the text of the program's source code including comment lines. Blank lines are also included unless the lines of code in a section consists of more than 25% blank lines. In this case blank lines in excess of 25% are not counted toward lines of code.

Logical SLOC measures attempt to measure the number of "statements", but their specific definitions are tied to specific computer languages (one simple logical SLOC measure for C-like programming languages is the number of statement-terminating semicolons). It is much easier to create tools that measure physical SLOC, and physical SLOC definitions are easier to explain. However, physical SLOC measures are sensitive to logically irrelevant formatting and style conventions, while logical SLOC is less sensitive to formatting and style conventions. Unfortunately, SLOC measures are often stated without giving their definition, and logical SLOC can often be significantly different from physical SLOC.

Consider this snippet of C code as an example of the ambiguity encountered when determining SLOC:

for (i=0; i<100; ++i) printf("hello");   /* How many lines of code is this? */

In this example we have:

  • 1 Physical Lines of Code LOC
  • 2 Logical Lines of Code lLOC (for statement and printf statement)
  • 1 Comment Line

[...]
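To make the difference concrete, here is a minimal sketch of both measures applied to the snippet quoted above. The function names are made up for illustration, and the "logical" count is the naive semicolon count the quote mentions, with no attempt to skip semicolons inside strings or comments.

```python
def physical_lines(source: str) -> int:
    """Physical SLOC in the simplest sense: number of text lines."""
    return len(source.splitlines())


def semicolon_count(source: str) -> int:
    """Naive logical SLOC for C-like code: count every semicolon.

    Assumption: semicolons inside string literals or comments are
    not filtered out, so this over-counts on such input.
    """
    return source.count(";")


snippet = 'for (i=0; i<100; ++i) printf("hello");   /* How many lines of code is this? */'
print(physical_lines(snippet))   # 1
print(semicolon_count(snippet))  # 3
```

Note that the naive semicolon count gives 3 here rather than the 2 logical lines listed above, because the `for` header itself contains two semicolons. That is a small demonstration of how much the result depends on the exact definition used.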

xsl
+2  A: 

"Lines of code" should include anything you have to maintain. That includes comments, but excludes whitespace.

If you're using this as a productivity metric, make sure you're making reasonable comparisons. A line of C++ isn't the same as a line of Ruby.

Bill the Lizard
And as for a line of APL...
Steve Jessop
Egads, I just looked that up on Wikipedia. It's horrible. :)
Bill the Lizard
@Bill, you've never done APL? You haven't lived until you've done it on an IBM Selectric converted to a terminal. Many of the operators required backspacing and overstriking.
Paul Tomblin
I agree about comments. Comments (in my code at least) often contain commented-out lines that I haven't decided to necessarily erase yet and therefore are still part of the work, if not the actual program. Of course it depends what your purpose in measuring it is.
chaiguy
+3  A: 

If you use LOC as a measure of productivity, you will suddenly find your programmers writing much more verbosely to "game the system". It's a stupid measure, and only stupid people use it for anything more than bragging rights.

Paul Tomblin
I particularly like the 25% allowance for blank lines in the Wikipedia defn quote elsewhere. A simple checkin hook will ensure that you always get paid for your full allowance of blank lines ;-)
Steve Jessop
Better to use it for planning and estimating than to use it as a basis for computing programmer pay.
skiphoppy
@onebyone - and if they count comments as "better" than blank lines, a checkin hook to change all your blank lines to empty comments!
Paul Tomblin
Bragging is the whole point, imo. ;) Personally I would only use this to see how much I've written for personal interest, and thus "cheating the system" doesn't apply.
chaiguy
+1  A: 

LOC is a notoriously ambiguous metric. For a detailed comparison, it's only valid when comparing code that's been written in the same language, with the same style, by the same team.

However, it does give a rough notion of complexity at the order-of-magnitude level. A 10000-line program is much more complex than a 100-line program.

The advantage of LOC is that wc -l returns it, and there's no real fanciness involved in understanding or calculating it, unlike many other software metrics.

Paul Nathan
+1  A: 

There's no right answer.

For informal estimates, I use wc -l.

If I needed to measure something rigorously, I would measure executable statements. Pretty much, anything with a statement terminator (usually semicolon), or ending with a block. For compound statements, I'd count each substatement.

So:

int i = 7;             # one statement terminator; one (1) statement
if (r == 9)            # count the if as one (1) statement
  output("Yes");       # one statement terminator; one (1) statement; total (2) for the if
while (n <= 14) {      # count the while as one (1) statement
  output("n = ", n);   # one statement terminator; one (1) statement
  do_something();      # one statement terminator; one (1) statement
  n++                  # count this one, one statement (1), even though it doesn't need a statement terminator in some languages
}                      # brace doesn't count; total (4) for the while

If I were doing it in Scheme or Lisp, I'd count expressions.

As others have said, what matters most is that your count is consistent. It also matters what you're using this for. If you just want to let a potential new hire know how big your project is, use wc -l. If you're wanting to do planning and estimating, then you might want to get more formal. You should not in any circumstances be using LOC to base programmer compensation on.

skiphoppy
+1  A: 

You should be thinking of "lines of code spent", not "lines of code produced".

Things should be as simple as possible, so creating a positive benchmark based on quantity of lines is encouraging bad code.

Furthermore, some things that are very difficult end up being solved with very little code, and some things that are very easy (boilerplate code like getters and setters for example) can add a lot of lines in very little time.

As for the original question, if I was going to count lines, I'd include every line other than consecutive blank lines. I'd include comments as well, since they are (hopefully) useful documentation.

TM
A: 

I agree with the posts that say it is reported many ways and isn't an important metric. See this ever-hear-of-developers-getting-paid-per-line-of-code.

kenny
+1  A: 

1 line = 4 seconds of reading. If it takes more than that to figure out what I'm saying on that line, the line's too long.

Ant P.
A: 

I agree with the accepted answer by Craig H. However, I'd like to add that in school I was taught that whitespace, comments, and declarations shouldn't be counted as "lines of code" when measuring the lines of code produced by a programmer for productivity purposes, i.e. the ol’ “15-lines-per-day” rule.

Booji Boy
+1  A: 

The notion of LOC is an attempt to quantify a volume of code. As pointed out in other answers, it doesn't matter what you specifically call a line of code as long as you are consistent. Intuitively, it seems that a 10-line program is smaller than a 100-line program, which is smaller than a 1000-line program, and so on. You would expect that it takes less time to create, debug, and maintain a 100-line program than a 1000-line program. Informally at least, you can use LOC to give a rough feel for the amount of work required to create, debug, and maintain a program of a certain size.

Of course, there are places where this doesn't hold up. For example, a complex algorithm rendered in 1000 lines may be much harder to develop than, say, a simple database program that consumes 2500 lines.

So, LOC is a coarse-grained measure of code volume that enables managers to get a reasonable understanding of the size of a problem.

mxg
+4  A: 

Any day that I can end with fewer lines of code, but as much or more working functionality... is a good day. Being able to remove hundreds of lines of code and wind up with something that's just as functional, and more maintainable, is a wonderful thing.

That being said, unless you have very strict coding guidelines in your team, physical lines of code is a useless statistic. Logical lines of code is still useless, but at least it's not dangerously misleading.

RHSeeger
A: 

I use wc -l for a quick estimate of the complexity of a workspace. However, as a productivity metric LOC is THE WORST. I generally consider it a very productive day if my LOC count goes DOWN.

Chris Nava
A: 

I know some people use LoC as a productivity measure

Could you please tell me who they are so I don't accidentally work with (or even worse, for) them?

If I can implement in 1400 lines using Haskell what I could also implement in 2800 lines using C, am I more productive in C or Haskell? Which is going to take longer? Which is going to have more bugs (hint: it's linear in the LOC count)?

A programmer's worth is how much his code changes (including from or to the empty string) increases the number on your bottom line. I know of no good way to measure or approximate that. But I know that any reasonably measurable metric can be gamed and doesn't reflect what you really want. So don't use it.

That being said, how do you count LOCs? Simple, use wc -l. Why is that the right tool? Well, you probably don't care about any particular number, but about general total trends (going up or down, and by how much), about individual trends (going up or down, changing direction how fast, ...) and about pretty much anything except just the number 82,763.

The differences between what the tools measure are probably not interesting. Unless you have evidence that the number spit out by your tool (and only that tool) correlates with something interesting, use it as a rough ballpark figure; anything other than monotonicity should be taken with not only a grain but a bucketful of salt.

Count how many times '\n' occurs. Other interesting characters to count might be ';', '{' and '/'.
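A minimal sketch of that kind of character tally (the function name `char_census` and its default character set are just illustrative choices). The '\n' entry is the same number wc -l reports, since wc -l counts newline characters.

```python
from collections import Counter


def char_census(text: str, chars: str = "\n;{/") -> dict:
    """Return how often each character of interest occurs in text."""
    tally = Counter(text)
    return {c: tally[c] for c in chars}


source = "int main() {\n  return 0; // done\n}\n"
print(char_census(source))
```

For this three-line source the tally is 3 newlines, 1 semicolon, 1 open brace, and 2 slashes. As with any of these counts, the trend over time is likely to be more informative than any single value.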

Jonas Kölker