When, if ever, is "number of lines of code" a useful metric?

tags:

metrics

views:

1905

answers:

+12 Q:

When, if ever, is "number of lines of code" a useful metric?

Some people claim that code's worst enemy is its size, and I tend to agree. Yet every day you keep hearing things like

I write blah lines of code in a day.
I own x lines of code.
Windows is x million lines of code.

Question: When is "#lines of code" useful?

ps: Note that when such statements are made, the tone is "more is better".

+55 A:

I'd say it's when you're removing code to make the project run better.

Saying you removed "X number of lines" is impressive, and far more helpful than saying you added lines of code.

warren 2008-10-08 18:14:48

int i = 0;
double d = 0;

changed to

int i = 0; double d = 0;

There, reduced by 1 LOC. Means nothing though, does it? ;)

Kent Boogaart 2008-10-08 18:34:09

Bah! Swallowed my carriage returns. I just put two declarations into one line.

Kent Boogaart 2008-10-08 18:34:58

This. For example, refactoring some legacy code for a client, I was able to cut the number of lines in their app's main form in half and that includes adding comment blocks to the refactored methods. I guess that also counts as bragging.

Rob Allen 2008-10-08 18:35:31

I agree with @[Rob Allen]; first comment is argumentative. Though poster could have made answer more clear by mentioning the word "refactor".

Lucas Oman 2008-10-08 18:42:22

You should always strive towards having a less-than-zero code ratio, although this may not always be possible.

JesperE 2008-10-09 06:33:41

This answer is obviously better than the sum of all the other answers on this page. Thanks for using boldface.

dlamblin 2008-10-09 19:24:30

Number of lines removed seems only marginally more useful than lines added. It is still trivial to manipulate and therefore is not a very useful metric.

Eli 2008-12-04 13:39:33

I have to agree with Eli. If going strictly by removing, you end up with code that looks like a Perl guru wrote it. It may even make the project run better, but you just sacrificed a bit of speed for a LOT of later development time / headaches if something needs changing.

Groxx 2009-08-05 03:13:23

But saying "I deleted all my code this morning", isn't quite as impressive. Sometimes you don't need to remove code to make it more impressive, sometimes you just need to make it more readable.

BenAlabaster 2009-10-09 15:41:12

Plus, I'd rather own fewer lines of good-quality code than more lines of awfully written spaghetti. Who wants to maintain 1,000,000 lines of code? I certainly don't.

BenAlabaster 2009-10-09 15:43:33

+3 A:

Answer: when you can talk about negative lines of code. As in: "I removed 40 extraneous lines of code today, and the program is still functioning as well as before."

David Hill 2008-10-08 18:15:19

When you are refactoring a code base and can show that you removed lines of code, and all the regression tests still passed.

Rob Walker 2008-10-08 18:15:32

+4 A:

It's a metric of productivity, as well as complexity. Like all metrics, it needs to be evaluated with care. A single metric usually is not sufficient for a complete answer.

I.e., a 500-line program is not nearly as complex as a 5000-line one. Now you have to ask other questions to get a better view of the program... but now you have a metric.

Paul Nathan 2008-10-08 18:15:40

I would call this into question. There's plenty of ways to code, for example, in Python, where you can fit at least five different lines of code into one line of code. There's also differences between whether you need to build your own function or use pre-existing stuff. It really is subjective.

Robert Elwell 2008-10-08 18:17:18

I agree with Robert. A 5000 line program may just be a very badly written 500 line program. I've seen plenty of examples of this.

Graeme Perrow 2008-10-08 18:25:56

Of course it's subjective, but it is a metric, and metrics are by their nature a one-dimensional representation.

Paul Nathan 2008-10-08 18:40:21

+2 A:

It's a great metric for scaring/impressing people. That's about it, and definitely the context I'm seeing in all three of those examples.

Robert Elwell 2008-10-08 18:15:48

+12 A:

When bragging to friends.

antik 2008-10-08 18:16:39

If you're bragging about lines of code with your friends, you need to get out more. There's far more amusing things to brag about than your code-base, haha :D

BenAlabaster 2009-10-09 15:46:24

I have found it useful under two conditions:

Gauging my own productivity on my own new project when it's heads down coding time.
When working with a large company and speaking with a manager that really only understands widgets per day.

Chris Lively 2008-10-08 18:18:41

First of all, I would exclude generated code and add the code of the generator input and the generator itself.

I would then say (with some irony), that every line of code may contain a bug and needs to be maintained. To maintain more code you need more developers. In that sense more code generates more employment.

I would like to exclude unit tests from the statement above, as fewer unit tests generally do not improve maintainability :)

extraneon 2008-10-08 18:19:48

+3 A:

I'd agree that taking the total number of lines of code in a project is one way to measure complexity.

It's certainly not the only measure of complexity. For example debugging a 100 line obfuscated Perl script is much different from debugging a 5,000 line Java project with comment templates.

But without looking at the source, you'd usually think more lines of code is more complex, just as you might think a 10MB source tarball is more complex than a 15kb source tarball.

Adam Bellaire 2008-10-08 18:20:38

+3 A:

It is useful in many ways.

I don't remember the exact number, but Microsoft had a webcast that said that for every X lines of code there are, on average, Y bugs. You can take that statement and use it as a baseline for several things:

How well a code reviewer is doing their job.
Judging the skill level of two employees by comparing their bug ratios over several projects.

Another thing we look at is, why is it so many lines? Often times when a new programmer is put in a jam they will just copy and paste chunks of code instead of creating functions and encapsulating.

I think that "I wrote x lines of code in a day" is a terrible measure. It takes no account of the difficulty of the problem, the language you're writing in, and so on.

J.J. 2008-10-08 18:23:12

The statistic was published in the Software Engineering Institute's Process Maturity Profile of the Software Community: 1998 Year End Update. A survey of about 800 software development teams (or shops, I don't remember) led to a finding that there are, on average, 12 defects per 1000 lines of code.

Thomas Owens 2009-10-09 15:45:45

+1 A:

Lines of code isn't so useful really, and if it is used as a metric by management it leads to programmers doing a lot of refactoring to boost their scores. In addition poor algorithms aren't replaced by neat short algorithms because that leads to negative LOC count which counts against you. To be honest, just don't work for a company that uses LOC/d as a productivity metric, because the management clearly doesn't have any clue about software development and thus you'll always be on the back foot from day one.

JeeBee 2008-10-08 18:23:20

+2 A:

It seems to me that there's a finite limit of how many lines of code I can refer to off the top of my head from any given project. The limit is probably very similar for the average programmer. Therefore, if you know your project has 2 million lines of code, and your programmers can be expected to be able to understand whether or not a bug is related to the 5K lines of code they know well, then you know you need to hire 400 programmers for your code base to be well covered from someone's memory.

This will also make you think twice about growing your code base too fast and might get you thinking about refactoring it to make it more understandable.

Note I made up these numbers.

dlamblin 2008-10-08 18:24:31

The number of lines of code added for a given task largely depends on who is writing the code. It shouldn't be used as a measure of productivity. A given individual can produce 1000 lines of redundant and convoluted crap while the same problem could be solved by another individual in 10 concise lines of code. When trying to use LOC added as a metric, the "who" factor should also be taken into account.

An actually useful metric would be "the number of defects found against number of lines added". That would give you an indication of the coding and test coverage capabilities of a given team or individual.

As others have also pointed out, LOC removed has better bragging rights than LOC added :)

Ates Goral 2008-10-08 18:27:45

When pointing out why the change is going to take so damn long.

"Windows is 7 million lines of code and it takes a while to test out all the dependencies..."

Schnapple 2008-10-08 18:29:58

Windows *was* 7 million lines maybe 15 years ago. Now it's most likely 10 times more.

lubos hasko 2010-01-19 07:33:47

+2 A:

There are a lot of different Software Metrics. Lines of code is the most used and is the easiest to understand.

I am surprised how often the lines of code metric correlates with the other metrics. Instead of buying a tool that can calculate cyclomatic complexity to discover code smells, I just look for the methods with many lines, and they tend to have high complexity as well.

A good example of the use of lines of code is in the metric bugs per lines of code. It can give you a gut feel for how many bugs you should expect to find in your project. In my organization we usually see around 20 bugs per 1000 lines of code. This means that if we are ready to ship a product that has 100,000 lines of code, we should expect roughly 2,000 bugs; if our bug database shows that we have found only 50, then we should probably do some more testing. If we have found around 20 bugs per 1000 lines of code, then we are probably approaching the quality that we are usually at.
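As a rough illustration of that arithmetic, here is a minimal Python sketch. The function names are made up for this example, and the density of 20 bugs per 1000 lines is simply the figure quoted above, not a general constant.

```python
def expected_defects(lines_of_code: int, defects_per_kloc: float) -> float:
    """Estimate the total defect count from a historical defect density."""
    return lines_of_code / 1000 * defects_per_kloc


def found_fraction(defects_found: int, lines_of_code: int,
                   defects_per_kloc: float) -> float:
    """Fraction of the expected defects that testing has found so far."""
    expected = expected_defects(lines_of_code, defects_per_kloc)
    return defects_found / expected if expected else 0.0


if __name__ == "__main__":
    # Numbers from the answer above: 100,000 lines, 20 bugs per 1000 lines,
    # 50 bugs found so far.
    print(expected_defects(100_000, 20))
    print(found_fraction(50, 100_000, 20))
```

A small found fraction only says that testing has probably not gone far enough; it says nothing about which parts of the code hide the remaining defects.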

A bad example of use is to measure developer productivity. If you measure developer productivity by lines of code, then people tend to use more lines to deliver less.

Hallgrim 2008-10-08 18:31:09

+1 A:

In competitions.

Prog 2008-10-08 18:34:09

Ates Goral 2008-12-18 16:53:50

When the coder doesn't know you are counting lines of code, and so has no reason to deliberately add redundant code to game the system. And when everyone in the team has a similar coding style (so there is a known average "value" per line.) And only if you don't have a better measure available.

finnw 2008-10-08 18:38:26

This is mostly an addition to the already voluminous commentary. But basically, lines of code (or perhaps totalCharacterCount/60) indicates the size of the monster. As a few people have said, that gives a clue to a codebase's complexity. Its level of complexity has a lot of impact. In part, it affects how difficult it is to comprehend the system and make a change.

That's why people want fewer lines of code. In theory, fewer lines of code is less complex, and there is less room for error. I'm not sure that knowing that upfront is terribly useful for anything other than estimation and planning.

For example: Suppose I have a project and on cursory examination I realize that the matter will involve modifying as many as 1000 lines of code within an application that has 10,000 lines. I know that this project is likely to take longer to implement, be less stable, and take longer to debug and test.

It's also extremely useful for understanding the scope of change between two builds. I wrote a little program that will analyze the scope of change between any two SVN revisions. It will look at a unified diff, and from it, figure out how many lines were added, removed, or changed. This helps me know what to expect in the testing and QA that follows a new build. Basically, bigger numbers of change mean that we need to watch that build closer, put it through full regression testing, etc.
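The program itself isn't shown in the thread, so the following is only a hypothetical Python sketch of the idea, not the author's tool. It assumes the diff text has already been obtained (for example from `svn diff`) and counts added and removed lines inside hunks. A unified diff has no separate notion of a "changed" line (a change shows up as a removal plus an addition), so this sketch does not try to pair them up.

```python
import re
from typing import NamedTuple

# Hunk header, e.g. "@@ -12,7 +12,9 @@"; a missing length means 1 line.
HUNK_RE = re.compile(r"^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@")


class DiffStats(NamedTuple):
    added: int
    removed: int


def count_diff_lines(unified_diff: str) -> DiffStats:
    """Count added and removed lines in a unified diff."""
    added = removed = 0
    old_left = new_left = 0  # lines still expected in the current hunk
    for line in unified_diff.splitlines():
        if old_left <= 0 and new_left <= 0:
            # Outside a hunk: only hunk headers matter; file headers
            # ("--- file", "+++ file", "Index: ...") are skipped.
            match = HUNK_RE.match(line)
            if match:
                old_left = int(match.group(1)) if match.group(1) is not None else 1
                new_left = int(match.group(2)) if match.group(2) is not None else 1
            continue
        if line.startswith("+"):
            added += 1
            new_left -= 1
        elif line.startswith("-"):
            removed += 1
            old_left -= 1
        elif line.startswith("\\"):
            continue  # "\ No newline at end of file" marker
        else:
            # Context line: present in both the old and the new file.
            old_left -= 1
            new_left -= 1
    return DiffStats(added, removed)


if __name__ == "__main__":
    sample = "\n".join([
        "Index: foo.c",
        "--- foo.c (revision 1)",
        "+++ foo.c (revision 2)",
        "@@ -1,3 +1,3 @@",
        " int main(void) {",
        "-    return 1;",
        "+    return 0;",
        " }",
    ])
    print(count_diff_lines(sample))
```

Tracking the hunk lengths, rather than just skipping every line that starts with `---` or `+++`, keeps a removed line whose own text begins with `--` from being mistaken for a file header.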

Troy Howard 2008-10-08 18:44:54

+6 A:

It's useful when loading up your line printer, so that you know how many pages the code listing you're about to print will consume. ;)

Troy Howard 2008-10-08 18:48:07

+16 A:

It's a terrible metric, but as other people have noted, it gives you a (very) rough idea of the overall complexity of a system. If you're comparing two projects, A and B, and A is 10,000 lines of code, and B is 20,000, that doesn't tell you much - project B could be excessively verbose, or A could be super-compressed.

On the other hand, if one project is 10,000 lines of code, and the other is 1,000,000 lines, the second project is significantly more complex, in general.

The problems with this metric come in when it's used to evaluate productivity or level of contribution to some project. If programmer "X" writes 2x the number of lines as programmer "Y", he might or might not be contributing more - maybe "Y" is working on a harder problem...

Mark Bessey 2008-10-08 18:53:03

Even more than a harder problem, "Y" might be writing better code for the SAME problem that is a lot more DRY and maintainable.

TM 2009-10-30 13:22:05

+11 A:

I'm surprised nobody has mentioned Dijkstra's famous quote yet, so here goes:

My point today is that, if we wish to count lines of code, we should not regard them as "lines produced" but as "lines spent": the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger.

The quote is from an article called "On the cruelty of really teaching computing science".

JesperE 2008-10-08 19:03:37

I heard that Microsoft used to fire 5% of people every 6 months. I always imagined it would be based on lines of code written, which is why Windows is so bulky, slow and inefficient ;). Lines of code is a useful metric for measuring the complexity of an application in terms of rough ordering, i.e. a beginner's program in Basic might be 10 lines of code, 100 lines of code is a toy application, 50,000 lines is a reasonably sized application, and 10 million lines of code is a monstrosity called Windows.

Lines of code is not a very useful metric though. I used to write games in assembly language (68000 mainly); they would come in at around 50k lines of code, but I kept the number of lines of code down by not pushing registers to the stack and keeping track of what was contained in the registers to cut down on code size (other programmers I knew did a push multiple of d0-d7,a0-a6 to the stack, which obviously slows down the code, but simplifies keeping track of what is affected).

2008-10-08 19:07:45

+1 A:

Check out wikipedia's definition: http://en.wikipedia.org/wiki/Source_lines_of_code

SLOC = 'source lines of code'

There is actually quite a bit of time put into these metrics where I work. There are also different ways to count SLOC.

From the wikipedia article:

There are two major types of SLOC measures: physical SLOC and logical SLOC.

Another good resource: http://www.dwheeler.com/sloc/
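To make the physical vs. logical distinction concrete, here is a deliberately crude Python sketch for C-like source. It is an assumption-laden illustration, not how any real SLOC tool counts: physical lines are non-blank lines that are not pure `//` comments, and logical lines are approximated by counting semicolons outside `//` comments. It ignores block comments, string literals, and `for (;;)` headers.

```python
def physical_sloc(source: str) -> int:
    """Count non-blank lines that are not pure '//' comment lines."""
    return sum(
        1
        for line in source.splitlines()
        if line.strip() and not line.strip().startswith("//")
    )


def logical_sloc(source: str) -> int:
    """Approximate the statement count by counting ';' outside '//' comments."""
    count = 0
    for line in source.splitlines():
        code = line.split("//", 1)[0]  # drop a trailing '//' comment
        count += code.count(";")
    return count


if __name__ == "__main__":
    two_lines = "int i = 0;\ndouble d = 0;\n"
    one_line = "int i = 0; double d = 0;\n"
    for text in (two_lines, one_line):
        print(physical_sloc(text), logical_sloc(text))
```

Putting two declarations on one line lowers the physical count but leaves the logical count unchanged, which is why the choice of counting method matters when LOC figures are compared.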

marked 2008-10-08 19:09:36

It can be a very good measure of complexity for the purposes of risk assessment - the more lines changed the greater the chance of a bug being introduced.

Ray 2008-10-08 19:10:16

+2 A:

Like most metrics, it means very little without context. So the short answer is: never (except for the line printer, that's funny! Who prints out programs these days?)

An example:

Imagine that you're unit-testing and refactoring legacy code. It starts out with 50,000 lines of code (50 KLOC) and 1,000 demonstrable bugs (failed unit tests). The ratio is 1K/50KLOC = 1 bug per 50 lines of code. Clearly this is terrible code!

Now, several iterations later, you have reduced the known bugs by half (and the unknown bugs by more than that most likely) and the code base by a factor of five through exemplary refactoring. The ratio is now 500/10000 = 1 bug per 20 lines of code. Which is apparently even worse!

Depending on what impression you want to make, this can be presented as one or more of the following:

50% fewer bugs
five times less code
80% less code
60% worsening of the bugs-to-code ratio (from one bug per 50 lines to one bug per 20)

All of these are true (assuming I didn't screw up the math), and they all suck at summarizing the vast improvement that such a refactoring effort must have achieved.
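For anyone who wants to check the math, here is a small Python sketch using the numbers from the example above. The function and key names are invented for this illustration.

```python
def summarize(before_loc: int, before_bugs: int,
              after_loc: int, after_bugs: int) -> dict:
    """Express one refactoring as several different-looking ratios."""
    lines_per_bug_before = before_loc / before_bugs
    lines_per_bug_after = after_loc / after_bugs
    return {
        "bug_reduction": 1 - after_bugs / before_bugs,
        "code_reduction": 1 - after_loc / before_loc,
        "code_shrink_factor": before_loc / after_loc,
        "lines_per_bug_before": lines_per_bug_before,
        "lines_per_bug_after": lines_per_bug_after,
        # Relative drop in lines per bug (fewer lines per bug is worse).
        "lines_per_bug_drop": 1 - lines_per_bug_after / lines_per_bug_before,
        # The same change seen as bugs per line of code.
        "bug_density_increase":
            (after_bugs / after_loc) / (before_bugs / before_loc) - 1,
    }


if __name__ == "__main__":
    for name, value in summarize(50_000, 1_000, 10_000, 500).items():
        print(f"{name}: {value:.2f}")
```

The same before-and-after pair gives a 60% figure when measured as lines per bug and a 150% figure when measured as bugs per line, which rather supports the point that the ratio alone says little.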

Steven A. Lowe 2008-10-08 19:10:45

They can be helpful to indicate the magnitude of an application - says nothing about quality! My point here is just that if you indicate you worked on an application with 1,000 lines and they have an application that is 500k lines (roughly), a potential employer can understand if you have large-system experience vs. small utility programming.

I fully agree with warren that the number of lines of code you remove from a system is more useful than the lines you add.

agartzke 2008-10-08 19:51:03

+1 A:

Lines of code are useful to know when you're wondering if a code file is getting too large. Hmmm...This file is now 5000 lines of code. Maybe I should refactor this.

DarthNoodles 2008-10-08 20:02:54

Lines of code counts are useful when pitching the extensiveness of your comprehensive product to a customer who considers lines of code to be a general indicator of product size. For example, when you're trying to convince someone your product handles many corner cases, or when you're trying to get into a beta for a development tool where the tool vendor wants to get maximum code coverage for testing purposes.

Nick 2008-10-08 20:42:47

Functionally never, aside from the previously-mentioned "bragging" purpose.

Lines != effectiveness. Often the relationship is inverse, in my experience (though not strictly, especially for the extreme, for obvious reasons)

Groxx 2008-10-09 04:09:50

First time here, and I have the same question.

Kevin Yu 2008-12-04 13:22:28

When you have to budget for the number of punch cards you need to order.

Dave Markle 2008-12-04 13:35:20

+1 A:

I wrote two blog posts detailing the pros and cons of counting Lines of Code (LoC):

How do you count your number of Lines Of Code (LOC)?: The idea is to explain that you need to count the logical number of lines of code instead of the physical count. To do so you can use tools like NDepend, for example.

Why is it useful to count the number of Lines Of Code (LOC)?: The idea is that LoC should never be used to measure productivity, but rather for test coverage estimation and software deadline estimation.

Patrick Smacchia - NDepend dev 2008-12-18 16:44:38

The number of lines of code depends on the language.

For example, 1 line of C code is worth an average of x lines of ASM code, 1 line of C++ is worth several lines of C, etc.

Java and C# encapsulate quite a few lines of code thanks to the background support from the VM.

monksy 2009-10-09 15:36:50

+1 A:

The Software Engineering Institute's Process Maturity Profile of the Software Community: 1998 Year End Update (which I could not find a link to, unfortunately) discusses a survey of around 800 software development teams (or perhaps it was shops). The average defect density was 12 defects per 1000 LOC.

If you had an application with 0 defects (it doesn't exist in reality, but let's suppose) and wrote 1000 LOC, on average, you can assume that you just introduced 12 defects into the system. If QA finds 1 or 2 defects and that's it, then they need to do more testing as there are probably 10+ more defects.

Thomas Owens 2009-10-09 15:44:11

This is used so often during sales presentations. For instance, KLoC (Kilo Lines of Code) or LoC is used to demonstrate the kind of competence the vendor organization has with large/complex systems. This is especially true when the vendor is attempting to showcase their ability to MAINTAIN complex legacy systems. As part of negotiation, sometimes the customer organization provides a representative chunk of code to execute a Proof of Concept with the vendor to test the vendor's capability. This representative code will have enough complexities for the vendor company to handle, and its sales pitch about "maintaining systems with several million LoC" can come under scrutiny.

So, yes, Lines of Code is used and abused during sales presentations and hence a useful metric in sales.

manoj balraj 2009-10-29 08:38:03

As most people have already stated, it can be an ambiguous metric, especially if you are comparing people coding in different languages.

5,000 lines of Lisp != 5,000 lines of C

Jacinda S 2010-04-28 00:44:22

The number of LOC is useful when calculating the defect rate (bugs per 1,000 LOC, etc.)

gw 2010-09-13 07:55:25

At least, not for progress:

“Measuring programming progress by lines of code is like measuring aircraft building progress by weight.” --Bill Gates

Pascal Thivent 2010-09-13 08:09:41
