views:

3958

answers:

16

We want to invite a third party to review our code, and they want a ballpark figure for the number of lines of code across all our applications!

Depending on each developer's coding style and on the language chosen, LOC measurements can differ significantly. I am really interested in knowing how you counted your Lines Of Code (if you ever did).

Thanks

+31  A: 
Prakash
+6  A: 

I'd be hard pressed to think that "wc -l " on every file in your source tree isn't a good enough estimate of LOC.

EmmEff
The problem there is that well-documented code seems like more work than code with no comments at all.
Andrew Johnson
Why is that a problem? Well-documented code *is* more work.
Ben Collins
Generated code should not count towards LOC either. Code generated by a GUI designer, for example, often sits in the same files as hand-written event-handling code, which prevents the use of simple counting tools.
Ralph Rickenbach
Well-documented code is more work to produce in the first place, but it should be less work to review and maintain. I guess it depends on what you want the numbers for, but you can't really measure productivity through LOC.
Andrew Johnson
+4  A: 

Also check this discussion.

Prakash
A: 

If it were Perl, I would use the Perl::Tidy module, because it works in terms of commands (statements) rather than physical lines of code. It doesn't actually count anything, but it would be very easy to add code that does.

In other languages I would find a way to get that number from the interpreter/compiler, or use one of the many free source tokenizers out there. Look for modules that help clean or tidy your code, because they have built-in tokenizers for each command, and then it's a matter of adding a counter.

+4  A: 

I've used cloc a number of times to do this sort of thing. It's quick and dirty and it works for a large number of languages/file types, giving you a breakdown of how many lines there are in which languages.

Paul Wicks
cloc is pretty cool. Thanks!
Forgotten Semicolon
A: 

I would recommend you implement a Best Practice in code formatting. Then implement an application to format the code automatically (this works best with a continuous integration system). This way you don't have to worry if developers have their own style of coding since it will be automatically formatted once they check it into source control.

As for how to get the line count across various languages, the easiest way is "wc -l". Granted, there are other tools out there for counting lines of code (CLOC, SLOCCount), but wc will work across all languages.

pdavis
+2  A: 

Visual Studio Add-In

CrashTECH
+16  A: 

There's a Perl program called CLOC that you can use. It is also available as a Windows binary:

cloc counts blank lines, comment lines, and physical lines of source code in many programming languages. ... cloc is known to run on many flavors of Linux, AIX, Solaris, IRIX, z/OS, and Windows. (To run the Perl source version of cloc on Windows one needs ActiveState Perl 5.6.1 or higher, or Cygwin installed. Alternatively one can use the Windows binary of cloc generated with perl2exe to run on Windows computers that have neither Perl nor Cygwin.)

It can produce a lot of statistics, depending on your code base, but most people will use fewer languages than their example:

Unix> cloc --sum-reports --report_file=script_lang perl-5.8.8.txt python-2.4.2.txt
Wrote script_lang.lang
Wrote script_lang.file

Unix> cat script_lang.lang
http://cloc.sourceforge.net v 0.72
-------------------------------------------------------------------------------
Language          files     blank   comment      code    scale   3rd gen. equiv
-------------------------------------------------------------------------------
C                   409     46920     35958    383652 x   0.77 =      295412.04
Python             1605     55998     31886    309549 x   4.20 =     1300105.80
Perl               1576     74568     89136    220919 x   4.00 =      883676.00
C/C++ Header        280     12169     26366     88089 x   1.00 =       88089.00
Bourne Shell        146      5201      7428     52115 x   3.81 =      198558.15
Lisp                  4      1120      2291      9799 x   1.25 =       12248.75
Make                 17      1092       939      5348 x   2.50 =       13370.00
Teamcenter def       10       144        88      3163 x   1.00 =        3163.00
HTML                 22       516         2      2769 x   1.90 =        5261.10
yacc                  2       125        72      1047 x   1.51 =        1580.97
XML                   2       103        32       894 x   1.90 =        1698.60
Objective C           6       102        19       704 x   2.96 =        2083.84
C++                   4       104       215       451 x   1.51 =         681.01
DOS Batch            14        93        73       387 x   0.63 =         243.81
Expect                1         0         0        60 x   2.00 =         120.00
Java                  2         6         1        23 x   1.36 =          31.28
sed                   1         0         1         2 x   4.00 =           8.00
-------------------------------------------------------------------------------
SUM:               4101    198261    194507   1078971 x   2.60 =     2806331.35
-------------------------------------------------------------------------------
Andrew Johnson
Thanks, this was really useful.
Pete Hodgson
+2  A: 

On the rare occasions that this has been useful I have used a slight variation on EmmEff's "wc -l" (the following comments only apply to Java but can be adapted for other languages):

cat *.java | grep '[;{]' | wc -l

that is, to count the lines containing a semicolon or an opening brace, so that you have a measure of the number of actual statements in the system rather than the number of physical lines (including blanks). This fails if your developers put all their code on the same line, but if that's the case you are already in trouble and should consider reformatting all the code.
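
As a minimal sketch, the same count can be applied to a whole directory tree rather than a single directory, assuming a POSIX shell with find and grep available. The function name count_statement_lines and its directory argument are hypothetical, introduced only for illustration. It counts lines that match, not individual semicolons or braces.

```shell
# Hypothetical helper: count lines containing ';' or '{' in every .java file
# under a directory (defaults to the current directory).
count_statement_lines() {
    dir="${1:-.}"
    # -exec ... {} + batches file names like xargs but copes with spaces in paths.
    # grep -h suppresses file names so only the matching lines reach wc.
    # tr strips the padding some wc implementations add around the number.
    find "$dir" -type f -name '*.java' -exec grep -h '[;{]' {} + | wc -l | tr -d ' '
}
```

Calling count_statement_lines src would print one number for everything under src.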

Edit: cloc is of course a much better tool than the above command. This is just the quick-and-dirty version.

It's a bit harder to measure other forms of "code": we also create lots of JSPs, object models (using a web-based tool so there is no text file involved) and of course populating databases with metadata can be an equivalent. Harder to use grep to measure all those, though you can come up with a proxy such as number of attributes in your data model, or number of records in the metadata tables.

The most sensible way to think of "lines of code" is to see them as the atomic unit of a programmer's thought. This is why high-level languages are much more productive both in time, and in lines of code - a single thought represents a much higher level of abstraction and therefore has more power. Simplifying drastically, a given programmer has the capacity for a given number of thoughts-per-hour, and so lines of code as a proxy for that is a (grudgingly) semi-legitimate measure.

I appreciate why a code review company would use it as a measure of their likely workload, though it is a bit unimaginative. Maybe you can persuade them to think instead in terms of the business value they are going to add through their services.

Leigh Caldwell
This seems like a common mistake people make -- there is no reason to 'cat' *.java and then pipe the results through grep. Simply run: grep '[;{]' *.java | wc -l
Trent
Yes, good point. Not sure why I didn't put it like that. I think it is probably because I usually use 'find' on a directory structure and pipe the results into grep, and in that case it's harder to replicate the results of find with a simple wildcard.
Leigh Caldwell
A: 

Always using the same way of measuring is more important than which method you use.

Something simple like

find . -regex '.+\.cc$' | xargs cat | wc

This just counts all lines in .cc files on Linux (plain wc also reports words and bytes). Adding .h files too would be better. Anything more (like ignoring comments) is overkill and often leads to perverse incentives.
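
A minimal sketch of the variant that also includes .h files, assuming a POSIX shell. The function name count_cc_lines is hypothetical. It prints only the line total, so the same command can be rerun later and compared directly.

```shell
# Hypothetical helper: total physical lines in .cc and .h files under a
# directory (defaults to the current directory).
count_cc_lines() {
    dir="${1:-.}"
    # The escaped parentheses group the two -name tests so that -type f
    # applies to both of them.
    find "$dir" -type f \( -name '*.cc' -o -name '*.h' \) -exec cat {} + | wc -l | tr -d ' '
}
```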

Mike Elkins
A: 

I never used cloc, but I will give it a try.

I normally use sloccount - it also shows the COCOMO stuff - which always impresses me. Ohloh would be nicer with Mercurial support.. ;)

unexist
+5  A: 

I'm dismayed by the use of cat.

find . -regex '.+\.cc$' | xargs cat | wc

can be written without cat (wc then lists each file, followed by a total) as

find . -regex '.+\.cc$' | xargs wc

and

cat *.java | grep '[;{]' | wc -l

is the same as

grep '[;{]' *.java | wc -l

or even, if one count per file rather than a single total is acceptable,

grep -c '[;{]' *.java

One rarely needs to use cat.
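
Note that grep -c prints one count per file when given several files. If a single total is wanted without cat, the per-file counts can be summed. The following is a minimal sketch assuming a POSIX shell with awk available; the function name total_matches is hypothetical.

```shell
# Hypothetical helper: sum grep's per-file match counts into one total.
# Usage: total_matches PATTERN FILE...
total_matches() {
    pattern="$1"
    shift
    # With several files grep -c prints "name:count"; with one file it prints
    # just "count". Splitting on ':' and taking the last field handles both.
    grep -c -e "$pattern" "$@" | awk -F: '{ sum += $NF } END { print sum + 0 }'
}
```

For example, total_matches '[;{]' *.java prints the same number as grep '[;{]' *.java | wc -l.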

Andy Lester
Despite you being dismayed, your examples were better than any other method. Thanks!
bentford
I like the evil smiley: [;{]
java.is.for.desktop
+2  A: 

The only good way to count Lines of Code is to count logical lines, which ignores the impact of coding style. This way you can compare and measure progression effectively.

In the .NET world you can use the tool NDepend to easily count the number of lines of code and get some special visualization features. By the way, NDepend comes with 82 other code metrics, all listed here: http://www.ndepend.com/Metrics.aspx.

I wrote two blog posts detailing the pros and cons of counting Lines of Code:
How do you count your number of Lines Of Code (LOC) ?
Why is it useful to count the number of Lines Of Code (LOC) ?

Patrick Smacchia - NDepend dev
A: 

I've used the Metrics plug-in for Eclipse: http://metrics.sourceforge.net/

Saulo Silva
A: 

I've used a single line of PowerShell from my blog

ls * -recurse -include *.aspx, *.ascx, *.cs, *.ps1 | Get-Content | Measure-Object -Line
Matthew Manela
A: 

If you want a measure that is independent of formatting, whitespace or comments, you can measure Cyclomatic Complexity and/or Halstead code volume.

How you measure these depends on the language syntax, so you will generally need a separate metrics tool to compute them for each language. If you have a lot of languages, you may not be able to measure everything, at which point you have to fall back to something simpler.
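
For reference, these are the usual textbook definitions of the two measures, given as background and not as the exact formulas any particular tool implements. For a control-flow graph with e edges, n nodes and p connected components, the cyclomatic complexity is M. For a program with \eta_1 distinct operators, \eta_2 distinct operands, N_1 total operator occurrences and N_2 total operand occurrences, the Halstead volume is V.

```latex
M = e - n + 2p
\qquad
V = (N_1 + N_2)\,\log_2(\eta_1 + \eta_2)
```

Neither value changes when the code is reformatted or when comments are added, which is why they are independent of coding style.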

Our SD Source Code Search Engine is nominally used to search large source code bases efficiently by indexing each code base in a language specific way. As a side effect of the indexing process, the Search Engine just so happens to compute Halstead and Cyclomatic measures (as well as a number of other simple measures) for each language it can handle, and it presently handles a lot of languages.

Whether the reviewing organization will accept these values as a measure of the code size is another matter. If you don't know them well, I'd use wc and call it a day.

Ira Baxter