In Bash, how do I count the number of non-blank lines of code in a project?

+1  A: 
awk '!/^[[:space:]]*$/ {++x} END {print x+0}' "$testfile"
Ben Hoffstein
+11  A: 
cat foo.c | sed '/^\s*$/d' | wc -l

And if you consider comments blank lines:

cat foo.pl | sed '/^\s*#/d;/^\s*$/d' | wc -l

Although, that's language dependent.
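Since the comment syntax is language dependent, one option is to pass the comment prefix in as a parameter. The following is only a sketch: the function name count_code_lines is made up for illustration, it handles single-line comments that start a line (after optional indentation), and it does not understand block comments or trailing comments.

```shell
#!/bin/bash
# count_code_lines <comment-prefix> <file>...
# Hypothetical helper: prints the number of lines that are neither blank
# nor whole-line comments beginning with the given prefix.
count_code_lines() {
    local prefix=$1
    shift
    awk -v p="$prefix" '
        { line = $0; sub(/^[[:space:]]+/, "", line) }  # strip leading indentation
        line == ""           { next }                  # skip blank lines
        index(line, p) == 1  { next }                  # skip whole-line comments
        { n++ }
        END { print n + 0 }                            # print 0 for empty input
    ' "$@"
}
```

For example, count_code_lines '#' foo.pl or count_code_lines '//' foo.c. Note that for C, a '#' prefix would also drop preprocessor lines, which is probably not what you want.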

Michael Cramer
Not sure why you're using cat there. Use foo.c or foo.pl as the filename to pass to sed: sed '/^\s*$/d' foo.c | wc -l
Andy Lester
Just habit. I read pipelines from left to right, which means I usually start with cat, then action, action, action, etc. Clearly, the end result is the same.
Michael Cramer
To do this for all files in all subfolders and to exclude comments with '//', extend this command into this: find . -type f -name '*.c' -exec cat {} \; | sed '/^\s*#/d;/^\s*$/d;/^\s*\/\//d' | wc -l
Jami
+2  A: 

'wc' counts lines, words, chars, so to count all lines (including blank ones) use:

wc *.py

To filter out the blank lines, you can use grep:

grep -v '^\W*$' *.py | wc

'-v' tells grep to output all lines except those that match.
'^' is the start of a line.
'\W*' is zero or more whitespace characters.
'$' is the end of a line.
*.py is my example for all the files you wish to count (all Python files in the current dir).
Pipe the output to wc. Off you go.

I'm answering my own (genuine) question. Couldn't find a Stack Overflow entry that covered this.

Tartley
\W isn't a match for whitespace, it matches non-word characters. It's the opposite of \w, word characters. \W will match anything that isn't alphanumeric or underscore, and therefore won't do what you claim it does here. You mean \s
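To make the difference concrete, here is a small sketch comparing the two patterns on a C-like snippet. The helper names are made up for illustration, \W is a GNU grep extension, and [[:space:]] is used as the portable spelling of \s.

```shell
#!/bin/bash
# Hypothetical helpers: each reads stdin and prints how many lines survive the filter.
count_nonword_filter() { grep -cv '^\W*$'; }          # \W pattern (GNU grep)
count_space_filter()   { grep -cv '^[[:space:]]*$'; } # whitespace-only lines

# Four input lines: two statements, a lone closing brace, and an empty line.
printf 'int main() {\n    return 0;\n}\n\n' | count_nonword_filter   # prints 2
printf 'int main() {\n    return 0;\n}\n\n' | count_space_filter     # prints 3
```

The lone "}" consists only of non-word characters, so the \W pattern discards it along with the empty line.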
SpoonMeiser
+6  A: 

If you want to use something other than a shell script, try CLOC:

cloc counts blank lines, comment lines, and physical lines of source code in many programming languages. It is written entirely in Perl with no dependencies outside the standard distribution of Perl v5.6 and higher (code from some external modules is embedded within cloc) and so is quite portable.

xsl
+6  A: 

There are many ways to do this, using common shell utilities.

My solution is:

grep -cve '^\s*$' <file>

This searches for lines in <file> that do not match (-v) the pattern (-e) '^\s*$', which is the beginning of a line, followed by 0 or more whitespace characters, followed by the end of a line (i.e. no content other than whitespace), and displays a count of matching lines (-c) instead of the matching lines themselves.

An advantage of this method over methods that involve piping into wc is that you can specify multiple files and get a separate count for each file:

$ grep -cve '^\s*$' *.hh

config.hh:36
exceptions.hh:48
layer.hh:52
main.hh:39
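If you also want a project-wide total, the per-file counts can be summed. A minimal sketch, assuming a made-up helper name total_nonblank and using [[:space:]] in place of \s for portability:

```shell
#!/bin/bash
# total_nonblank <file>...
# Hypothetical helper: sums the per-file counts reported by grep -c.
total_nonblank() {
    grep -cve '^[[:space:]]*$' -- "$@" |
        # The count is the last colon-separated field, so file names that
        # contain colons (or a single file with no name prefix) still work.
        awk -F: '{ total += $NF } END { print total + 0 }'
}
```

Usage would look like total_nonblank *.hh.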
SpoonMeiser
Thanks! Incidentally, wc does provide a count for each given file, plus a total.
Tartley
Not if you're piping into it though, as standard in counts as just one file.
SpoonMeiser
+1  A: 
cat 'filename' | grep '[^ ]' | wc -l

should do the trick just fine

curtisk
Why use cat and pipe the file into grep, when you can pass the filename as an argument to grep in the first place?
SpoonMeiser
true, it's just an old alias I have around... it does essentially the same as your solution instead of using the inverse
curtisk
A: 

It's kinda going to depend on the number of files you have in the project. In theory you could use

grep -c '.' <list of files>

Where you can fill the list of files by using the find utility.

grep -c '.' `find -type f`

Would give you a line count per file.

Linor
. matches whitespace. This solution only works if you consider a line containing only whitespace to be non-blank, which it technically is, although it probably isn't what you're after.
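A quick sketch of the difference, with made-up helper names. The input has two real lines, one line of spaces, one line holding a tab, and one empty line.

```shell
#!/bin/bash
# Hypothetical helpers: each reads stdin and prints a line count.
count_dot()      { grep -c '.'; }              # any character, including whitespace
count_nonspace() { grep -c '[^[:space:]]'; }   # at least one non-whitespace character

printf 'a\n   \n\t\n\nb\n' | count_dot        # prints 4
printf 'a\n   \n\t\n\nb\n' | count_nonspace   # prints 2
```

Only the truly empty line is skipped by '.', while the second pattern also skips the whitespace-only lines.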
SpoonMeiser
+10  A: 
#!/bin/bash
find . -path './pma' -prune -o -path './blog' -prune -o -path './punbb' -prune -o -path './js/3rdparty' -prune -o -print | egrep '\.php|\.as|\.sql|\.css|\.js' | grep -v '\.svn' | xargs cat | sed '/^\s*$/d' | wc -l

The above will give you the total count of lines of code (blank lines removed) for a project (current folder and all subfolders recursively).

In the above "./blog" "./punbb" "./js/3rdparty" and "./pma" are folders I blacklist as I didn't write the code in them. Also .php, .as, .sql, .css, .js are the extensions of the files being looked at. Any files with a different extension are ignored.
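A variant of the same idea, offered as a sketch rather than a drop-in replacement: it matches extensions with -name so that a path such as ./notes.js.bak is not picked up, prunes .svn inside find, and avoids xargs so that file names with spaces are handled. The function name and the shortened blacklist are assumptions for illustration.

```shell
#!/bin/bash
# count_project_lines [root-dir]
# Hypothetical helper: prints the total number of non-blank lines in
# .php, .as, .sql, .css and .js files under root-dir (default: .).
count_project_lines() {
    local root=${1:-.}
    find "$root" \
        \( -name .svn -o -path "$root/blog" -o -path "$root/js/3rdparty" \) -prune \
        -o -type f \
        \( -name '*.php' -o -name '*.as' -o -name '*.sql' \
           -o -name '*.css' -o -name '*.js' \) \
        -exec cat {} + |
        grep -c '[^[:space:]]'   # count lines with at least one non-whitespace character
}
```

Add further -o -path "$root/..." entries inside the first group for each folder you want to blacklist.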

Gilles