tags:

views:

7194

answers:

6

Suppose I want to count the lines of code in a project. If all of the files are in the same directory I can execute:

cat * | wc -l

However, if there are sub-directories, this doesn't work, because `cat` has no recursive mode. I suspect this might be a job for `xargs`, but I wonder if there is a more elegant solution?
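To make the limitation concrete, here is a small sketch against a throwaway fixture (all file names below are invented for illustration): `cat *` only sees the top level, while a `find`-based pipeline covers the whole tree.

```shell
# Hypothetical fixture: 2 lines at the top level, 3 in a sub-directory.
tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/top.c"
mkdir "$tmp/sub"
printf 'c\nd\ne\n' > "$tmp/sub/deep.c"
cd "$tmp"

cat -- * 2>/dev/null | wc -l           # only counts top.c: 2
find . -type f -exec cat {} + | wc -l  # counts the whole tree: 5
```

The `2>/dev/null` hides cat's complaint about the `sub` directory itself; the miscount is silent without it too.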

+10  A: 

Try using the find command, which recurses directories by default:

find . -type f -execdir cat {} \; | wc -l

chromakode
Much faster just to pipe it through xargs
Ken
I believe it :) I try to do as little shell scripting as possible, so the more clever `xargs` approach escaped me. Thanks for teaching me something!
chromakode
+7  A: 

I think you're probably stuck with xargs.

find . -name '*.php' | xargs cat | wc -l

chromakode's method gives the same result but is much, much slower. If you use xargs, your `cat`ing and `wc`ing can start as soon as find starts finding.

Good explanation at Linux: xargs vs. exec {}

Ken
but unfortunately, you won't get multi-threading goodness there because the pipe makes them all share the same processing line.
Kent Fredric
Oh, and FYI, that article is bunk. `-exec cmd {} +` bundles filenames. xargs has the "-1" parameter as well to emulate find's other behaviour.
Kent Fredric
Thanks Kent. Can you point me at any documentation on "-1"? I'm using GNU xargs version 4.2.32 and can see nothing in the man page.
Ken
Sorry, "-l", which is the limiter: -l[max-lines] (minor brainfart).
Kent Fredric
I did some rigorous tests; xargs is still faster on all fronts. I posted this to that page: http://rafb.net/p/HLOs3385.html
Kent Fredric
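As a rough illustration of the spawning behaviour being debated (a sketch with a made-up fixture): `-exec … \;` launches the command once per file, while `-exec … {} +` batches file names into as few invocations as possible, much like xargs does.

```shell
tmp=$(mktemp -d)
cd "$tmp"
for i in 1 2 3; do echo x > "f$i.c"; done

# One invocation per file: "run" is printed three times.
find . -name '*.c' -exec sh -c 'echo run' sh {} \; | wc -l   # 3

# Batched form: all three names fit into a single invocation.
find . -name '*.c' -exec sh -c 'echo run' sh {} + | wc -l    # 1
```

The `sh -c 'echo run' sh {}` wrapper is only there to make each process spawn visible as one output line.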
+25  A: 

First, you do not need to use cat to count lines; this is an antipattern called Useless Use of Cat (UUoC).

wc -l * 

Then the find command recurses the sub-directories:

find . -name "*.c" -exec wc -l {} \;
  • . is the name of the top directory to start searching from

  • -name "*.c" is the pattern of the file you're interested in

  • -exec gives a command to be executed

  • {} is the result of the find command, passed as argument to the command (here `wc -l`)

  • \; indicates the end of the command

This command produces a list of all files found with their line counts. If you want the sum for all the files found, you can use find to list the files (with the -print option) and then use xargs to pass this list as arguments to `wc -l`.

find . -name "*.c" -print | xargs wc -l 

EDIT to address Robert Gamble's comment (thanks): if you have spaces or newlines (!) in file names, then you have to use the -print0 option instead of -print, and `xargs -0` (or --null), so that the list of file names is exchanged as null-terminated strings.

find . -name "*.c" -print0 | xargs -0 wc -l

The Unix philosophy is to have tools that do one thing only.

philippe
Seconded. Wanted to point out the UUoC (Useless Use of Cat), but didn't.
ayaz
I think that the particular challenge is to get the *total* line count for an entire tree of files. Is there a way to do that simply using the find command?
chromakode
Your xargs example is almost identical to what I originally came up with, but it doesn't handle filenames with spaces in them.
Robert Gamble
+1, Very nice, and even shorter in zsh: "wc -l **/*.c"
orip
the comment got messed up: it's "wc -l star star / star .c"
orip
The "find ... -print0 | xargs -0 ..." trick is worth committing to memory.
detly
+8  A: 

If you want a code-golfing answer:

grep '' -R . | wc -l

The problem with just using wc -l on its own is that it can't descend into sub-directories, and the one-liner

find . -exec wc -l {} \;

won't give you a total line count, because it runs wc once for every file, while

find . -exec wc -l {} +

will get confused as soon as find hits the argument-length limit (around 32,000 files): it then calls wc multiple times, each time giving you only a partial summary.

Additionally, the above grep trick will not add more than 1 line to the output when it encounters a binary file, which could be circumstantially beneficial.

For the cost of 1 extra command character, you can ignore binary files completely:

 grep '' -IR . | wc -l

If you want to run line counts on binary files too:

 grep '' -aR . | wc -l
Kent Fredric
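One hedged workaround for the partial-summary problem described above (a sketch assuming GNU tools, and that no matched path's last word is literally "total"): sum the per-file counts yourself instead of trusting wc's own summary lines.

```shell
# Sum the first column, skipping wc's own "total" summary lines,
# so the result stays correct even if xargs runs wc several times.
find . -name '*.c' -print0 | xargs -0 wc -l |
  awk '$NF != "total" { sum += $1 } END { print sum }'
```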
+4  A: 

The correct way is:

find . -name "*.c" -print0 | xargs -0 cat | wc -l

You must use -print0 because there are only two invalid characters in Unix filenames: The null byte and "/" (slash). So for example "xxx\npasswd" is a valid name. In reality, you're more likely to encounter names with spaces in them, though. The commands above would count each word as a separate file.

You might also want to use "-type f" instead of -name to limit the search to files.

Aaron Digulla
No, that's not right: xargs could execute wc several times, resulting in more than one result for different sets of files. You should use cat, and at the end pipe into one wc, like Ken showed.
Johannes Schaub - litb
You're right. I made xargs call cat (instead of wc) and then pipe the result through wc.
Aaron Digulla
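litb's objection can be reproduced without 32,000 files by forcing xargs to batch one file at a time with -n 1 (a contrived sketch; -n 1 merely simulates the argument-list overflow):

```shell
tmp=$(mktemp -d)
cd "$tmp"
printf 'a\n' > f1.c
printf 'b\nc\n' > f2.c

# Simulated overflow: one wc per batch, so two outputs and no grand total.
find . -name '*.c' -print0 | xargs -0 -n 1 wc -l

# One cat stream into one wc always yields a single number: 3
find . -name '*.c' -print0 | xargs -0 cat | wc -l
```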
+3  A: 

Using cat or grep in the solutions above is wasteful if you can use relatively recent GNU tools, including Bash:

wc -l --files0-from=<(find . -name \*.c -print0)

This handles file names with spaces, arbitrary recursion and any number of matching files, even if they exceed the command line length limit.

Idelic
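A small usage sketch of the same idea (fixture names invented): GNU wc also accepts the null-separated list on stdin via --files0-from=-, which avoids even the Bash-specific <( ) process substitution.

```shell
tmp=$(mktemp -d)
cd "$tmp"
printf 'a\n' > one.c
printf 'b\nc\n' > 'two words.c'

# Per-file counts followed by one grand "total" line (3 total here).
find . -name '*.c' -print0 | wc -l --files0-from=-
```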