views: 1053
answers: 3

I'm trying to figure out the best way to find the number of files in a particular directory when there are a very large number of files (> 100,000).

When there are that many files, running `ls | wc -l` takes quite a long time to execute. I believe this is because it has to return the names of all the files. I'd like to use as little disk I/O as possible.

I have experimented with some shell and Perl scripts to no avail. Any ideas?

+1  A: 

Did you try find? For example:

find . -name "*.ext" | wc -l
igustin
This will *recursively* find files under the current directory.
mark4o
On my system, `find /usr/share | wc -l` (~137,000 files) is about 25% faster than `ls -R /usr/share | wc -l` (~160,000 lines including dir names, dir totals and blank lines) on the first run of each and at least twice as fast when comparing subsequent (cached) runs.
Dennis Williamson
If he wants only the current directory, not the whole tree recursively, he can add the `-maxdepth 1` option to `find`.
igustin
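
For example (a sketch, assuming a `find` that supports `-maxdepth`, such as GNU or BSD find):

find . -maxdepth 1 -type f | wc -l

The `-type f` test restricts the count to regular files; drop it to count every directory entry (which then also includes `.` itself).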
+2  A: 

You could see whether using opendir() and readdir() in Perl is faster. For an example of those functions, look here

Peter van der Heijden
usage: perl -e 'opendir D, "."; @files = readdir D; closedir D; print scalar(@files)'
glenn jackman
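
A variant of that one-liner (just a sketch) that skips the `.` and `..` entries so the count matches the number of actual files:

perl -e 'opendir my $dh, "." or die "opendir: $!"; my $n = grep { $_ ne "." && $_ ne ".." } readdir $dh; closedir $dh; print "$n\n"'

`grep` in scalar context returns the number of matching entries, so nothing extra is printed besides the count.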
+12  A: 

By default ls sorts the names, which can take a while if there are a lot of them. Also, there will be no output until all of the names are read and sorted. Use the ls -f option to turn off sorting.

ls -f | wc -l

Note that this will also enable -a, so ., .., and other files starting with . will be counted.

mark4o
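
A small addition (not from the original answer): since `-f` implies `-a`, the count includes the `.` and `..` entries, so you can subtract two if you want to exclude them:

echo $(( $(ls -f | wc -l) - 2 ))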
+1 And I thought I knew everything there was to know about `ls`.
mobrule
ZOMG. Sorting 100K lines is nothing compared to the `stat()` call `ls` does on every file. `find` doesn't `stat()`, so it works faster.
Dummy00001
`ls -f` does not `stat()` either. But of course both `ls` and `find` call `stat()` when certain options are used, such as `ls -l` or `find -mtime`.
mark4o
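
As a closing note (an assumption on my side, requiring GNU find's `-mindepth`, `-maxdepth`, and `-printf`): you can count the entries of the current directory, typically without per-entry `stat()` calls, by printing one character per entry and counting characters instead of lines:

find . -mindepth 1 -maxdepth 1 -printf '.' | wc -c

Counting characters rather than lines also sidesteps miscounts when file names contain newline characters.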