I am looking for a fast way to find the number of files in a directory on Linux.

Any solution that takes linear time in the number of files in the directory is NOT acceptable (e.g. "ls | wc -l" and similar things) because it would take a prohibitively long amount of time (there are tens or maybe hundreds of millions of files in the directory).

I'm sure the number of files in the directory must be stored as a simple number somewhere in the filesystem structure (inode perhaps?), as part of the data structure used to store the directory entries - how can I get to this number?

Edit: The filesystem is ext3. If there is no portable way of doing this, I am willing to do something specific to ext3.

A: 

There's no portable way to do this. The low-level file primitives, i.e. readdir, work as if it's a linear list. Clearly, that's an abstraction, and some filesystems might store a count. However, accessing it is inherently filesystem-specific.

Matthew Flaschen
I don't need a portable way. I just need a way. The filesystem is ext3 if it matters.
HighCommander4
+5  A: 

Why should the data structure contain the number? A tree doesn't need to know its size in O(1) unless that is a requirement, and providing it could require more locking and possibly create a performance bottleneck.

By tree I don't mean one that includes subdirectory contents, just the entries you would get with -maxdepth 1, supposing they are not really stored as a list.

Edit: ext2 stored them as a linked list.

Modern ext3 implements hashed B-trees.

Having said that, /bin/ls does a lot more than counting, and actually scans all the inodes. Write your own C program or script using opendir() and readdir().

from here:

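#include <stdio.h>
#include <sys/types.h>
#include <dirent.h>

int main(void)
{
        long count = 0;  /* number of directory entries seen so far */
        DIR *d;          /* DIR is a typedef, so no "struct" keyword */

        if ((d = opendir(".")) == NULL)
        {
                perror("opendir");
                return 1;
        }
        /* One readdir() call per entry; the total includes "." and "..". */
        while (readdir(d) != NULL)
                count++;
        closedir(d);
        printf("%ld\n", count);
        return 0;
}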
Marco Mariani
Actually `ls -a` doesn't read more data from the filesystem than your program, as long as you don't pass other options like `--color` or `-F`. Beware that the count returned by `ls -a` or your program includes the `.` and `..` entries (so an empty directory has two entries). On Linux, `ls -A` skips `.` and `..`.
Gilles
And where does it get the file names? I seem to remember that getting them requires reading the inode. But it's been a long time, so you may be right.
Marco Mariani
@Marco Mariani: @Gilles is right - the filenames are in the directory, not the file inode (after all, a single file inode can have many names). The filenames are available to the program you've written, in the `d_name` field of the `struct dirent` that `readdir()` returns.
caf
+1  A: 

The inode for the directory does not store the number of files in it, since usually the file count is not needed separately from the list of names in the directory. The directory inode's link count does indirectly give the number of sub-directories (st_nlink is number of sub-dirs plus two).
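As a rough sketch of the link-count observation above, the following reads `st_nlink` with a single `stat()` call. The function name `subdir_count_from_nlink` is made up for illustration, and the "sub-dirs plus two" convention is assumed to hold; some filesystems do not maintain it (they may report a link count of 1 for directories), in which case the function gives up and returns -1.

```c
#include <sys/stat.h>

/* Hypothetical helper: estimate the number of immediate subdirectories of
 * `path` from the directory's link count. Returns -1 if `path` cannot be
 * stat'ed, is not a directory, or the link count is below 2 (meaning the
 * filesystem does not follow the "sub-dirs plus two" convention). */
long subdir_count_from_nlink(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0 || !S_ISDIR(st.st_mode))
        return -1;
    if (st.st_nlink < 2)
        return -1;

    /* Two links belong to the directory itself: its name in the parent
     * and its own "." entry. Each subdirectory adds one via its "..". */
    return (long)st.st_nlink - 2;
}
```

Calling `subdir_count_from_nlink(".")` costs one `stat()` regardless of how large the directory is, but it says nothing about the number of regular files, which is what the question asks for.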

I think you have no choice but to read through the whole list of files in the directory. find might or might not be faster than ls.

This is an example of why large directories are a problem, even when the directory is implemented using a B-tree.

Lars Wirzenius
A: 

If you are willing to jump through hoops, you could put each directory on its own filesystem, use quotas, and get the info with the "repquota" command.

embobo
+2  A: 

You can use inotify to track and record file create and unlink events in the monitored directory. It would distribute the total time required to maintain file count and allow you to retrieve the current file count instantaneously.
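A minimal sketch of that idea, assuming a long-running process that starts from a known count (obtained once by a full scan) and then adjusts it from inotify events. The helper names `watch_directory` and `apply_pending_events` are invented for this example. It is not a complete solution: changes made while the watcher is not running are missed, a queue overflow (`IN_Q_OVERFLOW`) is not handled, and a rename that overwrites an existing name in the directory would leave the count one too high.

```c
#include <stdio.h>
#include <unistd.h>
#include <sys/inotify.h>

/* Hypothetical helper: start watching `path` for entries being added or
 * removed. Returns a non-blocking inotify descriptor, or -1 on error. */
int watch_directory(const char *path)
{
    int fd = inotify_init1(IN_NONBLOCK);

    if (fd < 0)
        return -1;
    if (inotify_add_watch(fd, path,
            IN_CREATE | IN_DELETE | IN_MOVED_TO | IN_MOVED_FROM) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: drain every queued event and return the adjusted
 * count. Events that are neither additions nor removals leave it as-is. */
long apply_pending_events(int fd, long count)
{
    char buf[4096]
        __attribute__((aligned(__alignof__(struct inotify_event))));
    ssize_t len;

    /* On a non-blocking descriptor read() fails with EAGAIN once the
     * queue is empty, which ends the loop. */
    while ((len = read(fd, buf, sizeof buf)) > 0) {
        for (char *p = buf; p < buf + len; ) {
            const struct inotify_event *ev =
                (const struct inotify_event *)p;

            if (ev->mask & (IN_CREATE | IN_MOVED_TO))
                count++;
            else if (ev->mask & (IN_DELETE | IN_MOVED_FROM))
                count--;
            p += sizeof(struct inotify_event) + ev->len;
        }
    }
    return count;
}
```

A daemon built on this would call `apply_pending_events()` whenever the descriptor becomes readable and keep the running total somewhere other processes can read it, so a query never has to touch the directory itself.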

Amardeep