We are having a problem on Linux with directory inodes getting large and slow to navigate over time, as many files are created and removed. For example:
% ls -ld foo
drwxr-xr-x 2 webuser webuser 1562624 Oct 26 18:25 foo
% time find foo -type f | wc -l
518
real 0m1.777s
user 0m0.000s
sys 0m0.010s
% cp -R foo foo.tmp
% ls -ld foo.tmp
drwxr-xr-x 2 webuser webuser 45056 Oct 26 18:25 foo.tmp
% time find foo.tmp -type f | wc -l
518
real 0m0.198s
user 0m0.000s
sys 0m0.010s
The original directory contains 518 files, takes 1.5 MB to represent, and takes 1.7 seconds to traverse.
The rebuilt directory contains the same 518 files, takes only 45 KB to represent, and takes 0.2 seconds to traverse.
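As a stopgap we could periodically rebuild and swap the directory, roughly like this (foo.old is just a scratch name, and this obviously races against whatever is writing the cache), but that doesn't explain the underlying behavior:
cp -R foo foo.tmp
mv foo foo.old && mv foo.tmp foo
rm -rf foo.old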
I'm wondering what would cause this. My guess is fragmentation: this is not supposed to be a problem with Unix file systems in general, but this particular directory holds short-term cache files, so we are constantly creating, renaming and removing a large number of small files in it.
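To sanity-check that theory in isolation, here is a rough Perl sketch of the test I have in mind (the 50,000 count and the churn_test name are made up for illustration, not our actual cache code). My understanding is that on ext3 the directory keeps its enlarged size even after every file in it has been removed:

#!/usr/bin/perl
# Rough reproduction: fill a directory with many small files, remove them
# all, and check whether the directory's own size shrinks back.
# The 50_000 count and the "churn_test" name are arbitrary placeholders.
use strict;
use warnings;

my $dir = "churn_test";
mkdir $dir or die "mkdir $dir: $!";

for my $i (1 .. 50_000) {
    open my $fh, '>', "$dir/cache.$i" or die "open: $!";
    print $fh "x" x 100;
    close $fh;
}
printf "directory size with 50k files:      %d bytes\n", (stat $dir)[7];

# Remove every file; on ext3 I expect the directory size to stay put.
unlink "$dir/cache.$_" or die "unlink: $!" for 1 .. 50_000;
printf "directory size after removing them: %d bytes\n", (stat $dir)[7];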
I'm also wondering if there's a way to dump the literal binary contents of the directory - that is, read the directory as if it were a file - which might give me some insight into why it is so big. Neither read() nor sysread() from Perl will let me do it:
swartz> perl -Mautodie -MPOSIX -e 'sysopen(my $fh, "foo", O_RDONLY); my $len = sysread($fh, $buf, 1024);'
Can't sysread($fh, '', '1024'): Is a directory at -e line 1
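The closest thing I can think of is dropping below the filesystem with debugfs from e2fsprogs. I haven't verified this, but I believe something along these lines would dump the directory's raw blocks to a regular file for inspection (the device and path here are placeholders for our actual setup):

# Untested guess: dump the raw blocks of the directory's inode with debugfs,
# then inspect them with od. /dev/sda1 and /path/to/foo are placeholders.
debugfs -R "dump /path/to/foo /tmp/foo.dir" /dev/sda1
od -c /tmp/foo.dir | less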
System info:
Linux 2.6.18-128.el5PAE #1 SMP Wed Dec 17 12:02:33 EST 2008 i686 i686 i386 GNU/Linux
Thanks!
Jon