ansaurus

Question

Is there a faster alternative to Perl's stat?

Answer 1

+4 A:

stat does I/O on each file, which can't be avoided if you want to read that data. That I/O sets the limit on speed, and I can't think of any way to work around it.

If you're repeatedly stat-ing the same file(s) then consider using Memoize.

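use Memoize ();

# Thin wrapper so there is a named sub for Memoize to wrap
sub fileStat {
  my ($filename) = @_;
  return stat($filename);
}

# Cache the return values of fileStat, keyed by its arguments
Memoize::memoize('fileStat');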

mopoke 2010-01-07 21:03:16

Using Memoize is not necessary. Just do @array = stat($file) and get the values from it.

depesz 2010-01-07 21:18:57
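
A minimal sketch of the "stat once, then read the values" suggestion above. The helper name file_info and the choice of fields are assumptions for illustration; the list indices are the ones documented for stat in perlfunc.

```perl
use strict;
use warnings;

# Hypothetical helper: call stat once and return only the fields of interest.
sub file_info {
    my ($file) = @_;
    my @st = stat($file) or return;   # stat returns an empty list on failure
    # Indices per perlfunc: 2 = mode, 4 = uid, 7 = size, 9 = mtime
    return {
        mode  => $st[2],
        uid   => $st[4],
        size  => $st[7],
        mtime => $st[9],
    };
}

# Usage: one system call, several values read from the result.
my $info = file_info($0);
print "$0: $info->{size} bytes, uid $info->{uid}\n" if $info;
```

This only saves work when several values are needed for the same file; the single stat call per file still has to happen.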

Memoize will store all the return values each time you call fileStat, not just the result of a single call to stat. Yes, you could build your own cache for all of the stat return values, but why do that when Memoize does it for you?

mopoke 2010-01-07 21:44:45
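
For comparison, a hand-rolled cache might look roughly like the sketch below. The names cached_stat, forget_stat and %stat_cache are hypothetical, and the `//=` operator assumes Perl 5.10 or later. Unlike a real Memoize setup, this sketch also caches failed lookups as an empty list.

```perl
use strict;
use warnings;

my %stat_cache;   # hypothetical cache: filename => array ref of stat fields

# Return the stat list for $file, hitting the filesystem only the first time.
sub cached_stat {
    my ($file) = @_;
    $stat_cache{$file} //= [ stat($file) ];
    return @{ $stat_cache{$file} };
}

# Drop a cached entry so that the next call sees fresh data.
sub forget_stat {
    my ($file) = @_;
    delete $stat_cache{$file};
    return;
}
```

Either way, the cached values go stale as soon as the file changes, so something has to decide when to call forget_stat (or its Memoize equivalent).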

Repeatedly stat-ing the same files, though, will be served from the filesystem cache and thus be nowhere near as slow as the disk-bound performance the poster is seeing from traversing a whole filesystem. I strongly suspect that Memoize will do no good here.

Andy Ross 2010-01-07 22:14:05

Since Memoize will allow you to build a huge cache (gigabytes, if you have the RAM), it will in fact help above and beyond the filesystem cache. However, what good is a cache if you're looking for recent changes? Whether Memoize is a good idea depends on the poster's use case.

harschware 2010-01-07 22:25:42

Answer 2

+13 A:

When you call stat you're querying the filesystem and will be limited by its performance. For large numbers of files this will be slow; it's not really a Perl issue.

Michael Carman 2010-01-07 21:05:05

This is the best answer. "stat()" is a Unix system call, and the Perl function of the same name is just a (very thin!) wrapper around it. If it's slow, it's slow because of the required disk I/O, and that's not something you can fix.

Andy Ross 2010-01-07 22:13:06

Answer 3

A:

Could you post some of your code? I suspect that the real optimizations lie elsewhere. You also might want to try a profiler such as NYTProf.

(Hints: use map, and use an iterative rather than a recursive method. This also might be a good use of threads.)

ternaryOperator 2010-01-07 21:06:12
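
As a rough illustration of the "iterative rather than recursive" hint, the sketch below walks a directory tree with an explicit stack and makes one lstat call per entry. The helper name walk_sizes and the choice to collect file sizes are assumptions; the poster's actual code was not shown.

```perl
use strict;
use warnings;

# Hypothetical helper: walk $root without recursion and return a hash ref
# of path => size in bytes for every non-directory entry.
sub walk_sizes {
    my ($root) = @_;
    my %size;
    my @todo = ($root);                          # explicit stack of directories
    while (defined(my $dir = pop @todo)) {
        opendir(my $dh, $dir) or next;           # skip unreadable directories
        for my $name (readdir $dh) {
            next if $name eq '.' or $name eq '..';
            my $path = "$dir/$name";
            my @st = lstat($path) or next;       # one system call per entry
            if (-d _) { push @todo, $path }      # "_" reuses the lstat result
            else      { $size{$path} = $st[7] }
        }
        closedir $dh;
    }
    return \%size;
}
```

Because lstat is used, symbolic links to directories are recorded as entries rather than followed, which avoids loops. The number of system calls is still one per file, so this mainly removes Perl-level overhead, not disk I/O.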

Answer 4

A:

Depending on what you're doing with the information, you might be able to generate a find command? However, this still stats every file in the directories to be searched, so the time spent will be similar.

Ether 2010-01-07 21:09:59

Answer 5

A:

If you are on *NIX, you can just use ls and parse the output, I should think.
As Ether mentioned, find is possibly a good alternative if you just want to make decisions on what you stat.
Size, date, and uid should all be available from ls output, and date and size are available from the dir command on Windows.

Axeman 2010-01-07 21:33:46

On UNIX / Linux, `ls` and `find` will also use the `stat` syscall via the C library method. If these approaches improve performance it is not because of `stat` *per se*.

Stephen C 2010-01-07 22:24:53

@Stephen C: It might call `stat` more efficiently though. I don't know.

Axeman 2010-01-07 23:51:31

Answer is speculative and shows no effort on answerer's behalf.

daxim 2010-01-08 00:18:43

Answer 6

+7 A:

Before you go off optimizing stat, use Devel::NYTProf to see where the real slow-down is.

Also, investigate the details of how you've mounted the filesystem. Is everything local, or have you mounted something over NFS or something similar? There are many things that can be the problem, as other answers have pointed out. Don't spend too much time focussing on any potential problem until you know it's the problem.

Good luck,

brian d foy 2010-01-07 22:18:33

Answer 7

A:

Consider File::Find module.

harschware 2010-01-07 22:28:46
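
A minimal File::Find sketch, assuming the goal is to collect modification times for regular files. The helper name file_mtimes is hypothetical; find, $_ and $File::Find::name are the module's documented interface.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: return a hash ref of path => mtime for regular files.
sub file_mtimes {
    my ($root) = @_;
    my %mtime;
    find(sub {
        # By default File::Find has chdir'd into the directory, so $_ is
        # the bare entry name and $File::Find::name is the full path.
        my @st = lstat($_) or return;
        return unless -f _;              # reuse the lstat result, no extra call
        $mtime{$File::Find::name} = $st[9];
    }, $root);
    return \%mtime;
}
```

File::Find handles the traversal, but each entry is still stat-ed once, so it will not get around the filesystem cost discussed in the other answers.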

Answer 8

+3 A:

You've seen that stat is slow enough as it is, so don't call it more than once on the same file.

The perlfunc documentation on -X (the shell-ish file test operators) describes a nice cache for stat:

If any of the file tests (or either the stat or lstat operators) are given the special filehandle consisting of a solitary underline, then the stat structure of the previous file test (or stat operator) is used, saving a system call. (This doesn't work with -t, and you need to remember that lstat and -l will leave values in the stat structure for the symbolic link, not the real file.) (Also, if the stat buffer was filled by an lstat call, -T and -B will reset it with the results of stat _). Example:
print "Can do.\n" if -r $a || -w _ || -x _;
stat($filename);
print "Readable\n" if -r _;
print "Writable\n" if -w _;
print "Executable\n" if -x _;
print "Setuid\n" if -u _;
print "Setgid\n" if -g _;
print "Sticky\n" if -k _;
print "Text\n" if -T _;
print "Binary\n" if -B _;

Greg Bacon 2010-01-07 22:42:00
