Suppose I want to get several of a file's properties (owner, size, permissions, times) as returned by the lstat() system call. One way to do this in Java is to create a java.io.File object and call methods such as length() and lastModified() on it. I see two problems with this approach:

  1. Each one of these calls triggers a stat() call, and for my purposes stat()s are considered expensive: I'm trying to scan billions of files in parallel on hundreds of hosts, and (to a first approximation) the only way to access these files is via NFS, often against filer clusters where stat() under load may take half a second.

  2. The underlying call isn't lstat(); it's typically stat() (which follows symlinks) or fstat64() (which needs the file to be opened first and may trigger a write operation to record the access time).

Is there a "right" way to do this, such that I end up just doing a single lstat() call and accessing the members of the struct stat? What I have found so far from Googling:

  • JDK 7 will have the PosixFileAttributes interface in java.nio.file with everything I want (but I'd rather not be running nightly builds of my JDK if I can avoid it).

  • I can roll my own interface with JNI or JNA (but I'd rather not if there's an existing one).

A previous similar question got a couple of suggested JNI/JNA implementations. One is gone and the other is questionably maintained (e.g., no downloads, just an hg repository).

Are there any better options out there?

+1  A: 

Looks like you've pretty much covered all the bases. When I started reading your question, my first thought was JDK 7 or JNI. I don't know anything about how often these files change, but you might also look into some sort of persistent cache of the information in question, such as an embedded DB. You could also consider an access method other than NFS, such as a custom web service that returns bulk file information from the remote host.
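
For reference, the JDK 7 route might look roughly like the sketch below. It reads all POSIX attributes in one bulk operation with Files.readAttributes() and passes LinkOption.NOFOLLOW_LINKS so that a symlink is described rather than followed. The class name LstatSketch and its helper methods are made up for illustration. On Unix-like platforms I would expect this to come down to a single lstat(), but that is an implementation detail, so it's worth confirming with strace on your JVM. Also note that resolving the owner and group names may involve a separate user database lookup.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFileAttributes;
import java.nio.file.attribute.PosixFilePermissions;

// Hypothetical helper class; the names here are illustrative only.
final class LstatSketch {

    // Bulk-read the POSIX attributes of the path itself, without following symlinks.
    static PosixFileAttributes lstat(Path path) throws IOException {
        return Files.readAttributes(path, PosixFileAttributes.class,
                                    LinkOption.NOFOLLOW_LINKS);
    }

    // Format the fields from the question: permissions, owner, size, times.
    static String describe(Path path) throws IOException {
        PosixFileAttributes a = lstat(path);
        return String.format("%s %s:%s %d bytes mtime=%s atime=%s symlink=%b",
                PosixFilePermissions.toString(a.permissions()),
                a.owner().getName(),
                a.group().getName(),
                a.size(),
                a.lastModifiedTime(),
                a.lastAccessTime(),
                a.isSymbolicLink());
    }

    public static void main(String[] args) throws IOException {
        // Print one line per path given on the command line.
        for (String arg : args) {
            System.out.println(describe(Paths.get(arg)));
        }
    }
}
```

All the getters read from the one attributes object, so there is no further file system access per field, unlike repeated java.io.File calls.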

Jherico
Thanks! Ultimately I guess JDK 7 isn't so bad; I can just keep the binaries with the tool I'm writing, and it will be production-grade software soon enough.
Aaron D. Ball