views:

239

answers:

5

My app scans part of a file system, and my users reported it was very slow when they were scanning a network drive. Testing my code, I identified the bottleneck: the methods File.isFile(), File.isDirectory(), and File.isHidden(), which are all calling fs.getBooleanAttributes(File f). This method appears to be very slow on Windows network drives. How can I improve performance? Can I avoid calling this method in some way?

+9  A: 

Defensive code oftencalls those isXYZ() methods, and it's generally good practise. However, sometimes the performance is poor, as you've discovered.

An alternative approach is to assume that the file is a file, it exists, it's visible, readable, etc, and just try and read it. If it isn't those things, you'll get an exception, which you can catch, and then do the checks to find out exactly what went wrong. That way, you're optimising for the common case (i.e. everything's fine), and only perform the slow operations when things go wrong.

skaffman
Huh. That's a very interesting way to look at the problem, but unfortunately it's not very applicable in my case. What I'm doing is building a tree mirroring the user's file structure, so I kind of have to figure out whether a File is a file or a directory. Although, I suppose I could use listFiles() to make that distinction... Thanks, you've given me something to think about!
Amanda S
+1 I like the point about optimizing the 'common case'. You can still keep the defensive checks in place to handle when something isn't a file.
mpeterson
+1 I just modified some recursive directory-tree processing code to use listFiles to determine if the "file" is a directory; it's a minor speedup for local and LAN connected drives - I can't test VPN remote drives until I get home. It's also way more compact code.
Software Monkey
If you must call those methods do it in the order that will allow you to short circuit in most cases. If isFile() is true there's no reason to call isDirectory() and there's no reason to call either if isHidden() is true.
Chris Nava
+6  A: 

How are you building this file list? Unless you are displaying every file on the system at the same time, you should have some options...

  1. Only process this information when the user asks for it. e.g. They click on folder "Windows", at which time you could process the files within Windows.
  2. Process this information in a background thread, giving the illusion of better response time.

Perhaps if you show the code you are using to build the list, we could find some other areas of improvement. (Why can't you just infer the type based on the method used to gather the information? If you're calling a method like GetFiles() don't you already know that everything returned is a file?)

mpeterson
+1 for a simple solution.
wheaties
+2 :) ..........
OscarRyz
You're right that I could be lazy and only load folders when the user asks for them, but then there would be a pause every time the user tried to open a new folder, which might make the whole tree feel sluggish. It's a tradeoff between taking a long time at the beginning and being slow all the time...but maybe it wouldn't be so bad. I'll have to test it. Thanks for the ideas!
Amanda S
This is only over the network, correct? Locally things are fast enough?Not sure if it matters, but what protocol are you using to connect?
mpeterson
+1 for processing in the background to improve the perceived performance
rob
Yes, it's pretty fast locally. Not sure what protocol we're using to connect to the network drives...
Amanda S
+3  A: 

I faced exactly the same problem

The solution for our case was quite simple: since our directory structure was following a standard (there where no directory which had the '.' character in it's name), I just followed the standard, and applied a very simple heuristic: "in our case, directories doesn't have the '.' character in it's name". This simple heuristic reduced drastically the number of times our application had to call the isDirectory() function of the java.io.File class.

Maybe this is your case. Maybe on your directory structure you could know if a File is a directory just by it's naming conventions.

Kico Lobo
Personally, I'm very reluctant to write code that relies on "how things ought to be". Not to say this is never a good solution, but it's dangerous. If someone doesn't know the standard or doesn't follow it for whatever reason, your code gives incorrect results. It's one thing to give an error message on bad data, quite another to simply fail.
Jay
It would be nice if I had that kind of standard to rely on, but unfortunately I don't. :o)
Amanda S
+2  A: 

Here's a before and after code example for using listFiles and using isDirectory to walk a directory tree (my code uses a generic callback to actually do something with each directory and file; if I was coding C# this would be a delegate).

As you can see the listFiles approach is actually more compact and readily understood as well as being marginally faster on a local drive (950 ms vs 1000 ms), and LAN drive (26 seconds, vs 28 seconds), both for 23 thousand files.

It's very possible that for a remote connected drive the speedup could be substantial, but I can't test that from work. A little surprisingly the speedup is still only about 10% across a Windows RAS VPN to a network drive.

New Code

static public int processDirectory(File dir, Callback cbk, FileSelector sel) {
    dir=dir.getAbsoluteFile();
    return _processDirectory(dir.getParentFile(),dir,new Callback.WithParams(cbk,2),sel);
    }

static private int _processDirectory(File par, File fil, Callback.WithParams cbk, FileSelector sel) {
    File[]                              ents=(sel==null ? fil.listFiles() : fil.listFiles(sel));    // listFiles returns null if fil is not a directory
    int                                 cnt=1;

    if(ents!=null) {
        cbk.invoke(fil,null);
        for(int xa=0; xa<ents.length; xa++) { cnt+=_processDirectory(fil,ents[xa],cbk,sel); }
        }
    else {
        cbk.invoke(par,fil);                                                    // par can never be null
        }
    return cnt;
    }

Old Code

static public int oldProcessDirectory(File dir, Callback cbk, FileSelector sel) {
    dir=dir.getAbsoluteFile();
    return _processDirectory(dir,new Callback.WithParams(cbk,2),sel);
    }

static private int _processDirectory(File dir, Callback.WithParams cbk, FileSelector sel) {
    File[]                              ents=(sel==null ? dir.listFiles() : dir.listFiles(sel));
    int                                 cnt=1;

    cbk.invoke(dir,null);

    if(ents!=null) {
        for(int xa=0; xa<ents.length; xa++) {
            File                        ent=ents[xa];

            if(!ent.isDirectory()) {
                cbk.invoke(dir,ent);
                ents[xa]=null;
                cnt++;
                }
            }
        for(int xa=0; xa<ents.length; xa++) {
            File                        ent=ents[xa];

            if(ent!=null) {
                cnt+=_processDirectory(ent,cbk,sel);
                }
            }
        }
    return cnt;
    }
Software Monkey
A: 

Just in case you haven't tried it yet, calling getBooleanAttributes yourself and performing the necessary masking will be considerably faster if you are performing multiple checks on the same file. While not a perfect solution (and one that starts to push your code to be platform specific), it could improve performance by a factor of 3 or 4. That's a pretty significant performance boost, even though it isn't nearly as fast as it should be.

The JDK7 java.nio.file.Path functionality should help this sort of thing quite a bit.

Finally, if you have any control at all over the end user environment, suggest that your users configure their antivirus software to not scan network drives. Many of the big AV solutions (not sure exactly what they are solving) have this turned on by default. I don't know what impact this may have on the various File methods, but we've found that improperly configured anit-virus can cause massive latency issues in almost every sort of file access on network resources.

Kevin Day
It would indeed be faster if I could call getBooleanAttributes() myself, but unfortunately, I am constrained to Java 1.5 and java.io.FileSystem is package-protected! Silly abstraction barrier, getting in the way all the time. :o)
Amanda S
You should be able to get at the method using reflection (See Method.setAccessible() )Another trick that might work (I'm not sure if the jvm will reject classes in the java.io pacakge if they aren't signed by Sun) would be to create your own class that's in the same folder (just put it in a java.io folder in your jar).
Kevin Day