My app scans part of a file system, and my users reported that it was very slow when scanning a network drive. Testing my code, I identified the bottleneck: the methods File.isFile(), File.isDirectory(), and File.isHidden(), which all call fs.getBooleanAttributes(File f). This method appears to be very slow on Windows network drives. How can I improve performance? Can I avoid calling this method in some way?
Defensive code often calls those isXYZ() methods, and it's generally good practice. However, sometimes the performance is poor, as you've discovered.
An alternative approach is to assume that the file is a file, it exists, it's visible, readable, etc, and just try and read it. If it isn't those things, you'll get an exception, which you can catch, and then do the checks to find out exactly what went wrong. That way, you're optimising for the common case (i.e. everything's fine), and only perform the slow operations when things go wrong.
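A minimal sketch of that "try first, diagnose later" idea, assuming the goal is simply to read the file's bytes. The class and method names (OptimisticReader, readOptimistically, diagnose) are made up for illustration; the slow attribute checks run only on the failure path.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: names are illustrative, not from any library.
final class OptimisticReader {

    /** Reads the whole file without any up-front isFile()/canRead() checks. */
    static byte[] readOptimistically(File f) throws IOException {
        try (InputStream in = new FileInputStream(f)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            // Slow path: only now pay for the attribute lookups,
            // to turn the failure into a more specific message.
            throw new IOException(diagnose(f), e);
        }
    }

    /** Runs the expensive checks, but only after something has already gone wrong. */
    static String diagnose(File f) {
        if (!f.exists()) return "does not exist: " + f;
        if (f.isDirectory()) return "is a directory: " + f;
        if (!f.canRead()) return "not readable: " + f;
        return "unknown I/O problem: " + f;
    }
}
```

In the common case this costs one open and the reads themselves; the isXYZ() calls are paid only for entries that fail.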
How are you building this file list? Unless you are displaying every file on the system at the same time, you should have some options...
- Only process this information when the user asks for it, e.g. when they click on the folder "Windows", process only the files within Windows.
- Process this information in a background thread, giving the illusion of better response time.
Perhaps if you show the code you are using to build the list, we could find some other areas of improvement. (Why can't you just infer the type based on the method used to gather the information? If you're calling a method like GetFiles() don't you already know that everything returned is a file?)
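As a rough sketch of the on-demand, background-thread idea, assuming a UI that asks for one folder at a time: the LazyScanner class and its listLater method below are hypothetical names. It lists a single directory level on a worker thread and uses File.list(), which returns names only, so no isFile()/isDirectory() call is made per entry.

```java
import java.io.File;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: lists one directory level at a time, off the calling thread.
final class LazyScanner implements AutoCloseable {
    private final ExecutorService pool = Executors.newSingleThreadExecutor();

    /** Schedules a listing of dir; the caller collects the names later via Future.get(). */
    Future<String[]> listLater(final File dir) {
        return pool.submit(() -> {
            String[] names = dir.list(); // null if dir cannot be listed
            return names != null ? names : new String[0];
        });
    }

    @Override
    public void close() {
        pool.shutdown(); // lets already-submitted listings finish
    }
}
```

The caller would invoke listLater(folder) when the user expands a folder and fill in the view once the Future completes, so slow network listings never block the UI thread.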
I faced exactly the same problem.
The solution in our case was quite simple: since our directory structure followed a standard (no directory had the '.' character in its name), I applied a very simple heuristic: "in our case, directories don't have the '.' character in their names". This simple heuristic drastically reduced the number of times our application had to call the isDirectory() method of the java.io.File class.
Maybe this is your case too. Maybe in your directory structure you can tell whether a File is a directory just from its naming conventions.
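Roughly, the heuristic could look like the sketch below. It only holds under the stated assumption that no directory name contains a '.', and the names NameHeuristicWalker, looksLikeFile and collectFiles are invented for this example. Names with a dot are treated as files straight away; isDirectory() is called only for the dot-less names.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical walker; valid only if directories never have '.' in their names.
final class NameHeuristicWalker {

    /** Assumption: a name containing '.' is always a file in this tree. */
    static boolean looksLikeFile(String name) {
        return name.indexOf('.') >= 0;
    }

    /** Collects all files under root, calling isDirectory() only for dot-less names. */
    static List<File> collectFiles(File root) {
        List<File> out = new ArrayList<>();
        String[] names = root.list(); // names only, no per-entry attribute lookup
        if (names == null) {
            return out; // root is not a directory, or an I/O error occurred
        }
        for (String name : names) {
            File child = new File(root, name);
            if (looksLikeFile(name)) {
                out.add(child); // decided from the name alone
            } else if (child.isDirectory()) { // the slow call, now much rarer
                out.addAll(collectFiles(child));
            } else {
                out.add(child); // dot-less regular file, e.g. "README"
            }
        }
        return out;
    }
}
```

If the convention is ever broken (a directory named "v1.2", say), that directory is silently treated as a file, so this is only safe where the naming standard is actually enforced.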
Here's a before-and-after code example of walking a directory tree, first using listFiles alone and then using isDirectory (my code uses a generic callback to actually do something with each directory and file; if I were coding in C#, this would be a delegate).
As you can see, the listFiles approach is more compact and more readily understood, as well as being marginally faster on a local drive (950 ms vs. 1000 ms) and on a LAN drive (26 seconds vs. 28 seconds), both for 23 thousand files.
It's very possible that for a remotely connected drive the speedup could be substantial, but I can't test that from work. A little surprisingly, the speedup is still only about 10% across a Windows RAS VPN to a network drive.
New Code
public static int processDirectory(File dir, Callback cbk, FileSelector sel) {
    dir = dir.getAbsoluteFile();
    return _processDirectory(dir.getParentFile(), dir, new Callback.WithParams(cbk, 2), sel);
}

// Returns the number of entries (directories plus files) visited.
private static int _processDirectory(File par, File fil, Callback.WithParams cbk, FileSelector sel) {
    // listFiles returns null if fil is not a directory (or if an I/O error occurs),
    // so no separate isDirectory() call is needed
    File[] ents = (sel == null ? fil.listFiles() : fil.listFiles(sel));
    int cnt = 1;
    if (ents != null) {
        cbk.invoke(fil, null); // fil is a directory
        for (int xa = 0; xa < ents.length; xa++) {
            cnt += _processDirectory(fil, ents[xa], cbk, sel);
        }
    } else {
        cbk.invoke(par, fil); // fil is a file; par can never be null
    }
    return cnt;
}
Old Code
public static int oldProcessDirectory(File dir, Callback cbk, FileSelector sel) {
    dir = dir.getAbsoluteFile();
    return _processDirectory(dir, new Callback.WithParams(cbk, 2), sel);
}

private static int _processDirectory(File dir, Callback.WithParams cbk, FileSelector sel) {
    File[] ents = (sel == null ? dir.listFiles() : dir.listFiles(sel));
    int cnt = 1;
    cbk.invoke(dir, null);
    if (ents != null) {
        // First pass: process the plain files and null them out
        for (int xa = 0; xa < ents.length; xa++) {
            File ent = ents[xa];
            if (!ent.isDirectory()) { // one attribute lookup per entry
                cbk.invoke(dir, ent);
                ents[xa] = null;
                cnt++;
            }
        }
        // Second pass: recurse into the remaining entries, which are directories
        for (int xa = 0; xa < ents.length; xa++) {
            File ent = ents[xa];
            if (ent != null) {
                cnt += _processDirectory(ent, cbk, sel);
            }
        }
    }
    return cnt;
}
Just in case you haven't tried it yet, calling getBooleanAttributes yourself and performing the necessary masking will be considerably faster if you are performing multiple checks on the same file. Note that the method belongs to the package-private java.io.FileSystem class, so this means going through reflection. While not a perfect solution (and one that starts to push your code to be platform specific), it could improve performance by a factor of 3 or 4. That's a pretty significant performance boost, even though it isn't nearly as fast as it should be.
The JDK7 java.nio.file.Path functionality should help with this sort of thing quite a bit.
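For illustration, here is a small sketch using the JDK 7 java.nio.file API. Files.walkFileTree hands the visitor a BasicFileAttributes object for every entry, so the file/directory decision needs no separate File.isFile() or File.isDirectory() call. Whether those attributes arrive together with the directory listing depends on the platform's file system provider, so treat any speedup as something to measure rather than assume. The class name NioTreeCounter and its count method are hypothetical.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical example class: counts directories and regular files under a root.
final class NioTreeCounter {

    /** Returns {directories, regular files}; the root itself counts as a directory. */
    static long[] count(Path root) throws IOException {
        final long[] counts = new long[2];
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
                counts[0]++;
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                // attrs was already read by the walker; no extra lookup here
                if (attrs.isRegularFile()) {
                    counts[1]++;
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException exc) {
                return FileVisitResult.CONTINUE; // skip entries that cannot be read
            }
        });
        return counts;
    }
}
```

For a single file where several properties are needed, Files.readAttributes(path, BasicFileAttributes.class) similarly fetches them in one call instead of one call per check.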
Finally, if you have any control at all over the end-user environment, suggest that your users configure their antivirus software not to scan network drives. Many of the big AV solutions (not sure exactly what they are solving) have this turned on by default. I don't know what impact this may have on the various File methods, but we've found that improperly configured antivirus software can cause massive latency issues in almost every sort of file access on network resources.