Hi,

I have a program that records in a database which folders are full and which are empty. Currently I'm using:

bool hasFiles = Directory.GetFiles(path).Length > 0;

but it takes almost one hour, and I can't do anything in this time.

Is there a faster way to check whether a folder contains any files?

+2  A: 

I'm assuming (although I don't know for certain) that, because you're calling GetFiles() on a network drive, retrieving and enumerating the files in all 30k folders adds considerable time.

I've found an alternative Directory Enumerator here on CodeProject which looks promising.

Alternatively... you could create a WebService on the server that enumerates everything for you and returns the results after.

EDIT: I think your problem is more likely the folder access. Each time you access a directory on the network drive you're going to hit security and permission checks. Multiplied by 30k folders, that will be a big performance hit. I highly doubt that using FindFirstFile will help much, as the actual number of files enumerated will only ever be 0 or 1.

GenericTypeTea
A: 

Your best bet is to use the API function FindFirstFile. It won't take nearly as long then.
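
For reference, a minimal sketch of what calling FindFirstFile through P/Invoke could look like. It is Windows-only, the class and method names (NativeFolderCheck, HasFiles) are made up for illustration, and treating any failed lookup as "no files" is a simplifying assumption rather than proper error handling.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;
using FILETIME = System.Runtime.InteropServices.ComTypes.FILETIME;

static class NativeFolderCheck {   // hypothetical name
  [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
  struct WIN32_FIND_DATA {
    public FileAttributes dwFileAttributes;
    public FILETIME ftCreationTime;
    public FILETIME ftLastAccessTime;
    public FILETIME ftLastWriteTime;
    public uint nFileSizeHigh;
    public uint nFileSizeLow;
    public uint dwReserved0;
    public uint dwReserved1;
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
    public string cFileName;
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)]
    public string cAlternateFileName;
  }

  [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
  static extern IntPtr FindFirstFile(string lpFileName, out WIN32_FIND_DATA lpFindFileData);

  [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
  static extern bool FindNextFile(IntPtr hFindFile, out WIN32_FIND_DATA lpFindFileData);

  [DllImport("kernel32.dll", SetLastError = true)]
  static extern bool FindClose(IntPtr hFindFile);

  static readonly IntPtr InvalidHandle = new IntPtr(-1);

  // Returns true as soon as one non-directory entry is found in 'path'.
  public static bool HasFiles(string path) {
    WIN32_FIND_DATA data;
    IntPtr handle = FindFirstFile(Path.Combine(path, "*"), out data);
    if (handle == InvalidHandle) {
      return false;   // assumption: a missing or inaccessible folder counts as empty
    }
    try {
      do {
        // "." and ".." carry the Directory attribute, so they are skipped here too.
        if ((data.dwFileAttributes & FileAttributes.Directory) == 0) {
          return true;
        }
      } while (FindNextFile(handle, out data));
      return false;
    } finally {
      FindClose(handle);
    }
  }
}
```

Note that a folder containing only subfolders is reported as having no files, which matches what Directory.GetFiles would say.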

logicnp
Each folder only has one file; the problem *looks* to be the vast number of remote *folders*, accessed sequentially.
Marc Gravell
+1 Here's a discussion where someone finds that FindFirstFile is a lot faster than Directory.GetFiles for checking for empty directories, so it's worth trying: http://stackoverflow.com/questions/755574/how-to-quickly-check-if-folder-is-empty-net
ho1
I'm in agreement with Marc here. The problem isn't enumerating files; it's enumerating and stepping through the whole folder structure. Every time .NET calls GetFiles() on a directory, that access triggers a series of security checks.
GenericTypeTea
+1  A: 

Might be worth mentioning:

but it takes almost one hour, and I can't do anything in this time. (emphasis added)

Are you doing this from a GUI app, on the main thread? If so, split this process off using a BackgroundWorker. At least then the app will continue to be responsive. You could also add checks for CancellationPending in the method and cancel it if it's taking too long.
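
Roughly what I have in mind, as a sketch only. FolderScanWorker, Create and the onDone callback are names I made up for illustration, and the folder check itself is just the one from the question.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.IO;

static class FolderScanWorker {   // hypothetical helper
  // Builds a worker that checks each folder on a background thread.
  public static BackgroundWorker Create(Action<Dictionary<string, bool>> onDone) {
    var worker = new BackgroundWorker { WorkerSupportsCancellation = true };

    worker.DoWork += (sender, e) => {
      var bw = (BackgroundWorker)sender;
      var results = new Dictionary<string, bool>();
      foreach (var path in (IEnumerable<string>)e.Argument) {
        if (bw.CancellationPending) {   // set when someone calls CancelAsync()
          e.Cancel = true;
          return;
        }
        results[path] = Directory.GetFiles(path).Length > 0;
      }
      e.Result = results;
    };

    worker.RunWorkerCompleted += (sender, e) => {
      // In a WinForms/WPF app this handler runs back on the UI thread.
      if (!e.Cancelled && e.Error == null) {
        onDone((Dictionary<string, bool>)e.Result);
      }
    };
    return worker;
  }
}
```

You would start it with worker.RunWorkerAsync(folderPaths) and stop it from a Cancel button with worker.CancelAsync().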

Kind of tangential to your question--just something I noticed and thought I'd comment on.

Dan Tao
+1  A: 

The key to speeding up such a cross-network search is to cut down the number of requests across the network. Rather than getting all the directories, and then checking each for files, try and get everything from one call.

In .NET 3.5 there is no single method to recursively get all files and folders, so you have to build it yourself (see below). In .NET 4 new overloads exist to do this in one step.

Using DirectoryInfo.GetFileSystemInfos, each returned entry also says whether it is a file or a directory, which cuts down on calls as well.

This means splitting a list of all the directories and files becomes something like this:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

class AllDirectories {
  public List<string> DirectoriesWithoutFiles { get; set; }
  public List<string> DirectoriesWithFiles { get; set; }
}

static class FileSystemScanner {
  public static AllDirectories DivideDirectories(string startingPath) {
    var startingDir = new DirectoryInfo(startingPath);

    // allContent is an IList<FileSystemInfo>
    var allContent = GetAllFileSystemObjects(startingDir);
    var allFiles = allContent.Where(f => (f.Attributes & FileAttributes.Directory) == 0)
                             .Cast<FileInfo>();
    var dirs = allContent.Where(f => (f.Attributes & FileAttributes.Directory) != 0)
                         .Cast<DirectoryInfo>();
    // Full paths of every directory below the starting one, compared case-insensitively.
    var allDirs = new HashSet<string>(dirs.Select(d => d.FullName),
                                      StringComparer.OrdinalIgnoreCase);

    var res = new AllDirectories {
      DirectoriesWithFiles = new List<string>()
    };
    foreach (var file in allFiles) {
      var dirName = file.DirectoryName;   // full path of the containing directory
      if (allDirs.Remove(dirName)) {
        // Was removed, so first time this dir name seen.
        res.DirectoriesWithFiles.Add(dirName);
      }
    }
    // allDirs now just contains directories without files
    res.DirectoriesWithoutFiles = new List<string>(allDirs);
    return res;
  }
}

Implementing GetAllFileSystemObjects depends on the .NET version. On .NET 4 it is very easy:

static IList<FileSystemInfo> GetAllFileSystemObjects(DirectoryInfo root) {
  // One call returns every file and directory below root.
  return root.GetFileSystemInfos("*", SearchOption.AllDirectories);
}

On earlier versions a little more work is needed:

static IList<FileSystemInfo> GetAllFileSystemObjects(DirectoryInfo root) {
  var res = new List<FileSystemInfo>();
  var pending = new Queue<DirectoryInfo>(new [] { root });

  // Breadth-first walk: one GetFileSystemInfos call per directory.
  while (pending.Count > 0) {
    var dir = pending.Dequeue();
    var content = dir.GetFileSystemInfos();
    res.AddRange(content);
    foreach (var subDir in content.Where(f => (f.Attributes & FileAttributes.Directory) != 0)
                                  .Cast<DirectoryInfo>()) {
      pending.Enqueue(subDir);
    }
  }

  return res;
}

This approach calls into the filesystem as few times as possible, just once on .NET 4 or once per directory on earlier versions, allowing the network client and server to minimise the number of underlying filesystem calls and network round trips.

Getting FileSystemInfo instances has the disadvantage of needing multiple file system operations (I believe this is somewhat OS dependent), but for each name any solution needs to know whether it is a file or a directory, so this is not avoidable at some level (without resorting to P/Invoke of FindFirstFile/FindNextFile/FindClose).


As an aside, the above would be easier with a partition extension method:

static Tuple<IEnumerable<T>, IEnumerable<T>> Partition<T>(
                                                 this IEnumerable<T> input,
                                                 Func<T, bool> partition);

Writing that to be lazy would be an interesting exercise (only consuming input when something iterates over one of the outputs, while buffering the other).
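
For comparison, a minimal eager version (not the lazy one) might look like the sketch below. It needs .NET 4 for Tuple, and the class name PartitionExtensions and parameter name predicate are just my choices for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class PartitionExtensions {   // hypothetical name
  // Eager sketch: walks the input once and buffers both halves.
  // Item1 holds the elements matching the predicate, Item2 the rest.
  public static Tuple<IEnumerable<T>, IEnumerable<T>> Partition<T>(
      this IEnumerable<T> input, Func<T, bool> predicate) {
    var matching = new List<T>();
    var rest = new List<T>();
    foreach (var item in input) {
      (predicate(item) ? matching : rest).Add(item);
    }
    return Tuple.Create<IEnumerable<T>, IEnumerable<T>>(matching, rest);
  }
}
```

With that, the split into directories and files would be a single call along the lines of allContent.Partition(f => (f.Attributes & FileAttributes.Directory) != 0).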

Richard
A: 

If you are using .Net 4.0 have a look at the EnumerateFiles method. http://msdn.microsoft.com/en-us/library/dd413232(v=VS.100).aspx

The EnumerateFiles and GetFiles methods differ as follows: When you use EnumerateFiles, you can start enumerating the collection of FileInfo objects before the whole collection is returned; when you use GetFiles, you must wait for the whole array of FileInfo objects to be returned before you can access the array. Therefore, when you are working with many files and directories, EnumerateFiles can be more efficient.

This way not all the files are retrieved from the folder: if the enumerator yields at least one file, the folder is not empty.
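
Something like this, as a sketch. I'm using the static Directory.EnumerateFiles here rather than the DirectoryInfo version, and FolderCheck / HasFiles are just illustrative names.

```csharp
using System.IO;
using System.Linq;

static class FolderCheck {   // hypothetical name
  // Any() stops after the first entry, so the rest of the listing is never read.
  public static bool HasFiles(string path) {
    return Directory.EnumerateFiles(path).Any();
  }
}
```

This still makes one request per folder, so it helps most when individual folders hold many files.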

Jasper