The key to speeding up such a cross-network search is to cut down the number of requests across the network. Rather than getting all the directories, and then checking each for files, try and get everything from one call.
In .NET 3.5 there is no one method to recursively get all files and folders, so you have to build it yourself (see below). In .NET 4 new overloads exist to to this in one step.
Using DirectoryInfo
one also gets information on whether the returned name is a file or directory, which cuts down calls as well.
This means splitting a list of all the directories and files becomes something like this:
struct AllDirectories {
public List<string> DirectoriesWithoutFiles { get; set; }
public List<string> DirectoriesWithFiles { get; set; }
}
static class FileSystemScanner {
public AllDirectories DivideDirectories(string startingPath) {
var startingDir = new DirectoryInfo(startingPath);
// allContent IList<FileSystemInfo>
var allContent = GetAllFileSystemObjects(startingDir);
var allFiles = allContent.Where(f => !(f.Attributes & FileAttributes.Directory))
.Cast<FileInfo>();
var dirs = allContent.Where(f => (f.Attributes & FileAttributes.Directory))
.Cast<DirectoryInfo>();
var allDirs = new SortedList<DirectoryInfo>(dirs, new FileSystemInfoComparer());
var res = new AllDirectories {
DirectoriesWithFiles = new List<string>()
};
foreach (var file in allFiles) {
var dirName = Path.GetDirectoryName(file.Name);
if (allDirs.Remove(dirName)) {
// Was removed, so first time this dir name seen.
res.DirectoriesWithFiles.Add(dirName);
}
}
// allDirs now just contains directories without files
res.DirectoriesWithoutFiles = new List<String>(addDirs.Select(d => d.Name));
}
class FileSystemInfoComparer : IComparer<FileSystemInfo> {
public int Compare(FileSystemInfo l, FileSystemInfo r) {
return String.Compare(l.Name, r.Name, StringComparison.OrdinalIgnoreCase);
}
}
}
Implementing GetAllFileSystemObjects
depends on the .NET version. On .NET 4 it is very easy:
ILIst<FileSystemInfo> GetAllFileSystemObjects(DirectoryInfo root) {
return root.GetFileSystemInfos("*.*", SearchOptions.AllDirectories);
}
On earlier versions a little more work is needed:
ILIst<FileSystemInfo> GetAllFileSystemObjects(DirectoryInfo root) {
var res = new List<FileSystemInfo>();
var pending = new Queue<DirectoryInfo>(new [] { root });
while (pending.Count > 0) {
var dir = pending.Dequeue();
var content = dir.GetFileSystemInfos();
res.AddRange(content);
foreach (var dir in content.Where(f => (f.Attributes & FileAttributes.Directory))
.Cast<DirectoryInfo>()) {
pending.Enqueue(dir);
}
}
return res;
}
This approach calls into the filesystem as few times as possible, just once on .NET 4 or once per directory on earlier versions, allowing the network client and server to minimise the number of underlying filesystem calls and network round trips.
Getting FileSystemInfo
instances has the disadvantage of needing multiple file system operations (I believe this is somewhat OS dependent), but for each name any solution needs to know if it is a file or directory so this is not avoidable at some level (without resorting to P/Invoke of FindFileFirst
/FindNextFile
/FindClose
).
Aside, the above would be easier with a partition extension method:
Tuple<IEnumerable<T>,IEnumerable<T>> Extensions.Partition<T>(
this IEnumerable<T> input,
Func<T,bool> parition);
Writing that to be lazy would be an interesting exercise (only consuming input when something iterates over one of the outputs, while buffering the other).