I'm writing a program that needs to search a directory and all its sub directories for files that have a certain extension. This is going to be used both on a local, and a network drive, so performance is a bit of an issue.
Here's the recursive method I'm using now:
private void GetFileList(string fileSearchPattern, string rootFolderPath, List<FileInfo> files)
{
DirectoryInfo di = new DirectoryInfo(rootFolderPath);
FileInfo[] fiArr = di.GetFiles(fileSearchPattern, SearchOption.TopDirectoryOnly);
files.AddRange(fiArr);
DirectoryInfo[] diArr = di.GetDirectories();
foreach (DirectoryInfo info in diArr)
{
GetFileList(fileSearchPattern, info.FullName, files);
}
}
I could set the SearchOption to AllDirectories and not use a recursive method, but in the future I'll want to insert some code to notify the user what folder is currently being scanned.
While I'm creating a list of FileInfo objects now all I really care about is the paths to the files. I'll have an existing list of files, which I want to compare to the new list of files to see what files were added or deleted. Is there any faster way to generate this list of file paths? Is there anything that I can do to optimize this file search around querying for the files on a shared network drive?
Update 1
I tried creating a non-recursive method that does the same thing by first finding all the sub directories and then iteratively scanning each directory for files. Here's the method:
public static List<FileInfo> GetFileList(string fileSearchPattern, string rootFolderPath)
{
DirectoryInfo rootDir = new DirectoryInfo(rootFolderPath);
List<DirectoryInfo> dirList = new List<DirectoryInfo>(rootDir.GetDirectories("*", SearchOption.AllDirectories));
dirList.Add(rootDir);
List<FileInfo> fileList = new List<FileInfo>();
foreach (DirectoryInfo dir in dirList)
{
fileList.AddRange(dir.GetFiles(fileSearchPattern, SearchOption.TopDirectoryOnly));
}
return fileList;
}
Update 2
Alright so I've run some tests on a local and a remote folder both of which have a lot of files (~1200). Here are the methods I've run the tests on. The results are below.
- GetFileListA(): Non-recursive solution in the update above. I think it's equivalent to Jay's solution.
- GetFileListB(): Recursive method from the original question
- GetFileListC(): Gets all the directories with static Directory.GetDirectories() method. Then gets all the file paths with the static Directory.GetFiles() method. Populates and returns a List
- GetFileListD(): Marc Gravell's solution using a queue and returns IEnumberable. I populated a List with the resulting IEnumerable
- DirectoryInfo.GetFiles: No additional method created. Instantiated a DirectoryInfo from the root folder path. Called GetFiles using SearchOption.AllDirectories
- Directory.GetFiles: No additional method created. Called the static GetFiles method of the Directory using using SearchOption.AllDirectories
Method Local Folder Remote Folder GetFileListA() 00:00.0781235 05:22.9000502 GetFileListB() 00:00.0624988 03:43.5425829 GetFileListC() 00:00.0624988 05:19.7282361 GetFileListD() 00:00.0468741 03:38.1208120 DirectoryInfo.GetFiles 00:00.0468741 03:45.4644210 Directory.GetFiles 00:00.0312494 03:48.0737459
. . .so looks like Marc's is the fastest.