I know how to use DirectoryInfo.GetFiles(), but I do not think it is the fastest way to go. The FileInfo objects it returns seem rather heavy when all I need is the path.

Why do I need it? I tried to build my search tool on WDS, but I gave up. The OLE DB connection is horrible: it throws strange errors without any explanation. So what I am going to do is:

Rebuild the file index in SQL Server 2008.

Currently there are a few open points to check, mostly regarding maintenance:

  1. How do I get all files into the DB?
  2. How would I keep the DB in sync with the file system?

Later I will check how many resources the FileSystemWatcher needs. For now I am looking for the fastest way to get all files from a drive; the full path as a string would be sufficient.

So, assume I give you this:

List<string> allFiles =

How would you fill it really fast :-) And btw

new DirectoryInfo(@"D:\").GetFiles("*", SearchOption.AllDirectories)

is not the best way, I think. Reason 1: possible overhead. Reason 2, more severe: it throws on any inaccessible path (which will almost surely happen somewhere among 1.5 million files).

+1  A: 

I fire off a separate thread for each subdirectory, and throttle the threads with wait objects. This way I keep a manageable memory size by sending the file names to a database (or a file if you want) and make it fast by having a couple of threads doing the work.
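
The answer does not include code, so the following is only a minimal sketch of the idea, not the answerer's implementation. It assumes a Semaphore as the "wait object" and uses ThreadPool work items in place of dedicated threads; the names ThrottledScanner, Scan and maxConcurrent are hypothetical. Results are collected in memory here, whereas the answer streams them to a database or file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;

static class ThrottledScanner
{
    // Hypothetical helper: scans root with at most maxConcurrent
    // directory reads in flight at any time.
    public static List<string> Scan(string root, int maxConcurrent)
    {
        var files = new List<string>();
        var gate = new object();   // guards both files and pending
        int pending = 1;           // directories queued but not yet finished

        using (var throttle = new Semaphore(maxConcurrent, maxConcurrent))
        {
            WaitCallback work = null;
            work = state =>
            {
                string[] found = new string[0];
                string[] subDirs = new string[0];

                throttle.WaitOne();            // wait for a free slot
                try
                {
                    found = Directory.GetFiles((string)state);
                    subDirs = Directory.GetDirectories((string)state);
                }
                catch (UnauthorizedAccessException) { /* skip unreadable directory */ }
                catch (IOException) { /* skip vanished or too-long path */ }
                finally { throttle.Release(); }

                lock (gate)
                {
                    files.AddRange(found);
                    pending += subDirs.Length; // count children before queuing them
                }
                foreach (string dir in subDirs)
                    ThreadPool.QueueUserWorkItem(work, dir);

                lock (gate)
                {
                    // This directory is done; wake the caller after the last one.
                    if (--pending == 0) Monitor.PulseAll(gate);
                }
            };

            ThreadPool.QueueUserWorkItem(work, root);
            lock (gate)
            {
                while (pending > 0) Monitor.Wait(gate);
            }
        }
        return files;
    }
}
```

A call such as ThrottledScanner.Scan(@"D:\", 4) would return once every reachable directory has been listed. Counting outstanding directories gives a definite end condition instead of guessing when the workers have gone quiet.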

Otávio Décio
A: 

Take a look at this question: it has several alternatives for recursively getting files lazily, which considerably reduces overhead.
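
The linked question is not reproduced here, so as an illustration only, here is one way lazy enumeration could look. It is a sketch under assumptions: the names LazyFileWalker and EnumerateFilesSafe are hypothetical, and it walks the tree with an explicit stack so that an inaccessible directory is skipped instead of aborting the whole scan.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class LazyFileWalker
{
    // Yields full file paths under root one at a time,
    // skipping directories that cannot be read.
    public static IEnumerable<string> EnumerateFilesSafe(string root)
    {
        var pending = new Stack<string>();
        pending.Push(root);

        while (pending.Count > 0)
        {
            string current = pending.Pop();
            string[] files;
            string[] subDirs;
            try
            {
                files = Directory.GetFiles(current);
                subDirs = Directory.GetDirectories(current);
            }
            catch (UnauthorizedAccessException) { continue; } // no permission
            catch (IOException) { continue; }                 // vanished or too-long path

            // yield must stay outside the try/catch block in C#.
            foreach (string file in files)
                yield return file;
            foreach (string dir in subDirs)
                pending.Push(dir);
        }
    }
}
```

To fill the list from the question, something like `List<string> allFiles = new List<string>(LazyFileWalker.EnumerateFilesSafe(@"D:\"));` should work; iterating the sequence directly instead avoids holding all paths in memory at once.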

Mauricio Scheffer
A: 

What about this? It uses the ThreadPool and recursion. Writing the output directly to a database took far too long, but once you have it in a file you can work out an efficient way to load it into a database if you want.

Output...

56337/379104 - (number directories/files)
Elapsed seconds: 13.0

Code...

using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;

namespace FileCacher
{
    class Program
    {
        public static void Main()
        {
            try
            {
                CacheFiles();
            }
            finally
            {
                Console.WriteLine();
                Console.WriteLine("Press any key to exit.");
                Console.ReadKey();
            }
        }

        private static List<string> sFiles = new List<string>();
        private static void AddFiles(params string[] files)
        {
            lock (sFiles)
            {
                sFiles.AddRange(files);
            }
        }

        private static List<string> sDirectories = new List<string>();
        private static void AddDirectories(params string[] dirs)
        {
            lock (sDirectories)
            {
                sDirectories.AddRange(dirs);
            }
        }

        private static void CacheFiles()
        {
            AddDirectories(@"C:\");
            CacheDirectory(@"C:\");

            var numFiles = 0;
            var numDirs = 0;
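            // Heuristic completion check: assume the scan is done once neither
            // count has changed for a full second. A slow directory can end it early.
            while (true)
            {
                Thread.Sleep(1000);
                int newNumDirs, newNumFiles;
                // Take the same locks as the writers so Count is read safely.
                lock (sDirectories) newNumDirs = sDirectories.Count;
                lock (sFiles) newNumFiles = sFiles.Count;
                if (newNumDirs == numDirs && newNumFiles == numFiles)
                {
                    Console.WriteLine();
                    break;
                }
                numDirs = newNumDirs;
                numFiles = newNumFiles;
                Console.CursorLeft = 0;
                Console.Write("{0}/{1}", numDirs, numFiles);
            }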

            // Dispose the StreamWriter as well as the FileStream; otherwise its
            // buffer is never flushed and the end of the output can be lost.
            using (var fs = new FileStream(@"C:\garb\Dirs.txt", FileMode.Create, FileAccess.Write))
            using (var sw = new StreamWriter(fs))
            {
                sDirectories.Sort();
                foreach (var dir in sDirectories)
                    sw.WriteLine(dir);
            }

            using (var fs = new FileStream(@"C:\garb\Files.txt", FileMode.Create, FileAccess.Write))
            using (var sw = new StreamWriter(fs))
            {
                sFiles.Sort();
                foreach (var file in sFiles)
                    sw.WriteLine(file);
            }
        }

        private static void CacheDirectory(object dir)
        {
            try
            {
                var dirPath = (string)dir;
                var dirs = Directory.GetDirectories(dirPath);

                AddDirectories(dirs);
                AddFiles(Directory.GetFiles(dirPath));

                foreach (var childDir in dirs)
                    ThreadPool.QueueUserWorkItem(new WaitCallback(CacheDirectory), childDir);
            }
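            catch (UnauthorizedAccessException)
            {
                // Skip directories we are not allowed to read.
            }
            catch (IOException)
            {
                // Skip paths that vanished or are too long; an unhandled exception
                // on a pool thread would terminate the process.
            }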
        }

    }
}
Brian