Hey there

I'm trying to create an application that scans a drive. The tricky part is that my drive contains a set of folders nested within folders, all containing documents. I want to scan the drive, take a "snapshot" of all documents and folders, and dump it into a .txt file.
The first time I run this app, the output will be a text file listing all the folders and files.
The second time I run it, it will take the two text files (the one produced by the second run and the .txt file from the first run) and compare them, reporting what has been moved, overwritten, or deleted.

Does anybody have any code for this? I'm a newbie at this C# stuff and any help would be greatly appreciated.

Thanks in advance.

A: 

You can easily use the DirectoryInfo/FileInfo classes for this.

Basically, instantiate a DirectoryInfo pointing at the C:\ folder, then use its members to walk the folder structure.

http://msdn.microsoft.com/en-us/library/system.io.directoryinfo.aspx has code that could quite easily be translated.
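As a minimal sketch of that approach (the root folder and output file here are hypothetical placeholders, not names from the question), a snapshot writer could look like this:

```csharp
using System;
using System.IO;

public class SnapshotWriter
{
    // Write the full path of every directory and file under 'dir' to 'writer'.
    public static void Walk(DirectoryInfo dir, TextWriter writer)
    {
        writer.WriteLine(dir.FullName);
        foreach (FileInfo file in dir.GetFiles())
            writer.WriteLine(file.FullName);
        foreach (DirectoryInfo sub in dir.GetDirectories())
            Walk(sub, writer); // plain recursion; fine for shallow trees
    }

    public static void Main()
    {
        // Hypothetical paths -- point these at your own drive and output file.
        using (var writer = new StreamWriter(@"C:\snapshot.txt"))
        {
            Walk(new DirectoryInfo(@"C:\docs"), writer);
        }
    }
}
```

Note this does no error handling; scanning from the drive root will hit folders that throw UnauthorizedAccessException, which you'd need to catch and skip.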

Now, the other part of your question is insanity. You can find the differences between the two files relatively easily, but translating that into what has been moved/deleted/etc. will take some fairly advanced logic. After all, if I have two files, both named myfile.dat, one at c:\foo and the other at c:\notfoo, how would the one at c:\notfoo be reported if I deleted the one at c:\foo? Another example: if I have a file myfile2.dat and copy it from c:\bar to c:\notbar, is that considered a move? What happens if I copy it on Tuesday, and then on Thursday I delete c:\bar\myfile2.dat--is that a move or a delete? And would the answer change if I ran the program only every Monday as opposed to daily?

There's a whole host of questions like these, and corresponding logic you'd need to think of and code for in order to build that functionality. Even then, it would not be 100% correct, because it's not watching the file system as changes occur--there will always exist the possibility of a scenario that does not get reported correctly due to timing, logic structure, processing time, when the app runs, or just the sheer perversity of computers.

Additionally, with a naive comparison the processing time grows quadratically with the number of files: you'd be checking every file against every other file to determine its state relative to its previous state. I'd hate to have to run this against my 600+GB drive at home, let alone the 40TB drives I have on servers at work.
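The raw-difference part really is easy, though, if the snapshots are one full path per line: put each file in a set and take the set differences. The snapshot file names below are hypothetical. Note this only reports what vanished and what appeared; classifying a pair of changes as a "move", "copy", or "delete" needs the extra logic discussed above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public class SnapshotDiff
{
    // Report paths present only in the old snapshot (missing) and
    // paths present only in the new one (added).
    public static void Diff(HashSet<string> before, HashSet<string> after,
                            out List<string> missing, out List<string> added)
    {
        missing = before.Except(after).ToList();
        added = after.Except(before).ToList();
    }

    public static void Main()
    {
        // Hypothetical file names -- the snapshots from two successive runs.
        var before = new HashSet<string>(File.ReadAllLines(@"C:\snapshot-old.txt"));
        var after = new HashSet<string>(File.ReadAllLines(@"C:\snapshot-new.txt"));

        Diff(before, after, out List<string> missing, out List<string> added);
        foreach (string path in missing) Console.WriteLine("Gone: " + path);
        foreach (string path in added) Console.WriteLine("New:  " + path);
    }
}
```

Using hash sets keeps the comparison roughly linear in the number of paths rather than quadratic.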

Stephen Wrighton
I'd say the file comparison would be done folder to folder. That is, C:\foo in the .txt dump generated the previous time I ran the app would be compared to C:\foo from the last time I ran it. Is it still insanity if I plan on doing it that way?
That's slightly different, and a little less insane, since all you're checking is whether file X is still in directory Y. That said, Dennis Palmer has a better idea in the FileSystemWatcher class (which I didn't know about): use a service or system-tray application to log file changes on the fly.
Stephen Wrighton
+1  A: 

A better approach than your text file comparisons would be to use the FileSystemWatcher Class.

Listens to the file system change notifications and raises events when a directory, or file in a directory, changes.

You could log the changes and then generate your reports as needed from that log.
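A rough sketch of that idea (the watched folder and log file paths here are hypothetical): hook the watcher's events and append each notification to a log, then build your reports from the log later.

```csharp
using System;
using System.IO;

public class ChangeLogger
{
    // Append one line per change notification to a log file.
    public static void Log(string logPath, string change, string path)
    {
        File.AppendAllText(logPath, string.Format("{0:u} {1} {2}{3}",
            DateTime.Now, change, path, Environment.NewLine));
    }

    public static void Main()
    {
        // Hypothetical paths -- the folder to monitor and the log to write.
        const string logPath = @"C:\changes.log";
        var watcher = new FileSystemWatcher(@"C:\watched")
        {
            IncludeSubdirectories = true,
            NotifyFilter = NotifyFilters.FileName
                         | NotifyFilters.DirectoryName
                         | NotifyFilters.LastWrite
        };

        watcher.Created += (s, e) => Log(logPath, "Created", e.FullPath);
        watcher.Deleted += (s, e) => Log(logPath, "Deleted", e.FullPath);
        watcher.Changed += (s, e) => Log(logPath, "Changed", e.FullPath);
        watcher.Renamed += (s, e) => Log(logPath, "Renamed", e.OldFullPath + " -> " + e.FullPath);

        watcher.EnableRaisingEvents = true;
        Console.WriteLine("Watching... press Enter to stop.");
        Console.ReadLine();
    }
}
```

As the comments below note, under heavy load the watcher's internal buffer can overflow and drop events, so a real service should also handle the Error event and periodically reconcile against a full scan.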

Dennis Palmer
Much easier indeed, but 'better' would depend on a few factors, mostly the timespan between two scans and the desired accuracy.
Henk Holterman
Yeah, FileSystemWatcher is OK unless you get a ton of files dumped on you at once; then you need to know how to handle that. I had problems getting all the events when that happened (thousands of files dumped into my system at once).
Mike Bethany
P.S. What I did was wait for any file event, turn off the event hook, deal with the files, poll for a few seconds afterwards to see if any more files show up, then re-enable the FileSystemWatcher.
Mike Bethany
+6  A: 

One thing we learned in the '80s is that it's really tempting to use recursion for file system walking, but the moment you do, someone will make a file system with nesting levels deep enough to overflow your stack. It's far better to use heap-based walking of the file system.

Here is a class I knocked together which does just that. It's not super pretty, but it does the job quite well:

using System;
using System.IO;
using System.Collections.Generic;

namespace DirectoryWalker
{
    public class DirectoryWalker : IEnumerable<string>
    {
        private string _seedPath;
        Func<string, bool> _directoryFilter, _fileFilter;

        public DirectoryWalker(string seedPath) : this(seedPath, null, null)
        {
        }

        public DirectoryWalker(string seedPath, Func<string, bool> directoryFilter, Func<string, bool> fileFilter)
        {
            if (seedPath == null)
                throw new ArgumentNullException("seedPath"); // pass the parameter name, not the (null) value
            _seedPath = seedPath;
            _directoryFilter = directoryFilter;
            _fileFilter = fileFilter;
        }

        public IEnumerator<string> GetEnumerator()
        {
            // Breadth-first walk using explicit heap-allocated queues instead of
            // recursion, so deeply nested directories cannot overflow the stack.
            Queue<string> directories = new Queue<string>();
            directories.Enqueue(_seedPath);
            Queue<string> files = new Queue<string>();
            while (files.Count > 0 || directories.Count > 0)
            {
                if (files.Count > 0)
                {
                    yield return files.Dequeue();
                }

                if (directories.Count > 0)
                {
                    string dir = directories.Dequeue();
                    string[] newDirectories = Directory.GetDirectories(dir);
                    string[] newFiles = Directory.GetFiles(dir);
                    foreach (string path in newDirectories)
                    {
                        if (_directoryFilter == null || _directoryFilter(path))
                            directories.Enqueue(path);
                    }
                    foreach (string path in newFiles)
                    {
                        if (_fileFilter == null || _fileFilter(path))
                            files.Enqueue(path);
                    }
                }
            }
        }

        System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
        {
            return GetEnumerator();
        }
    }
}

Typical usage is this:

DirectoryWalker walker = new DirectoryWalker(@"C:\pathToSource\src", null, (x => x.EndsWith(".cs")));
foreach (string s in walker)
{
    Console.WriteLine(s);
}

This lists all files that end in ".cs" -- without any recursion.

plinth
Great answer MUCH better than mine so I'm deleting mine.
Mike Bethany
This is why I come here: great answers like these. The guy that answered this has a fantastic follow-up article about this concept here: http://www.atalasoft.com/cs/blogs/stevehawley/archive/2009/06/02/more-ienumerable-t-fun.aspx
Kludge