views:

176

answers:

8

Hi, I have to create a Windows service which monitors a specified folder for new files, does some processing on them, and moves them to another location.

I started with using FileSystemWatcher, but my boss doesn't like FileSystemWatcher and wants me to poll on a Timer or use some mechanism other than FileSystemWatcher.

I am confused right now. It would be great if anyone could point me to some examples of how to monitor folders without using FileSystemWatcher in a .NET environment.

Thanks,

+2  A: 

I would question why not to use the FileSystemWatcher. It registers with the OS and is notified immediately when a file system event completes.

If you really have to poll, then just create a System.Timers.Timer, write a method for it to call, and check for the file in that method.
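A minimal sketch of that approach (the class name, folder path, and interval here are placeholders, and the per-file processing is left as a comment):

```csharp
using System;
using System.IO;
using System.Timers;

public class PollingWatcher
{
    private readonly string _path;
    private readonly Timer _timer;

    public PollingWatcher(string path, double intervalMs)
    {
        _path = path;
        _timer = new Timer(intervalMs);
        _timer.Elapsed += (s, e) =>
        {
            foreach (string file in Scan())
            {
                // process the file here, then move it to the other location
            }
        };
    }

    // Returns whatever files are currently sitting in the watched folder.
    public string[] Scan()
    {
        return Directory.GetFiles(_path);
    }

    public void Start() { _timer.Start(); }
    public void Stop()  { _timer.Stop(); }
}
```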

davisoa
A: 

Weird boss :P. Anyway, you could use the System.IO classes:

    // Global variable holding every file seen so far
    List<string> fileList = new List<string>();

    // Run this on each poll:
    string[] files = System.IO.Directory.GetFiles(@"c:\", "*", System.IO.SearchOption.AllDirectories);
    foreach (string file in files)
    {
        if (!fileList.Contains(file))
        {
            fileList.Add(file);
            // new file: do some processing
        }
    }

Note that this only checks for new files, not changed files; if you need that, use FileInfo and compare timestamps.
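For the changed-file case, one sketch (the class and method names here are made up) is to remember the last write time seen for each file and report anything new or newer:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ChangeScanner
{
    // Maps each known file to the last write time we saw for it.
    static readonly Dictionary<string, DateTime> known = new Dictionary<string, DateTime>();

    // Returns files that are new, or whose contents changed since the last scan.
    public static List<string> GetNewOrChanged(string path)
    {
        var result = new List<string>();
        foreach (string file in Directory.GetFiles(path))
        {
            DateTime lastWrite = File.GetLastWriteTimeUtc(file);
            DateTime seen;
            if (!known.TryGetValue(file, out seen) || lastWrite > seen)
            {
                result.Add(file);
                known[file] = lastWrite;
            }
        }
        return result;
    }
}
```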

Petoj
Thanks Petoj, your answer gave me the headstart I needed.
Abbi
cool +1? or answer?
Petoj
I would like to give +1, but I don't have enough reputation, or maybe I'm just too new to this site to know how to do it.
Abbi
A: 

1) Sounds like your boss is an idiot
2) You will have to use functions like Directory.GetFiles, File.GetLastAccessTime, etc., and keep the results in memory to check whether anything changed.

Wildhorn
+1 for number two.
Garis Suero
A: 

It is a little odd that you cannot use FileSystemWatcher, or presumably any of the Win32 APIs that do the same thing, but that is irrelevant at this point. The polling method might look like this:

public class WorseFileSystemWatcher : IDisposable
{
  private ManualResetEvent m_Stop = new ManualResetEvent(false);

  public event EventHandler Change;

  public WorseFileSystemWatcher(TimeSpan pollingInterval)
  {
    var thread = new Thread(
      () =>
      {
        while (!m_Stop.WaitOne(pollingInterval))
        {
          bool changeDetected = false;
          // Add your code to check for changes here and set changeDetected.
          if (changeDetected)
          {
            if (Change != null)
            {
              Change(this, EventArgs.Empty);
            }
          }
        }
      });
    thread.Start();
  }

  public void Dispose()
  {
    m_Stop.Set();
  }
}
Brian Gideon
Love the name! lol!
Garis Suero
+4  A: 

Actually, the FileSystemWatcher component is not 100% "stable" in my experience over the years. Push enough files into a folder and you will lose some events. This is especially true if you monitor a file share, even if you increase the buffer size.

So, for all practical purposes, use FileSystemWatcher together with a Timer that scans the folder for changes; that combination is the most robust solution.

Examples of Timer code are in abundance if you Google for them. Keep track of the DateTime of the last timer run, then check the modified date of each file and compare it against that timestamp. Fairly simple logic.

The timer interval depends on how urgent the changes are for your system, but checking every minute should be fine for many scenarios.
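That comparison against the last run might look like this (a sketch; the class and method names are illustrative):

```csharp
using System;
using System.IO;
using System.Linq;

public static class ModifiedScan
{
    // Files in `path` whose last write time is newer than the previous timer run.
    public static string[] ChangedSince(string path, DateTime lastRunUtc)
    {
        return Directory.GetFiles(path)
            .Where(f => File.GetLastWriteTimeUtc(f) > lastRunUtc)
            .ToArray();
    }
}
```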

Mikael Svenson
+1 This is the right solution: FileWatcher plus polling as a backup to handle what FileWatcher misses. -1 everyone who thinks FileWatcher is reliable.
Hightechrider
Most of the `FileSystemWatcher` reliability issues come from a misunderstanding of how it works. The `Changed` event does not get raised when a write to the disk is queued, but instead gets raised only after the write has been committed. The write behind disk cache can impact the timeliness of the events by delaying them indefinitely. The reported solution is to flush the cache via `FlushFileBuffers`. I suspect there are other issues with network shares that have a negative impact on reliability.
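On the writing side, `FileStream.Flush(true)` (available since .NET 4) is the managed way to invoke `FlushFileBuffers`, pushing the write through the OS write-behind cache rather than just emptying the stream's own buffer. A sketch (the file path here is a placeholder):

```csharp
using System.IO;

string path = Path.Combine(Path.GetTempPath(), "out.dat");
using (var fs = new FileStream(path, FileMode.Create))
{
    byte[] data = { 1, 2, 3 };
    fs.Write(data, 0, data.Length);
    fs.Flush(true); // flushes the OS write-behind cache, not just the stream buffer
}
```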
Brian Gideon
@Brian: Very good explanation, and I haven't thought about the write cache. For network shares it's the buffer size of smb traffic which is too small to transfer all events, so some might get lost.
Mikael Svenson
+2  A: 

At program startup, use Directory.GetFiles(path) to get the list of files.

Then create a timer, and in its elapsed event call hasNewFiles:

    static List<string> hasNewFiles(string path, List<string> lastKnownFiles)
    {
        List<string> files = Directory.GetFiles(path).ToList();
        List<string> newFiles = new List<string>();

        foreach (string s in files)
        {
            if (!lastKnownFiles.Contains(s))
                newFiles.Add(s);
        }

        return newFiles;
    }

In the calling code, you'll have new files if:

    List<string> newFiles = hasNewFiles(path, lastKnownFiles);
    if (newFiles.Count > 0)
    {
        processFiles(newFiles);
        lastKnownFiles.AddRange(newFiles);
    }

edit: if you want a more linqy solution:

    static IEnumerable<string> hasNewFiles(string path, List<string> lastKnownFiles)
    {
        return from f in Directory.GetFiles(path)
               where !lastKnownFiles.Contains(f)
               select f;
    }

    List<string> newFiles = hasNewFiles(path, lastKnownFiles).ToList();
    if (newFiles.Count > 0)
    {
        processFiles(newFiles);
        lastKnownFiles.AddRange(newFiles);
    }
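The query is really a set difference, which `Enumerable.Except` expresses directly (a sketch; the class name is made up, and note `Except` also de-duplicates its first sequence, which is harmless for file paths):

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class NewFileScan
{
    public static List<string> NewFiles(string path, List<string> lastKnownFiles)
    {
        // Everything currently in the folder, minus what we've already seen.
        return Directory.GetFiles(path).Except(lastKnownFiles).ToList();
    }
}
```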
SnOrfus
Thanks SnOrfus. Your reply was very helpful.
Abbi
@Abbi: You're welcome.
SnOrfus
A: 

Yes. You can create a Timer and plug a handler into the Elapsed event that instantiates a DirectoryInfo for the directory you're watching and calls either GetFiles() or EnumerateFiles(). GetFiles() returns a FileInfo[] array, while EnumerateFiles() returns a "streaming" IEnumerable&lt;FileInfo&gt;. EnumerateFiles() is more efficient if you expect a lot of files to be in that folder when you look: you can start working with the IEnumerable before the method has retrieved all the FileInfo objects, while GetFiles() makes you wait for the complete array.
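A sketch of the streaming variant (the class and method names here are illustrative):

```csharp
using System.Collections.Generic;
using System.IO;

public static class StreamingScan
{
    public static IEnumerable<string> Names(string path)
    {
        // EnumerateFiles yields each FileInfo as the directory is read,
        // so processing can begin before the full listing is materialized.
        foreach (FileInfo fi in new DirectoryInfo(path).EnumerateFiles("*"))
            yield return fi.FullName;
    }
}
```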

As to why this may actually be better than FileSystemWatcher, it depends on the architecture behind the scenes. Take, for example, a basic Extract/Transform/Validate/Load workflow.

First, such a workflow may have to create expensive instances of objects (DB connections, instances of a rules engine, etc.). This one-time overhead is significantly mitigated if the workflow is structured to handle everything available to it in one go.

Second, FileSystemWatcher would require anything called by its event handlers, like this workflow, to be thread-safe, since MANY events can be running at once if files are constantly flowing in. If that is not feasible, a Timer can very easily restrict the system to one running workflow: have each event handler examine a thread-safe "process running" flag and simply terminate if another handler thread has set it and not yet finished. The files in the folder at that time will be picked up the next time the Timer fires; with FileSystemWatcher, if you terminate the handler, the information about the existence of that file is lost.
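The "process running" flag described above can be sketched with Interlocked (the names here are illustrative, and the workflow body is left as a comment):

```csharp
using System.Threading;

public class SingleFlightHandler
{
    private int _running;     // 0 = idle, 1 = a workflow is in progress
    public int CompletedRuns; // exposed so callers can observe progress

    public void OnTimerElapsed()
    {
        // If another handler thread is mid-workflow, bail out; the files
        // will be picked up the next time the Timer fires.
        if (Interlocked.CompareExchange(ref _running, 1, 0) != 0)
            return;
        try
        {
            // run the Extract/Transform/Validate/Load workflow here
            CompletedRuns++;
        }
        finally
        {
            Interlocked.Exchange(ref _running, 0);
        }
    }
}
```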

KeithS
A: 

Using @Petoj's answer, I've written a full Windows service that polls every five minutes for new files. It's constrained so only one thread polls, accounts for processing time, and supports pause and timely stopping. It also supports easily attaching a debugger in OnStart.

    public partial class Service : ServiceBase
    {
        List<string> fileList = new List<string>();
        System.Timers.Timer timer;
        DateTime LastChecked;

        public Service()
        {
            timer = new System.Timers.Timer();
            // When AutoReset is true there are reentrancy problems, so the
            // timer is restarted manually at the end of each poll instead.
            timer.AutoReset = false;
            timer.Elapsed += new System.Timers.ElapsedEventHandler(DoStuff);
        }

        private void DoStuff(object sender, System.Timers.ElapsedEventArgs e)
        {
            LastChecked = DateTime.Now;

            string[] files = System.IO.Directory.GetFiles(@"c:\", "*", System.IO.SearchOption.AllDirectories);
            foreach (string file in files)
            {
                if (!fileList.Contains(file))
                {
                    fileList.Add(file);
                    do_some_processing();
                }
            }

            // Schedule the next poll five minutes after this one started,
            // deducting however long the processing took.
            TimeSpan ts = DateTime.Now.Subtract(LastChecked);
            TimeSpan MaxWaitTime = TimeSpan.FromMinutes(5);

            if (MaxWaitTime.Subtract(ts).CompareTo(TimeSpan.Zero) > -1)
                timer.Interval = MaxWaitTime.Subtract(ts).TotalMilliseconds;
            else
                timer.Interval = 1;

            timer.Start();
        }

        protected override void OnPause()
        {
            base.OnPause();
            this.timer.Stop();
        }

        protected override void OnContinue()
        {
            base.OnContinue();
            this.timer.Interval = 1;
            this.timer.Start();
        }

        protected override void OnStop()
        {
            base.OnStop();
            this.timer.Stop();
        }

        protected override void OnStart(string[] args)
        {
            foreach (string arg in args)
            {
                if (arg == "DEBUG_SERVICE")
                    DebugMode();
            }

    #if DEBUG
            DebugMode();
    #endif

            timer.Interval = 1;
            timer.Start();
        }

        private static void DebugMode()
        {
            Debugger.Break();
        }
    }
Conrad Frix
Thanks for your reply. I am confused about how the Timer works. In "do_some_processing" I have zipping and encryption of large files and also database operations, both pretty time-intensive. What happens when the operation takes more than (just imagining) 5 minutes (the polling interval)? Suppose processing took 6 minutes; if I understand your code correctly, it will allow the 6 minutes for processing and only then restart the timer? Before seeing your code I was using a timer with an interval of 30 seconds, and so was confused about what happens when the timer elapses in the middle of processing.
Abbi
Thanks Conrad, Finally I was able to create my service and your reply was pretty helpful.
Abbi
No problem. That bit with the timers can be a real pain. Also, you should consider accepting an answer, since people may hold back from posting answers to someone with a low acceptance rate.
Conrad Frix