Hi, what is the best way to check for new files added to a directory? I don't think FileSystemWatcher would be suitable, because this is not an always-on service but a method that runs when my program starts up.

There are over 20,000 files in the folder structure I am monitoring. At present I check each file individually to see whether its path is in my database table, but this takes around ten minutes and I would like to speed it up if possible.

I can store the date the folder was last checked. Is it easy to get all files with a created date later than the last-checked date?

Anyone got any ideas?

Thanks

Mark

A: 

Can you write a service that runs on that machine? The service could then use FileSystemWatcher.

Kevin Jones
Thanks, I thought about that but don't like the idea of having a service; just personal preference.
foz1284
That would still not guarantee that you won't miss any changes. Also - generally speaking - having a service running just for an application that runs every now and then is bad design. Of course, in this case it might be acceptable; it depends on the type of application.
Thorarin
A:

Your approach is the only feasible one (FileSystemWatcher lets you see changes as they happen; it cannot tell you what changed while your program was not running).

Find out what takes so long. 20,000 checks should not take 10 minutes; 1 minute at most. Your program is simply written inefficiently. How are you checking each file?

Hint: don't query the database once per file. Load a list of all files on disk into memory, load a list of all files in the database, and compare them in memory. 20,000 SQL statements are too slow; this way you need only ONE query to get the list.
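A minimal sketch of that hint, written in Python rather than .NET for illustration and assuming a hypothetical table `files` with a `path` column (SQLite stands in for whatever database the app actually uses). The disk is walked once, the database is read with a single query, and set difference does the comparison in memory:

```python
import os
import sqlite3


def scan_disk(root):
    """Walk the folder tree once and return every file path as a set."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.add(os.path.join(dirpath, name))
    return found


def load_known(conn):
    """Fetch all stored paths with ONE query instead of one query per file."""
    # Hypothetical schema: table "files" with a "path" column.
    return {row[0] for row in conn.execute("SELECT path FROM files")}


def diff_files(root, conn):
    """Return (added, removed) by comparing disk and database in memory."""
    on_disk = scan_disk(root)
    known = load_known(conn)
    added = on_disk - known    # on disk but not yet in the database
    removed = known - on_disk  # in the database but no longer on disk
    return added, removed
```

Each set lookup is roughly constant time, so comparing 20,000 paths this way is cheap; the remaining cost is mostly the directory walk itself. The paths stored in the database must be written the same way `os.walk` produces them (same root, separators and, on Windows, letter case), otherwise identical files will look different.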

TomTom
Cheers for the hint, TomTom. You're right that my program calls the database for each file; that's certainly one area I can refactor.
foz1284
A:

FileSystemWatcher is not reliable, so even if you could use a service, it would not necessarily work for you.

The two options I can see are:

  1. Keep a list of files you know about and keep comparing to this list. This will allow you to see if files were added, deleted etc. Keep this list in memory, instead of querying the database for each file.
  2. As you suggest, store a timestamp and compare to that.
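Option 2 might look like the following Python sketch (not .NET). It assumes the last-checked time is stored as a POSIX timestamp, and it treats a file as new if either its modification time or its creation time is later. The creation-time lookup is a platform assumption: `st_birthtime` exists on macOS/BSD and on Windows from Python 3.12, older Windows versions of Python report creation time in `st_ctime`, and on Linux `st_ctime` is the inode change time, which is also set when a file is created but changes on renames and permission changes too, so expect some false positives there.

```python
import os


def files_added_since(root, last_checked):
    """Return paths whose timestamps are newer than last_checked (POSIX time)."""
    new_files = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            # Prefer a real creation time where the platform provides one;
            # otherwise fall back to st_ctime (see the caveats above).
            created = getattr(st, "st_birthtime", st.st_ctime)
            # Copying a file on Windows can keep the old modification time,
            # so checking the creation-like time as well catches copied-in files.
            if max(st.st_mtime, created) > last_checked:
                new_files.append(path)
    return new_files
```

This still stats every file, but it avoids a database lookup per file. Recording the new last-checked time just *before* the scan starts, rather than after, avoids missing files that arrive while the scan is running.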
Oded
The timestamp method should work really well, but *only* if you don't have to check for deleted files as well. That's not part of the question, but it makes sense that you would be interested in knowing that as well. Possibly, you could do the check for deleted files less frequently, or in the background or something.
Thorarin
A:

10 minutes seems awfully long for 20,000 files. How are you going about doing the comparison? Your suggestion doesn't account for deleted files either. If you want to remove those from the database, you will have to do a full comparison.

Perhaps the problem is the database round trips. You can retrieve a known file list from the database in large chunks (or all at once), sorted alphabetically. Sort the local file list as well and walk the two lists, processing missing or new entries as you go along.
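The sorted walk described above is a classic merge comparison. A minimal Python sketch, assuming both inputs are plain lists of path strings already sorted with the *same* ordering (database collations can differ from Python's string ordering, so sorting both lists in Python first is the safe assumption):

```python
def merge_compare(on_disk, in_db):
    """Walk two sorted lists of paths in a single pass.

    Returns (new, missing): paths only on disk, and paths only in the database.
    """
    new, missing = [], []
    i = j = 0
    while i < len(on_disk) and j < len(in_db):
        if on_disk[i] == in_db[j]:
            # Present in both lists: nothing to do, advance both.
            i += 1
            j += 1
        elif on_disk[i] < in_db[j]:
            new.append(on_disk[i])    # disk has a path the database lacks
            i += 1
        else:
            missing.append(in_db[j])  # database has a path no longer on disk
            j += 1
    # Whatever is left over exists on only one side.
    new.extend(on_disk[i:])
    missing.extend(in_db[j:])
    return new, missing
```

For example, `merge_compare(["a", "b", "d"], ["b", "c", "d"])` returns `(["a"], ["c"])`. Because each list is consumed front to back, the database side could also be fetched in chunks rather than all at once, as suggested above.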

Mick
You're right about the problem being round trips. I don't need to worry about deleted files, as I check for that each time I try to load a file. Cheers.
foz1284
@foz1284: in that case, using timestamps is an option. They're not technically 100% reliable, because someone could change a timestamp on a file. That may not be a problem in your case however.
Mick
A: 

Having a FileSystemWatcher service like Kevin Jones suggests is probably the most pragmatic answer, but there are some other options.

You can watch the directory with inotify if you mount it over Samba on a Linux box. That of course assumes you don't mind fragmenting your platform, but that's what inotify is there for.

And then, more correctly but with correspondingly less chance of getting a go-ahead: if you're monitoring a directory with 20K files in it, it is probably time to evolve your system architecture. Without knowing much more about your application, it sounds like a message queue might be worth looking at.

JosefAssad
It's a picture-management type app, so the structure is just the My Pictures folder. As Thorarin said, having a service run for a program that may only be run sporadically seems overkill.
foz1284