I have a directory that continually fills up with "artefact" files. Many different programs dump their temporary files in this directory and it's unlikely that these programs will become self-cleaning any time soon.

Meanwhile, I would like to write a program that continually deletes files in this directory as they become stale, which I'll define as "older than 30 minutes".

A typical approach would be a timed mechanism that lists the files in the directory, filters for the old ones, and deletes them. However, this approach does not perform well in my case, because the directory could conceivably contain tens or hundreds of thousands of files that do not yet qualify as stale. Consequently, it would keep looping over the same thousands of files just to find the old ones.

What I'd really like to do is implement some kind of directory listener that is notified of any new files added to the directory. This listener would then add those files to a queue to be deleted down the road. However, there doesn't appear to be a way to implement such a solution in the languages I program in (JVM languages like Java and Scala).

So: I'm looking for the most efficient way to keep a directory "as clean as it can be" on Windows, preferably with a JVM language. Also, though I've never programmed with Powershell, I'd consider it if it offered this kind of functionality. Finally, if there are 3rd party tools out there to do such things, I'd like to hear about them.

Thanks.

A: 

I'd go with C++ for a utility like this. It lets you interface with the Win32 API, which does indeed have directory listening facilities (FindFirstChangeNotification or ReadDirectoryChangesW). Use one thread that listens for change notifications and updates your list of files (iirc FFCN requires you to rescan the folder, whereas RDCW gives you the actual changes).

If you keep this list sorted by modification time, it becomes easy to Sleep() just long enough for the oldest file to go stale, instead of polling at some arbitrary fixed interval. You might want to do a WaitForSingleObject with a timeout instead of Sleep, in order to react to outside changes (i.e., the file you're waiting on has been deleted externally, so you'll want to wake up and work out when the next file will become stale).
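The sorted-list idea can be sketched independently of the Win32 calls. What follows is a minimal Python sketch under stated assumptions, not the answerer's code: `StaleQueue`, `add`, `seconds_until_next` and `pop_stale` are hypothetical names, and a min-heap stands in for the sorted list. The delay it returns is the value you would hand to a wait-with-timeout call.

```python
import heapq

STALE_AFTER = 30 * 60  # seconds; the question's "older than 30 minutes"


class StaleQueue:
    """Min-heap of (expiry_time, path): the next file to go stale is always on top."""

    def __init__(self, stale_after=STALE_AFTER):
        self.stale_after = stale_after
        self._heap = []

    def add(self, path, mtime):
        # Called by the listener thread whenever a new file is reported.
        heapq.heappush(self._heap, (mtime + self.stale_after, path))

    def seconds_until_next(self, now):
        # How long to wait before the oldest tracked file goes stale.
        # None means nothing is tracked, so the caller can wait indefinitely.
        if not self._heap:
            return None
        return max(0.0, self._heap[0][0] - now)

    def pop_stale(self, now):
        # Remove and return every path whose expiry time has passed.
        stale = []
        while self._heap and self._heap[0][0] <= now:
            stale.append(heapq.heappop(self._heap)[1])
        return stale
```

A worker loop would wait for `seconds_until_next(time.time())` (or until the listener signals a change), then try to delete each path returned by `pop_stale`, ignoring files that have already been removed externally.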

Sounds like a fun little tool to write :)

snemarch
A: 

You might want to bite the bullet and code it up in C# (or VB). What you're asking for is pretty well handled by the FileSystemWatcher class. It would work basically the way you are describing. Register files as they are added into the directory. Have a periodic timer that scans the list of files for ones that are stale and deletes them if they are still there. I'd probably code it up as a Windows service running under a service id that has enough rights to read/delete files in the directory.

EDIT: A quick google turned up this FileSystemWatcher for Java. Commercial software. Never used it, so can't comment on how well it works.

tvanfosson
+2  A: 

If you don't want to write C++, you can use Python. Install pywin32 and you can then use the win32 API as such:

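# FindFirstChangeNotification lives in pywin32's win32file module
import win32file, win32con

# path_to_watch is the directory to monitor
change_handle = win32file.FindFirstChangeNotification(
    path_to_watch,
    0,  # 0 = do not watch subdirectories
    win32con.FILE_NOTIFY_CHANGE_FILE_NAME  # fire on file create/delete/rename
)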

Full explanation of what to do with that handle by Tim Golden here: http://timgolden.me.uk/python/win32_how_do_i/watch_directory_for_changes.html.

Lanny
+5  A: 

Why can't you issue a directory system command sorted oldest first: C:\>dir /OD

Take the results and delete all files older than your threshold, or sleep if no files are old enough.

Combine that with a Timer or Executor set to an interval somewhere between 1 second and 1 minute, so that files don't pile up faster than you can delete them.
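For comparison, here is the same oldest-first pass without shelling out. This is a rough Python sketch that swaps in `os.scandir` for `dir /OD`, so it illustrates the ordering and early-exit logic only; how it performs on a directory of 100k files is not something measured here. The function name `delete_stale` and its parameters are hypothetical.

```python
import os
import time


def delete_stale(directory, max_age_seconds=30 * 60, now=None):
    """Delete regular files at least max_age_seconds old; return their names."""
    now = time.time() if now is None else now
    cutoff = now - max_age_seconds

    # Collect (mtime, path) pairs in one pass, then sort oldest first,
    # mirroring the ordering that `dir /OD` produces.
    entries = []
    with os.scandir(directory) as it:
        for entry in it:
            try:
                if entry.is_file(follow_symlinks=False):
                    mtime = entry.stat(follow_symlinks=False).st_mtime
                    entries.append((mtime, entry.path))
            except FileNotFoundError:
                continue  # removed by someone else while we were scanning

    entries.sort()

    deleted = []
    for mtime, path in entries:
        if mtime > cutoff:
            break  # sorted oldest first, so everything after this is newer
        try:
            os.remove(path)
            deleted.append(os.path.basename(path))
        except OSError:
            pass  # file vanished or is still locked by its writer; retry next pass
    return deleted
```

Calling `delete_stale(r"C:\temp\artefacts")` from a timer would give one cleanup pass per tick; the path shown is only an example.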

Kelly French
Thanks Kelly. On 100k files, a listFiles() call with an AgeFilter took about 3 minutes. Using dir/OD and then parsing the resultant string for the files I needed (by time) consistently runs about 4 seconds. Big improvement!
marc esher
Thanks to everyone for the excellent answers. For my specific case, being on Windows, the combination of Java's ProcessBuilder and the right system commands ended up being much faster than a traditional Java-based AgeFilter approach.
marc esher
+2  A: 

In Java, you can also use Apache Commons JCI FAM. It's an open-source Java library that you can use for free.

JDK 7 (currently released in beta) includes support for file notifications as well. Check out the Java NIO.2 tutorial.

Both options should work on both Windows and Linux.

notnoop
Excellent. The WatchService is exactly what I was looking for. I'm thinking my program will end up being a combination of 1) "find all old files on startup" using Kelly's system command suggestion and then 2) once it's running, using the WatchService to add files-to-be-deleted to the queue.
marc esher
Great. I recommend against using the system command line. Use `File.lastModified` and a variation of what was recommended here: http://stackoverflow.com/questions/1060153/search-entire-computer-for-a-file-name-in-java
notnoop
msaeed, I started with listFiles and an AgeFilter, but on large directories it's dog slow. I'm not a fan of using the system command, but it's much faster in my specific case.
marc esher