I have several projects that require me to monitor files and then edit them as they are written to disk. I have a feeling that what I am looking for is operationally the same as how anti-virus tools work. Let me give more details:

1) I need to trap all files saved by Office applications, and then add specific company tags to the headers/footers of each document as it is written to disk.
2) I need to know immediately when an editable file (of pretty much any type) is written to disk, so that I can scan it to check whether its content meets certain company policies.

In short, you can see that I need to process any user files as they are being written to disk.

Here is my problem. I want to use C# for this task, but I am not sure whether it can meet my requirements. Everything I have seen on the net is geared towards lower-level C programming, which I specifically want to avoid due to time constraints on this project. Is anyone aware of how to do this easily in C#? Is it even feasible (i.e., is C# too high-level or too slow a language for this)?

A: 

Have you looked at the FileSystemWatcher?

boz
A: 

C# can easily do this. Look at the FileSystemWatcher class (http://msdn.microsoft.com/en-us/library/system.io.filesystemwatcher.aspx).

JSBangs
FileSystemWatcher would tell you when a file was being modified, but you would have to set one up on **every** folder you wanted to watch.
Robert Harvey
This would be true of any solution, most likely. In any case, there's nothing (other than performance implications) keeping you from doing a recursive watch on the entire drive.
JSBangs
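For reference, the recursive watch mentioned above might look roughly like the sketch below. The `RecursiveWatch` class and the `onChange` callback are hypothetical names made up for illustration; `IncludeSubdirectories`, `NotifyFilter` and the events are regular `FileSystemWatcher` members. Treat it as a starting point, not a tested solution.

```csharp
using System;
using System.IO;

// Hypothetical helper: watches a root folder and everything below it.
public static class RecursiveWatch
{
    public static FileSystemWatcher Create(string root, Action<string> onChange)
    {
        var watcher = new FileSystemWatcher(root)
        {
            // One watcher covers the whole subtree instead of one per folder.
            IncludeSubdirectories = true,
            NotifyFilter = NotifyFilters.FileName
                         | NotifyFilters.LastWrite
                         | NotifyFilters.Size
        };

        // Forward the full path of every created, changed or renamed file.
        watcher.Created += (sender, e) => onChange(e.FullPath);
        watcher.Changed += (sender, e) => onChange(e.FullPath);
        watcher.Renamed += (sender, e) => onChange(e.FullPath);

        // Events are only raised once this is switched on.
        watcher.EnableRaisingEvents = true;
        return watcher;
    }
}
```

Usage would be something like `var w = RecursiveWatch.Create(@"C:\Users", path => Console.WriteLine(path));`, keeping `w` alive for as long as monitoring is needed. A single save usually raises several `Changed` events, so the callback should expect duplicates.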
A: 

Performance won't be the issue. I guess I'd question the entire process: it sounds like a recipe for disaster. You can easily hack something together in C# using a FileSystemWatcher in a matter of minutes, but it will be fraught with issues. AV software is bad enough about locking files and screwing up various software, and it's not even trying to modify the file. How do you know when the other app is "done" writing the file? What do you do when you've got the file locked and something else breaks because it can't get access?

nitzmahone
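One common heuristic for the "is the other app done?" question is to try opening the file with no sharing and retry for a while if that fails. The sketch below assumes that approach; `FileReadiness`, `TryOpenExclusive` and `WaitUntilReady` are hypothetical names. It only tells you that nobody holds the file open at that instant, not that the writer is finished, and while the probe has the file open it can itself block the other app, which is exactly the risk described above.

```csharp
using System;
using System.IO;
using System.Threading;

// Hypothetical helper: probes whether a file can be opened exclusively.
public static class FileReadiness
{
    // True if the file could be opened with FileShare.None right now.
    public static bool TryOpenExclusive(string path)
    {
        try
        {
            using (new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.None))
            {
                return true;
            }
        }
        catch (IOException)
        {
            // Sharing violation, or the file is missing (FileNotFoundException).
            return false;
        }
        catch (UnauthorizedAccessException)
        {
            return false;
        }
    }

    // Retries the probe a fixed number of times, sleeping between attempts.
    public static bool WaitUntilReady(string path, int attempts, TimeSpan delay)
    {
        for (int i = 0; i < attempts; i++)
        {
            if (TryOpenExclusive(path)) return true;
            Thread.Sleep(delay);
        }
        return false;
    }
}
```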
I read somewhere that this can monitor only 80 files in a directory due to buffer limitations. That won't really scale for me. How would an AV solution do this? Do they replace lower-level DLLs so that they are involved in individual file transactions?
Smurf
Not really. Where you get into trouble with FileSystemWatcher is when many files in the directory change all at once: it can lose changes. AV uses storage filter drivers and shim-patches I/O libraries in memory (depending on the vendor). You can't (legally) do that in managed code. I've made it work before, but it's a REALLY bad idea, and you have to be Mr. Wizard to keep it running. Seriously, I'd rethink the whole approach (do it with save events in an Office add-in or something). Doing it after the fact will be a nightmare for real users. Patching Office doc props requires exclusive access to the file anyway.
nitzmahone
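If you go the FileSystemWatcher route anyway, the lost-changes case can at least be detected. A rough sketch, assuming you have some way to rescan the folder after an overflow: `OverflowAwareWatch` and the `onOverflow` callback are hypothetical, while `InternalBufferSize`, the `Error` event and `InternalBufferOverflowException` are part of the framework.

```csharp
using System;
using System.IO;

// Hypothetical helper: a watcher that reports when its buffer overflowed.
public static class OverflowAwareWatch
{
    public static FileSystemWatcher Create(string root, Action<string> onChange, Action onOverflow)
    {
        var watcher = new FileSystemWatcher(root)
        {
            IncludeSubdirectories = true,
            // Default is 8 KB; 64 KB is the documented upper limit.
            InternalBufferSize = 64 * 1024
        };

        watcher.Created += (sender, e) => onChange(e.FullPath);
        watcher.Changed += (sender, e) => onChange(e.FullPath);

        // Raised when the watcher cannot keep up; some events were dropped,
        // so the caller has to rescan to find out what it missed.
        watcher.Error += (sender, e) =>
        {
            if (e.GetException() is InternalBufferOverflowException)
            {
                onOverflow();
            }
        };

        watcher.EnableRaisingEvents = true;
        return watcher;
    }
}
```

A bigger buffer only makes overflows rarer. Keeping the event handlers short (for example, just queueing the path for a worker thread) helps more than the buffer size does.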
You raise good points. Changes to the idea:

* Monitor file creation or modification
* Analyze files for content. Office was an example, but we need to monitor pretty much any file type
* Either raise an alert if certain content exists, or possibly just block the change

I must ask for more help ;-):

1) Is there a library for content scanning of any file type? E.g., I want to use a regex query to find sensitive text in either a Doc or HTML file.
2) Is there a way to 'undo' file operations based on FileSystemWatcher output? E.g., can I undo a file copy if the file content policy was breached?
Smurf
For the content digging, you'd use registered IFilters (it's what indexing/search uses). The "alert-y" thing would be reasonable. You could probably rig something up with Volume Shadow Copy's saved versions to "undo" the save, but that also sounds like a good way to trash your users' data...
nitzmahone
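For the regex part of the question, once you have the text (directly for plain text or HTML, or via an IFilter for binary formats like Doc), the check itself is short. A minimal sketch, assuming a made-up policy pattern: `ContentPolicy` and the phrases in the regex are placeholders, and `FileViolatesPolicy` only makes sense for files that are already text.

```csharp
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical policy check; the pattern is only an example.
public static class ContentPolicy
{
    private static readonly Regex Sensitive = new Regex(
        @"\b(confidential|internal use only)\b",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // True if the extracted text contains any sensitive phrase.
    public static bool ViolatesPolicy(string text)
    {
        return Sensitive.IsMatch(text);
    }

    // Convenience overload for text-based files (txt, html, csv, ...).
    // Binary formats need their text extracted first.
    public static bool FileViolatesPolicy(string path)
    {
        return ViolatesPolicy(File.ReadAllText(path));
    }
}
```

This covers the "raise an alert" variant; it does nothing about blocking or undoing the write.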