I have a folder with ~10 000 subfolders.

Can any Linux API or tool watch for changes in any folder below e.g. /SharedRoot, or do I have to set up inotify for each folder? (If so, I lose, since I want to do this for 10k+ folders.) I suspect I do have to, since I've already seen examples of this inefficient method, for instance http://twistedmatrix.com/trac/browser/trunk/twisted/internet/inotify.py?rev=28866#L345

My problem:
I need to keep folders time-sorted with most recently active "project" up top.

When a file changes, each folder above that file should update its last-modified timestamp to match the file. Delays are OK. When a file is opened (typically in MS Excel) and closed again, its file date can jump forward and then back again. For this reason I need to wait until after a file is closed, then queue that file's folder for checking, and only a while later go and look for the newest file in the folder, since the triggering file's date could already have been back-dated to its original timestamp by Excel or a similar program. Also, when several files in the same folder are used or created, it makes sense to buffer the timestamping of that folder's parents, so that a bunch of updates collapses into one delayed update.
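The delayed, collapsed checking described above could be sketched roughly as follows. This is only an illustration under assumptions, not the code from the linked repository: `FolderQueue`, `file_closed`, `pop_ready`, `newest_file_mtime` and the 30-second default delay are all made-up names and values, and something else (inotify, a Samba hook) would have to call `file_closed` when a file is closed.

```python
import os
import time


class FolderQueue:
    """Collapses 'file closed' notifications per folder and releases them after a delay.

    Hypothetical helper: the names and the default delay are assumptions.
    """

    def __init__(self, delay_seconds=30.0, clock=time.monotonic):
        self.delay = delay_seconds
        self.clock = clock
        self.due = {}  # folder path -> earliest time at which it may be checked

    def file_closed(self, file_path):
        # A later close in the same folder just pushes the deadline back,
        # so a burst of activity results in a single check.
        folder = os.path.dirname(os.path.abspath(file_path))
        self.due[folder] = self.clock() + self.delay

    def pop_ready(self):
        """Return (and forget) the folders whose delay has expired."""
        now = self.clock()
        ready = [folder for folder, when in self.due.items() if when <= now]
        for folder in ready:
            del self.due[folder]
        return ready


def newest_file_mtime(folder):
    """Return the newest mtime among regular files directly in folder, or None."""
    newest = None
    with os.scandir(folder) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                mtime = entry.stat(follow_symlinks=False).st_mtime
                if newest is None or mtime > newest:
                    newest = mtime
    return newest
```

Looking up the newest file only when the folder comes out of the queue, rather than trusting the date of the file that triggered the event, is what sidesteps the back-dating behaviour.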

I'm looking for a Linux solution. I have some code that can be run on a Windows server; most of the queuing functionality is here: http://github.com/sesam/FolderdateFollowsFiles/blob/master/FolderdateFollowsFiles/Follower.vb

Available APIs
The relative of inotify on Windows, ReadDirectoryChangesW, can watch a folder and its whole subtree; see bWatchSubtree on http://msdn.microsoft.com/en-us/library/aa365465(VS.85).aspx

Samba?
Patching the Samba source is a possibility, but perhaps there are already hooks available? Are there other possibilities, such as spying on file activity on the client side (various Windows versions) in order to update folders recursively?

+1  A: 

Yes, you need to use inotify. However, you need not consume watches on every node immediately.

The process (similar to how beagle does it) is rather simple:

  1. Establish a watch on the root node.
  2. Do a breadth first (not depth first) search starting at the root node.
  3. Establish watches on directories, in the order of the search.
  4. Watch for directory create events, and keep adding watches as they occur. Re-sort your list as this happens.

The breadth first search is important; otherwise you might miss events because of a race between the moment you start and whatever clients of the root node are doing in the meantime.
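The search order from the steps above might look roughly like this. It is a sketch under assumptions: `add_watch` is a hypothetical callback standing in for whatever actually registers an inotify watch (for example a binding around `inotify_add_watch`), and both function names are made up for illustration. It covers only the initial walk, not the handling of later directory create events.

```python
import os
from collections import deque


def directories_breadth_first(root):
    """Yield root, then every directory below it, level by level."""
    pending = deque([root])
    while pending:
        current = pending.popleft()
        yield current
        try:
            with os.scandir(current) as entries:
                # Sorted only to make the order deterministic; symlinks are not followed.
                children = sorted(
                    e.path for e in entries if e.is_dir(follow_symlinks=False)
                )
        except OSError:
            continue  # directory vanished or is unreadable; skip it
        pending.extend(children)


def establish_watches(root, add_watch):
    """Call the hypothetical add_watch(path) on each directory, shallowest first."""
    watched = []
    for path in directories_breadth_first(root):
        add_watch(path)
        watched.append(path)
    return watched
```

Because a parent is always watched before its children are scanned, a directory created under an already visited parent shows up as a create event instead of being silently missed.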

See this question, which also mentions this RFQ. I had the same exact problem that you are facing.

In essence, one thread continues to watch for directory create events, adding new watches on new directories almost at the same time that they are created. Something else sorts the list either on demand, or after the inotify thread releases its lock.

I've attempted lock-free versions of the above, but with .. questionable .. success :)

Tim Post
Update: It's actually ~12 000 folders on a NAS with 200MB memory, so setting up watches on all nodes, whether it's done breadth-first or just lazily using whatever method the -r (recursive) command line option of inotifywatch uses, seems like it's not really feasible. What I would really need is some hints on how to attack the other path: using or putting some hooks into smbd.
Simon B.
@Simon B - The default number of maximum watches per user is 8192, so 12 or even 15k watches is not at all unusual. Keep in mind, watches (and their paths) are stored in _kernel_ space. The concern here really isn't the memory available in user space, but rather the performance hit on VFS on the NAS end. If watching only directories, that should not be too bad. BTW, did I read that correctly, 200MB of RAM on a NAS?
Tim Post
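To see how a tree compares with the per-user watch limit mentioned above, a rough check could look like this. It is a sketch only: the function names are made up, and it assumes a Linux system where the limit is readable at `/proc/sys/fs/inotify/max_user_watches`.

```python
import os


def read_watch_limit(path="/proc/sys/fs/inotify/max_user_watches"):
    """Return the per-user inotify watch limit, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def count_directories(root):
    """Count root plus every directory below it (symlinked directories are not entered)."""
    return sum(1 for _ in os.walk(root))


def watches_fit(root, limit):
    """True if one watch per directory under root stays within limit."""
    return count_directories(root) <= limit
```

With roughly 12 000 directories and a limit of 8192, `watches_fit` would return False, so the limit would have to be raised before one watch per directory could work.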
Yes. We're using a "Netgear ReadyNAS Duo" and when shopping around we didn't really expect to have to keep the whole folder structure in memory... apart from that, in regular use the real bottleneck on the Duo is CPU power.
Simon B.
+1  A: 

I saw you are running these trees under a Samba share. Maybe you can use the ClamAV virus scanning VFS module for inspiration to see how they trigger the 'scan on close'.

Samba Howto : Stackable VFS Modules

It should be pretty straightforward to check the time of the closed file and modify the directory path leading to it without any of the performance/memory overhead associated with inotify et al.
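Independently of how the close event is detected, the "modify the directory path leading to it" step could be sketched like this. This is not Samba VFS code: `propagate_mtime` is a made-up name, and the choice to only ever move a folder's timestamp forward (never back-date it) is an assumption of this sketch.

```python
import os


def propagate_mtime(file_path, root):
    """Raise the mtime of each folder from file_path's folder up to root.

    Folders that are already as new as the file are left alone.
    Returns the list of folders whose timestamp was changed.
    """
    root = os.path.abspath(root)
    folder = os.path.dirname(os.path.abspath(file_path))
    file_mtime = os.stat(file_path).st_mtime
    updated = []
    # Stop as soon as we would step outside root.
    while os.path.commonpath([root, folder]) == root:
        st = os.stat(folder)
        if st.st_mtime < file_mtime:
            os.utime(folder, (st.st_atime, file_mtime))  # keep atime, set mtime
            updated.append(folder)
        if folder == root:
            break
        folder = os.path.dirname(folder)
    return updated
```

The cost per closed file is one `stat` per path component, which is why this approach avoids holding the whole folder structure in memory.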

Just a thought.

Peter Tillemans
Simon B.
Good luck! I'd love to read a blog post about it when you've pulled it off.
Peter Tillemans