views:

163

answers:

1

I have an application that processes files in a directory and moves them to another directory along with the processed output. Nothing special about that. An interesting requirement was introduced:

Implement fault tolerance and processing throughput by allowing multiple remote instances to work on the same file store.

An additional consideration is that we cannot assume a particular file system, as we support both Windows shares and NFS.

Of course the problem is: how do I make sure that the different instances do not try to process the same work, potentially corrupting it or reducing throughput? File locking can be problematic, especially across network shares. We could use a more sophisticated method, such as a simple database or messaging framework (a la JMS or similar), but the entire cluster needs to be fault tolerant. We can't have a single database or messaging provider because of the single point of failure that it introduces.

We've implemented a solution that uses multicast messages to self-discover processing instances and elect a supervisor who assigns work. There's a timeout in case the supervisor goes down, at which point another election takes place. Our networking library, however, isn't very mature, and our implementation of the messages is clunky.

My instincts, however, tell me that there is a simpler way.

Thoughts?

+1  A: 

I think you can safely assume that rename operations are atomic on all network file systems that you care about. So if you arrange for a unit of work to be a single file (or keyed to a single file), then have each server first list the directory containing new work, pick a piece of work, and rename the file to carry its own server name (say, machine name or IP address). For exactly one of the instances that concurrently attempt the same operation, the rename will succeed, so that instance should process the work. For the others, it will fail, so they should pick a different file from the listing they got.
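A minimal sketch of that claim-by-rename step in Java. The directory layout, file names, and the `claim` helper are illustrative assumptions, not part of the original answer; the one load-bearing fact is that only one concurrent `renameTo` on the same source file can succeed.

```java
import java.io.File;

public class WorkClaimer {

    /**
     * Try to claim a unit of work by renaming its file to carry our
     * instance name. The rename is atomic on the server, so exactly one
     * instance wins; the losers see renameTo() return false and should
     * move on to another file.
     */
    public static File claim(File workFile, String instanceName) {
        File claimed = new File(workFile.getParentFile(),
                workFile.getName() + "." + instanceName);
        // Only one concurrent caller can succeed in this rename.
        return workFile.renameTo(claimed) ? claimed : null;
    }

    public static void main(String[] args) throws Exception {
        File dir = new File(System.getProperty("java.io.tmpdir"), "workdir");
        dir.mkdirs();
        File work = new File(dir, "job-001");
        work.createNewFile();

        File mine = WorkClaimer.claim(work, "server-a");  // succeeds: file renamed
        File lost = WorkClaimer.claim(work, "server-b");  // fails: source is gone

        System.out.println(mine != null); // true
        System.out.println(lost == null); // true
    }
}
```

Each server would loop: list the directory, call `claim` on a candidate, and process the file only when `claim` returns non-null.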

For creation of new work, assume that directory creation (mkdir) is atomic, but file creation is not (with file creation, a second writer might overwrite the existing file). So if there are also multiple producers of work, create a new directory for each piece of work.
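The producer side can lean on mkdir the same way; here is a sketch under the same assumptions (the work-id scheme and `input.dat` payload file name are made up for illustration):

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

public class WorkProducer {

    /**
     * Publish a new unit of work as its own directory. mkdir() is
     * atomic, so if two producers pick the same work id, only one
     * directory is created; the loser gets false and can retry with
     * a different id.
     */
    public static boolean publish(File workRoot, String workId, String payload)
            throws IOException {
        File unit = new File(workRoot, workId);
        if (!unit.mkdir()) {
            return false; // another producer already created this unit
        }
        // Safe to write inside the directory: we are its sole creator.
        try (FileWriter w = new FileWriter(new File(unit, "input.dat"))) {
            w.write(payload);
        }
        return true;
    }
}
```

Consumers would then claim whole directories (for instance by renaming the directory, as above) rather than individual files.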

Martin v. Löwis
16bytes
Take a look at http://www.opengroup.org/onlinepubs/9629799/apdxd.htm, which defines (for XNFS compliant implementations, which should be all of them) how caching is supposed to work. You have attribute caching, data caching, directory caching. Doing caching allows you to omit or delay certain operations in response to a system call. For rename(), it is allowed to omit a getattr RPC, but it is not allowed to omit or defer the rename RPC, which in turn is required to be atomic on the server. NFSv3 also specifies that NFSPROC_RENAME is atomic on the client; not sure what that means exactly.
Martin v. Löwis
As to your specific example: this cannot happen because rename is required to be atomic on the server, since RFC 1094. So one client will definitely get an error message from the server as the source file is gone.
Martin v. Löwis