I'm running a very computationally intensive scientific job that spits out results every now and then. The job is basically to just simulate the same thing a whole bunch of times, so it's divided among several computers, which use different OSes. I'd like to direct the output from all these instances to the same file, since all the computers can see the same filesystem via NFS/Samba. Here are the constraints:

  1. Must allow safe concurrent appends. Must block if some other instance on another computer is currently appending to the file.
  2. Performance does not count. I/O for each instance is only a few bytes per minute.
  3. Simplicity does count. The whole point of this (besides pure curiosity) is so I can stop having every instance write to a different file and manually merging these files together.
  4. Must not depend on the details of the filesystem. Must work with an unknown filesystem on an NFS or Samba mount.

The language I'm using is D, in case that matters. I've looked; there's nothing in the standard library that seems to do this. Both D-specific and general, language-agnostic answers are fully acceptable and appreciated.

+1  A: 

I don't know D, but I think using a mutex file to do the job might work. Here's some pseudo-code you might find useful:

do {
  // Try to create a new file to use as a mutex.
  // If the file already exists, this fails and returns null.
  mutex = create_file_for_writing('lock_file');
} while (mutex == null);

// Open your log file in append mode and write the results.
log_file = open_file_for_appending('the_log_file');
write(log_file, data);
close_file(log_file);

close_file(mutex);
// Free mutex and allow other processes to create the same file.
delete_file(mutex);

So all processes will try to create the mutex file, but only the one that wins will be able to continue. Once you have written your output, close and delete the mutex file so other processes can do the same.
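
Since the question mentions D, here is a rough, untested sketch of this idea in D. It leans on the POSIX open() with O_CREAT|O_EXCL so that create-or-fail is a single atomic step, which makes it Unix-only as written; the file names and retry interval are illustrative:

import core.sys.posix.fcntl : open, O_CREAT, O_EXCL, O_WRONLY;
import core.sys.posix.unistd : close;
import core.thread : Thread;
import core.time : seconds;
import std.conv : octal;
import std.file : remove;
import std.stdio : File;
import std.string : toStringz;

void appendWithMutexFile(string logFile, string data)
{
    // Spin until we are the process that creates the lock file;
    // O_CREAT|O_EXCL makes create-or-fail a single atomic step.
    // (Reportedly not atomic on old NFSv2 servers, though.)
    int fd;
    while ((fd = open("lock_file".toStringz, O_CREAT | O_EXCL | O_WRONLY,
                      octal!644)) == -1)
        Thread.sleep(1.seconds);
    close(fd);

    // We hold the lock: append our few bytes of results.
    auto log = File(logFile, "a");
    log.write(data);
    log.close();

    remove("lock_file");  // release it so other instances can proceed
}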

Seb
You must have missed the part where he said he needs synchronization between different computers.
CyberShadow
And this solution will not work over NFS as he requested.
Jiri Klouda
Why wouldn't this work? I don't mean writing a file locally on each computer but in a single location for all of them.
Seb
+3  A: 

Over NFS you face some problems with client-side caching and stale data. I have written an OS-independent lock module that works over NFS before. The simple idea of creating a [datafile].lock file does not work well over NFS. The workaround is to invert the convention: a lock file [datafile].lock which, if present, means the file is NOT locked. A process that wants to acquire the lock renames the file to a unique name like [datafile].lock.[hostname].[pid]. The rename is an atomic enough operation over NFS to guarantee exclusivity of the lock.

The rest is basically a bunch of fail-safes, loops, error checking and lock recovery for the case where a process dies before releasing the lock, i.e. before renaming the lock file back to [datafile].lock.
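
For illustration, a minimal (untested) D sketch of that rename scheme; the file names and one-second retry are made up. std.file.rename maps to rename(2)/MoveFileEx, which is the atomic step described above:

import core.thread : Thread;
import core.time : seconds;
import std.file : rename, FileException;
import std.format : format;
import std.process : thisProcessID;
import std.socket : Socket;

// "data.lock" present means UNLOCKED; renaming it away claims the lock.
enum lockFree = "data.lock";

string myLockName()
{
    return format("%s.%s.%s", lockFree, Socket.hostName, thisProcessID);
}

void acquire(string mine)
{
    while (true)
    {
        try { rename(lockFree, mine); return; }        // atomic claim
        catch (FileException) { Thread.sleep(1.seconds); }  // someone else has it
    }
}

void release(string mine)
{
    rename(mine, lockFree);  // put the "unlocked" marker back
}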

Jiri Klouda
+2  A: 

The classic solution is to use a lock file, or more accurately a lock directory. On all common OSs creating a directory is an atomic operation so the routine is:

  • try to create a lock directory with a fixed name in a fixed location
  • if the create failed, wait a second or so and try again - repeat until success
  • write your data to the real data file
  • delete the lock directory

This has been used by applications such as CVS for many years across many platforms. The only problem occurs in the rare cases when your app crashes while writing and before removing the lock.
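
A compact (untested) D rendering of that routine; the directory name and one-second retry are arbitrary:

import core.thread : Thread;
import core.time : seconds;
import std.file : mkdir, rmdir, FileException;

void withLockDir(string lockDir, void delegate() writeData)
{
    // mkdir either succeeds or throws: the atomic test-and-set described above.
    while (true)
    {
        try { mkdir(lockDir); break; }                       // we got the lock
        catch (FileException) { Thread.sleep(1.seconds); }   // held by someone else
    }
    scope (exit) rmdir(lockDir);  // released even if writeData throws
    writeData();
}

A call would look like withLockDir("the_log_file.lock", () { /* append to the data file */ });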

anon
+2  A: 

Lock File with a twist

Like other answers have mentioned, the easiest method is to create a lock file in the same directory as the datafile.

Since you want to be able to access the same file over multiple PC the best solution I can think of is to just include the identifier of the machine currently writing to the data file.

So the sequence for writing to the data file would be:

  1. Check if there is a lock file present

  2. If there is a lock file, see if I own it by checking that its content contains my identifier.
    If that's the case, just write to the data file, then delete the lock file.
    If that's not the case, wait a second or a small random length of time and try the whole cycle again.

  3. If there is no lock file, create one with my identifier and try the whole cycle again to avoid race condition (re-check that the lock file is really mine).

Along with the identifier, I would record a timestamp in the lock file and check whether it's older than a given timeout value.
If the timestamp is too old, assume the lock file is stale and just delete it, as it would mean one of the PCs writing to the data file crashed or lost its connection.
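
A hedged, untested D sketch of that cycle; the file name, timeout and identifier format are made up, it uses the lock file's mtime rather than writing an explicit timestamp, and the NFS caching caveats discussed below still apply:

import core.thread : Thread;
import core.time : minutes, seconds;
import std.datetime : Clock;
import std.file : exists, readText, remove, timeLastModified, write;
import std.format : format;
import std.process : thisProcessID;
import std.socket : Socket;

enum lockFile   = "data.lock";  // hypothetical name
enum staleAfter = 5.minutes;    // assumed timeout, tune to taste

string myId() { return format("%s:%s", Socket.hostName, thisProcessID); }

// One pass of the check/claim/re-check cycle; true once the lock is ours.
bool tryAcquire()
{
    if (exists(lockFile))
    {
        if (readText(lockFile) == myId)   // step 2: it's ours, go ahead
            return true;
        // Stale lock: the owner probably crashed or lost its connection.
        if (Clock.currTime - timeLastModified(lockFile) > staleAfter)
            remove(lockFile);
        return false;                     // owned by someone else; retry later
    }
    write(lockFile, myId);                // step 3: claim it...
    return false;                         // ...and re-check on the next pass
}

void acquire()
{
    while (!tryAcquire())
        Thread.sleep(1.seconds);
}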

Another solution

If you are in control of the format of the data file, another option could be to reserve a structure at the beginning of the file that records whether it is locked or not.
If you reserve just one byte for this purpose, you could decide, for instance, that 00 means the data file isn't locked, and that any other value is the identifier of the machine currently writing to it.
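
A sketch of that reserved-byte check in D, assuming the caller opened the file with mode "r+" and the reserved byte already exists; note that the check-then-write is not atomic by itself, so two clients can still race without an extra primitive (untested, illustrative):

import std.stdio : File;

// 0x00 in the first byte = unlocked; anything else = the writer's id.
bool tryLockByte(File f, ubyte myId)
{
    ubyte[1] b;
    f.seek(0);
    f.rawRead(b[]);
    if (b[0] != 0)
        return false;      // someone else holds it
    b[0] = myId;
    f.seek(0);
    f.rawWrite(b[]);
    f.flush();
    return true;           // prudent: re-read to confirm it is still ours
}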

Issues with NFS

OK, I'm adding a few things because Jiri Klouda correctly pointed out that NFS uses client-side caching that will result in the actual lock file being in an undetermined state.

A few ways to solve this issue:

  • mount the NFS directory with the noac or sync options. This is easy, but it doesn't completely guarantee data consistency between client and server, so there may still be issues; in your case it may be OK.

  • Open the lock file or data file with the O_DIRECT, O_SYNC or O_DSYNC flags. This is supposed to disable caching altogether.
    This will lower performance but will ensure consistency.

  • You may be able to use flock() to lock the data file, but its implementation is spotty and you will need to check whether your particular OS actually uses the NFS locking service; it may do nothing at all otherwise (see the sketch after this list).
    If the data file is locked, then another client opening it for writing will fail.
    Oh yeah, and it doesn't seem to work on SMB shares, so it's probably best to just forget about it.

  • Don't use NFS and just use Samba instead: there is a good article on the subject explaining why NFS is probably not the best answer to your usage scenario.
    You will also find in that article various methods for locking files.

  • Jiri's solution is also a good one.
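
On the flock() point above: as far as I know druntime does not bind flock() itself, but the closely related POSIX fcntl() byte-range locks (the kind the NFS lock daemon actually services) can be called from D. A POSIX-only, untested sketch, not portable to plain Windows/SMB:

import core.stdc.stdio : SEEK_SET;
import core.sys.posix.fcntl;
import std.stdio : File;

// Lock or unlock the whole file: pass F_WRLCK to lock, F_UNLCK to release.
void setWholeFileLock(File f, short type)
{
    flock fl;                        // struct flock from core.sys.posix.fcntl
    fl.l_type   = type;
    fl.l_whence = SEEK_SET;
    fl.l_start  = 0;
    fl.l_len    = 0;                 // 0 means "to the end of the file"
    fcntl(f.fileno, F_SETLKW, &fl);  // F_SETLKW blocks until the lock is granted
}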

Basically, if you want to keep things simple, don't use NFS for frequently-updated files that are shared amongst multiple machines.

Something different

Use a small database server to store your data in and bypass the NFS/SMB locking issues altogether, or keep your current multiple-data-file system and just write a small utility to concatenate the results.
That may still be the safest and simplest solution to your problem.
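
The concatenation utility really is tiny; an untested D sketch, assuming the per-instance files live in a results/ directory with a .out suffix (both names are made up):

import std.file : dirEntries, SpanMode, append, readText;

void main()
{
    // Gather every per-instance output file and append it to one merged file.
    foreach (entry; dirEntries("results", "*.out", SpanMode.shallow))
        append("merged.txt", readText(entry.name));
}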

Renaud Bompuis
This solution, while working fine on a single computer, will run into race conditions because of NFS client-side caching.
Jiri Klouda
Note that NFSv4 fixes many of the problems with older versions of the protocol.
janneb
+2  A: 

Why not just build a simple server which sits between the file and the other computers?

Then if you ever wanted to change the data format, you would only have to modify the server, and not all of the clients.

In my opinion building a server would be much easier than trying to use a Network file system.
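
A hedged sketch of such a server in D using std.socket; the port, file name and raw byte-stream protocol are all made up. One process owns the file, so no file locking is needed; each simulation instance just connects and sends its bytes instead of touching the shared file:

import std.socket;
import std.stdio : File;

void main()
{
    auto listener = new TcpSocket();
    listener.setOption(SocketOptionLevel.SOCKET, SocketOption.REUSEADDR, 1);
    listener.bind(new InternetAddress(4321));  // arbitrary port
    listener.listen(10);

    auto log = File("results.txt", "a");       // the single writer of the file
    ubyte[1024] buf;
    while (true)
    {
        // Serving one client at a time is what serializes the appends.
        auto client = listener.accept();
        ptrdiff_t n;
        while ((n = client.receive(buf[])) > 0)
        {
            log.rawWrite(buf[0 .. n]);
            log.flush();
        }
        client.close();
    }
}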

Brad Gilbert
Or just store the data in a proper database, and the locking problems are solved.
Jiri Klouda
I don't have a database configured and I don't want to configure one just to solve such a simple problem.
dsimcha