ansaurus

Question

Is this method of file locking acceptable?

Answer 1

+1 A:

File move/rename is an atomic operation on most OSes when the source and destination are on the same filesystem, so this is probably a workable solution.

You will want to add an exception check on your move and open calls, though, in case some other process moved the file between your existence check and the move (or if the move failed to complete).

Edit

To summarize the proper flow that will work:

1. Issue move from A to A.[myID]
2. Try to open A.[myID]
3. If #1 or #2 fails, we didn't get the lock; wait a little bit and then go back to #1. Otherwise, we got the lock, continue.
4. Make modifications.
5. Issue move from A.[myID] to A. (Should never fail.) This releases the lock.

A good option for [myID] is the PID of the process (possibly also include the host, if running on multiple systems).
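
The flow above can be sketched roughly as follows. This is only a minimal sketch: the function names `acquire` and `release`, the retry count and the delay are my own assumptions, and it uses `os.rename` for the move. The file is opened in "r+" mode on purpose, because a mode that creates missing files would make step 2 succeed even when the move did not.

```python
import os
import socket
import time


def acquire(path, retries=10, delay=0.5):
    """Try to take the lock by renaming `path` to a name unique to this process.

    Returns (open file, private name) on success, or None if every attempt failed.
    """
    # [myID] is host plus PID, as suggested above
    mine = "%s.%s.%d" % (path, socket.gethostname(), os.getpid())
    for _ in range(retries):
        try:
            os.rename(path, mine)      # step 1: the move
            handle = open(mine, "r+")  # step 2: "r+" never creates the file
            return handle, mine
        except OSError:
            time.sleep(delay)          # step 3: no lock, wait and start over
    return None


def release(handle, mine, path):
    """Step 5: close the file and move it back to its shared name."""
    handle.close()
    os.rename(mine, path)
```

A caller would do its modifications (step 4) through the returned file object and then call `release` with the same names.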

Amber 2010-07-23 20:18:44

Ah, like a try/except structure around the move command?

xnine 2010-07-23 20:25:10

Is it atomic on network shares, too, though? Usually not.

Chris Charabaruk 2010-07-23 21:00:45

Chris: that's why you check on the `open` as well. If the file has been moved, it has been moved to a certain name - thus one process' open will succeed (on the name that it was actually moved to) and the others' will fail (because the file never actually got moved to "their" names).

Amber 2010-07-23 21:02:56

Answer 2

+1 A:

If you don't track your move calls to see if they succeeded or not, you'll never know if you fall victim to a timing window. Remember that if anything can go wrong it will, at the worst possible time.

Rather than using the contents of the file as a flag, maybe you could use the filename itself? For each task rename the file "task_waiting_to_run" to "task_running" to "task_complete". If the rename from "task_waiting_to_run" to "task_running" fails, that means another box got there first.
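
As a rough sketch of the filename-as-flag idea (the helper names `claim` and `finish` are hypothetical, and `task` is assumed to be the path of the task file without its state suffix):

```python
import os


def claim(task):
    """Return True if this process won the right to run `task`."""
    try:
        os.rename(task + "_waiting_to_run", task + "_running")
    except OSError:
        return False  # the waiting file is gone: another box got there first
    return True


def finish(task):
    """Mark a task we were running as done."""
    os.rename(task + "_running", task + "_complete")
```

Only the caller that gets True from `claim` should run the task and later call `finish`.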

Mark Ransom 2010-07-23 20:30:42

Since the tasks are in script form, using the names of the scripts as a flag might be a good solution. Thanks. It would mean making multiple copies of the scripts each week.

xnine 2010-07-23 20:37:12

Answer 3

+1 A:

EDIT: It's also common practice to identify the process that renamed the file. That way, should the process die before restoring it to its original name, it would be possible to trace the file's ownership and determine whether to intervene.

I've inserted (barely tested) os and socket calls to add this functionality. Use at your own risk.

If two processes are competing to rename the file, then having them check for its existence first will not prevent a race condition; it will only delay the time when it occurs.

The docs for shutil.move are (sadly) not explicit about throwing an IOError if the file does not exist, but that seems a reasonable expectation -- and I found it does happen in practice:

import shutil
import os
import socket

oldname = "foobar.txt"
# Tag the new name with host and PID so the file's owner can be traced.
newname = (oldname + "." + socket.gethostbyaddr(socket.gethostname())[0]
           + "." + str(os.getpid()))
i_win = True
try:
    shutil.move(oldname, newname)
except IOError as e:  # in Python 3, IOError is an alias of OSError
    print("File does not exist")
    i_win = False
except Exception as e:
    print(e)
    i_win = False

if i_win:
    print("I got it!")

This means that only one process can think it has succeeded in renaming the file.

Dan Breslau 2010-07-23 20:31:37

This is very helpful as well.

xnine 2010-07-23 20:48:53

Answer 4

+4 A:

You've basically developed a filesystem version of the binary semaphore (or mutex). It's a well-studied structure used for locking, so as long as you get the implementation details right, it should work. The trick is to get the "test and set" operation, or in your case "check existence and move," to be truly atomic. For that I'd use something like this:

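from shutil import move
from time import sleep

# fh is the shared file's path; fhtemp is this process's private name for it
lock_acquired = False
while not lock_acquired:
    try:
        move(fh, fhtemp)
    except OSError:
        # the file is not there, so another process holds the lock
        sleep(3)
    else:
        lock_acquired = True
# do your writing
move(fhtemp, fh)
lock_acquired = False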

The program as you had it would work most of the time, but as mentioned you could have issues if another process moved the file between the check for its existence and the call to move. I suppose you could work around that, but I'd personally recommend sticking with a well-tested mutex algorithm. (I've translated/ported the above code sample from Modern Operating Systems by Andrew Tanenbaum, but it's possible that I've introduced errors in the conversion, so fair warning.)

By the way, the man page for the open function on Linux offers this solution for file locking:

The solution for performing atomic file locking using a lockfile is to create a unique file on the same file system (e.g., incorporating hostname and pid), use link(2) to make a link to the lockfile. If link() returns 0, the lock is successful. Otherwise, use stat(2) on the unique file to check if its link count has increased to 2, in which case the lock is also successful.

To implement that in Python, you could do something like this:

from os import link, stat, unlink
from time import sleep

# each instance of the process should have a different filename here
process_lockfile = '/path/to/hostname.pid.lock'
# all processes should have the same filename here
global_lockfile = '/path/to/lockfile'
# create the file if necessary (only once, at the beginning of each process)
with open(process_lockfile, 'w') as f:
    f.write('\n') # or maybe write the hostname and pid

# now, each time you have to lock the file:
lock_acquired = False
while not lock_acquired:
    try:
        link(process_lockfile, global_lockfile)
    except OSError:
        # link() reported failure; a link count of 2 means it worked anyway
        lock_acquired = (stat(process_lockfile).st_nlink == 2)
        if not lock_acquired:
            sleep(3)  # someone else holds the lock; wait before retrying
    else:
        lock_acquired = True
# do your writing
unlink(global_lockfile)
lock_acquired = False

David Zaslavsky 2010-07-23 20:49:34

Answer 5

A:

Relying on network filesystems for locking is a problem that has plagued systems for years (and it still often doesn't work quite how you expect).

Why not use something designed to be explicitly multiuser and transactional, like a database system? (I like Postgres personally...)

It's probably a bit overkill, but the workings are generally easy to understand for something like this. It also makes it easier to expand to add new functionality later.
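
To give an idea of what that looks like, here is a minimal sketch of claiming a task with a single conditional UPDATE. It uses Python's built-in sqlite3 with an in-memory database purely so the example runs without a server; that is my substitution, not a recommendation, since SQLite itself depends on file locking and is a poor fit for a network share. The table and column names are made up for illustration, and with Postgres you would send the same kind of statement through its own driver.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a real database server
conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, status TEXT, owner TEXT)")
conn.executemany("INSERT INTO tasks (status) VALUES (?)",
                 [("pending",), ("pending",)])
conn.commit()


def claim(conn, task_id, worker):
    """Flip one task from pending to running; True if this worker won it."""
    cur = conn.execute(
        "UPDATE tasks SET status = 'running', owner = ? "
        "WHERE id = ? AND status = 'pending'",
        (worker, task_id),
    )
    conn.commit()
    # The WHERE clause matches only while the task is still pending,
    # so at most one caller ever sees a row count of 1.
    return cur.rowcount == 1
```

The database does the "test and set" in one statement, so there is no window between checking the status and changing it.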

Steven Schlansker 2010-07-23 21:07:36

Answer 6

+1 A:

Seems to me you are putting in too much effort to accomplish something that can be simple if you change your data structure. Right now you have a single file that contains the list of tasks.

How about making the task queue a directory instead, where each pending task is a file? Then the process is as easy as picking a task from the directory "Pending", moving it to the directory (say) "Running", and, after it is done, moving the task file to the directory "Completed". Since a file move is an atomic operation, there will be no race condition (if the move fails, it means another worker just snatched the task first, so pick up the next one).

Also, checking the progress is as easy as issuing ls on one of the directories :-)
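
A worker's side of this could look roughly like the sketch below. The function names are hypothetical, and it assumes all the directories live on the same filesystem, since `os.rename` cannot move files across filesystems.

```python
import os


def claim_next(pending_dir, running_dir):
    """Move one task file from pending to running; return its new path or None."""
    for name in sorted(os.listdir(pending_dir)):
        src = os.path.join(pending_dir, name)
        dst = os.path.join(running_dir, name)
        try:
            os.rename(src, dst)
        except OSError:
            continue  # another worker snatched it first; try the next one
        return dst
    return None


def complete(task_path, completed_dir):
    """Move a finished task file into the completed directory."""
    os.rename(task_path,
              os.path.join(completed_dir, os.path.basename(task_path)))
```

Each worker would loop on `claim_next`, run whatever it gets back, and call `complete` when the task is done.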

Nas Banov 2010-07-24 09:57:40
