I'm on Linux, using NFS, with multiple machines involved.

I'm trying to use fcntl to implement file locking. I was using flock until I discovered it only works between processes on the same machine.

Now, when I call fcntl with F_SETLKW, Perl alarms (which I use to add a timeout) no longer work as before. That alone would be OK, but Ctrl-C doesn't really work either.

What I believe is happening is that fcntl only checks for signals every 30 seconds or so. The alarm does fire eventually, and the Ctrl-C is caught... eventually.

Is there anything I can do to adjust the frequency with which fcntl checks for these signals?
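Not an answer to the frequency question itself, but for context, a common workaround is to avoid blocking in F_SETLKW at all: poll with the non-blocking F_SETLK and sleep between attempts, so Perl gets a chance to deliver SIGALRM/SIGINT promptly. This is only a sketch — the pack() template assumes the x86-64 Linux struct flock layout (short l_type, short l_whence, 4 bytes padding, 64-bit l_start and l_len, 32-bit l_pid) and is NOT portable, and the lock file path is illustrative:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(F_SETLK F_WRLCK SEEK_SET);

# Try to take a write lock without ever blocking in the kernel.
# pack template: assumes x86-64 Linux struct flock layout; the
# trailing x4 pads the buffer out to the struct's full 32 bytes.
sub lock_with_timeout {
    my ($fh, $timeout) = @_;
    my $flock    = pack('s s x4 q q l x4', F_WRLCK, SEEK_SET, 0, 0, 0);
    my $deadline = time + $timeout;
    while (time < $deadline) {
        return 1 if fcntl($fh, F_SETLK, $flock);   # got the lock
        select(undef, undef, undef, 0.25);         # sub-second sleep
    }
    return 0;                                      # timed out
}

open my $fh, '+>', '/tmp/example.lock' or die "open: $!";
if (lock_with_timeout($fh, 5)) {
    print "locked\n";
} else {
    print "timed out\n";
}
```

Because the process sleeps in select() rather than in fcntl(), an alarm or Ctrl-C is handled within a quarter of a second instead of whenever the kernel next wakes the blocked call.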

A: 

Hi there,

I'm definitely no expert on the matter, but my understanding is that fcntl won't work in your case either: fcntl advisory locks only really make sense between processes on the same machine.

So forgive me if this is off-topic. I used File::NFSLock to solve a cache stampede (dogpile) problem: there were multiple application servers reading and writing cache files on an NFS volume (not a very good idea, but that was what we had to start with).

I subclassed/wrapped File::NFSLock to modify its behavior. In particular I needed:

  • persistent locks that don't go away when a File::NFSLock object goes out of scope. With regular File::NFSLock, the lock vanishes as soon as the object is destroyed, which was not what I needed.
  • lock files that also record the name of the machine that acquired the lock. A process id alone is not enough to decide whether the locking process has terminated, and therefore whether the lock file can safely be stolen, so I modified the code to write lock files as machine:pid instead of just pid.

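For reference, this is roughly what plain File::NFSLock usage looks like before any subclassing — the lock is tied to the object's lifetime and released when the object goes out of scope. The file path and timeout values here are illustrative:

```perl
use strict;
use warnings;
use Fcntl qw(LOCK_EX);
use File::NFSLock;

my $file = '/tmp/nfslock-demo.cache';   # illustrative path

# Take an exclusive lock; wait up to 10 s, and consider any lock
# older than 15 minutes stale (i.e. safe to steal).
my $lock = File::NFSLock->new({
    file               => $file,
    lock_type          => LOCK_EX,
    blocking_timeout   => 10,
    stale_lock_timeout => 15 * 60,
}) or die "Could not lock $file: $File::NFSLock::errstr";

open my $fh, '>', $file or die "open: $!";
print {$fh} "fresh cache contents\n";
close $fh or die "close: $!";

$lock->unlock;   # or simply let $lock go out of scope
```

The out-of-scope behavior is exactly what the first modification above works around: if you need the lock to outlive the object, you have to keep the lock file from being cleaned up in DESTROY.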
This has worked wonderfully for a couple of years.

Until the volume of requests increased tenfold. Last month I started to see the first problems, where a really busy cache file was being written to by two backends at the same time, leaving stale lock files behind. This happened when we reached around 9-10M overall page views per day, just to give you an idea.

The final broken cache file looked like:

<!-- START OF CACHE FILE BY BACKEND b1 -->
... cache file contents ...
<!--   END OF CACHE FILE BY BACKEND b1 -->
... more cache file contents ... wtf ...
<!--   END OF CACHE FILE BY BACKEND b2 -->

This can only happen if two backends write to the same file at the same time. It's not yet clear whether the problem is caused by File::NFSLock plus our modifications, or by a bug in the application itself.

In conclusion, if your app is not terribly busy and trafficked, then go for File::NFSLock; I think it's your best bet. Are you sure you still want to use NFS?

cosimo