Hi there,
I'm definitely no expert on the matter, but my understanding is that fcntl, as you also stated, won't work in your case: fcntl advisory locks only make sense within the same machine.
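Just so we're talking about the same thing, this is the kind of same-host advisory locking I mean (a toy example of mine, nothing from your code; Perl's flock() maps to flock(2) or fcntl(2) depending on the platform, and as discussed above such locks don't reliably coordinate processes on different machines sharing an NFS export):

use strict;
use warnings;
use Fcntl qw(:flock);

# Take an exclusive advisory lock; only processes on this same host that
# also call flock() on this file will respect it.
open my $fh, '>>', '/var/tmp/example.lock' or die "open: $!";
flock($fh, LOCK_EX) or die "flock: $!";
# ... critical section ...
flock($fh, LOCK_UN);   # or simply close($fh)
close $fh;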
So forgive me if this is off-topic. I used File::NFSLock to solve a cache storm / dogpile / stampede problem: multiple application servers were reading and writing cache files on an NFS volume (not a very good idea, but that was what we had to start with).
I subclassed/wrapped File::NFSLock to modify its behavior. In particular I needed:
- persistent locks that don't go away when the File::NFSLock object goes out of scope. With stock File::NFSLock the lock vanishes as soon as the object is destroyed, which was not what I needed.
- lock files that also contain the name of the machine that acquired the lock. The process id alone is not enough to decide whether the owning process has terminated and the lock can safely be stolen, so I modified the code to write lock files as "machine:pid" instead of just "pid" (see the sketch right after this list).
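To make those two changes concrete, here's a rough sketch of the kind of wrapper I mean (simplified, with made-up names; not my exact production code, and check that your File::NFSLock version is happy being subclassed):

package My::PersistentNFSLock;
use strict;
use warnings;
use Sys::Hostname qw(hostname);
use parent 'File::NFSLock';

# Stock File::NFSLock releases the lock from DESTROY when the object goes
# out of scope; a no-op DESTROY keeps the lock file on disk until release()
# is called explicitly.
sub DESTROY { }

sub release {
    my ($self) = @_;
    $self->SUPER::unlock();
}

# The write side -- making the lock file contain "machine:pid" -- required
# patching the line File::NFSLock writes, which I won't reproduce here.
# Reading it back to decide whether a lock can safely be stolen looks like:
sub is_stale {
    my ($class, $lock_path) = @_;
    open my $fh, '<', $lock_path or return 0;
    chomp(my $line = <$fh> // '');
    close $fh;
    my ($machine, $pid) = split /:/, $line;
    return 0 unless defined $pid;
    return 0 unless $machine eq hostname();  # can't probe pids on other hosts
    return !kill(0, $pid);                   # true if the owning process is gone
}

1;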
This worked wonderfully for a couple of years, until the volume of requests increased roughly 10x. Last month I started to experience the first problems where a really busy cache file was being written to by two backends at the same time, leaving dead lock files behind. This started for me at around 9-10M overall pageviews per day, just to give you an idea.
The final broken cache file looked like:
<!-- START OF CACHE FILE BY BACKEND b1 -->
... cache file contents ...
<!-- END OF CACHE FILE BY BACKEND b1 -->
... more cache file contents ... wtf ...
<!-- END OF CACHE FILE BY BACKEND b2 -->
This can only happen if two backends write to the same file at the same time... It's not yet clear whether the problem is caused by File::NFSLock plus our modifications, or by a bug in the application.
In conclusion: if your app is not terribly busy or heavily trafficked, go for File::NFSLock; I think it's your best bet. Are you sure you still want to use NFS?
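If you do stick with NFS, plain File::NFSLock (without my modifications) is used roughly like this; the path and timeout values are just examples:

use strict;
use warnings;
use File::NFSLock;
use Fcntl qw(LOCK_EX);

my $file = '/nfs/cache/some_page.html';   # example path

if (my $lock = File::NFSLock->new({
        file               => $file,
        lock_type          => LOCK_EX,
        blocking_timeout   => 10,        # wait up to 10 seconds for the lock
        stale_lock_timeout => 30 * 60,   # treat locks older than 30 min as stale
    })) {
    # ... rewrite the cache file ...
    $lock->unlock();
}
else {
    die "couldn't lock $file [$File::NFSLock::errstr]";
}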