I'm writing scripts that will run in parallel and take their input data from the same file. Each script opens the input file, reads the first line, stores it for further processing, and finally deletes that line from the input file.

The problem is that two scripts may access the input file simultaneously and read the same line, with the unacceptable result of that line being processed twice.

One solution is to write a lock file (.lock_input) before accessing the input file and erase it when releasing the input file, but this solution is not appealing in my case because NFS randomly slows down network communication and may not provide reliable locking.
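For illustration, here is a minimal sketch of that idea. It uses mkdir for the lock, since directory creation is atomic (the /shared path is just a placeholder):

#!/bin/bash
# Spin until we create the lock directory; mkdir succeeds for exactly
# one caller at a time, so it acts as a crude mutex.
until mkdir /shared/.lock_input 2>/dev/null; do
  sleep 1
done
# ... critical section: read and remove the first line ...
rmdir /shared/.lock_input   # release the lock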

Another solution is a process lock instead of a lock file: the first script to access the input file launches a process called lock_input, and the other scripts check for it with ps -elf | grep lock_input; if it appears in the process list, they wait. This may be faster than writing to NFS, but it is still not a perfect solution ...
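The waiting side would look roughly like this sketch (pgrep -x matches the exact process name, which avoids grep matching itself); the gap between the check and launching lock_input yourself is an inherent race, which is part of why this is imperfect:

# Wait while any process named lock_input is running.
while pgrep -x lock_input > /dev/null; do
  sleep 1
done
# Race window: another script can pass the same check right here.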

So my question is: is there any bash command (or other script interpreter), or a service, that behaves like the semaphore or mutex locks used for synchronization in threaded programming?

Thank you.

Small rough example:

Let's say we have input_file as following:

Monday
Tuesday
Wednesday
Thursday
Friday
Saturday 
Sunday

Treatment script: TrScript.sh

#!/bin/bash
NbLines=$(wc -l < input_file)
while [ "$NbLines" -ne 0 ]
do
  FirstLine=$(head -1 input_file)
  echo "Hello World today is $FirstLine"
  # Rewrite the file without its first line; this read-then-rewrite
  # is not atomic, which is exactly where parallel instances race.
  tail -n +2 input_file > tmp
  mv tmp input_file
  NbLines=$(wc -l < input_file)
done

Main script:

#!/bin/bash
./TrScript.sh &  
./TrScript.sh &  
./TrScript.sh &  
wait

The result should be (each line processed exactly once; with parallel workers the order may vary):

Hello World today is Monday  
Hello World today is Tuesday  
Hello World today is Wednesday  
Hello World today is Thursday  
Hello World today is Friday  
Hello World today is Saturday  
Hello World today is Sunday
A: 

I haven't used it myself, but it looks like sh-mutex could be what you're looking for.

Matt Ball
+3  A: 

See this and this for discussions on file locking in Bash.
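For reference, one common flock(1) pattern (from util-linux) holds an exclusive lock on a file descriptor for the duration of a subshell. The lock-file path below is an assumption, and flock over NFS is only dependable on setups that actually support it:

(
  flock 9 || exit 1          # block until fd 9 holds an exclusive lock
  # ... critical section: read and remove the first line ...
) 9> /var/lock/input_file.lock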

Dennis Williamson
+2  A: 

I have always liked the lockfile program (see the lockfile man page) from the procmail set of tools. It should be available on most systems, though it might not be installed by default.

It was designed to lock mail spool files, which are (were?) commonly mounted via NFS, so it does work properly over NFS (as much as anything can).

Also, since you are already assuming that all your ‘workers’ are on the same machine (checking for PIDs would not work across hosts, and may misbehave once PIDs eventually wrap), you could put your lock file in some other, local directory (e.g. /tmp) while processing files hosted on the NFS server. As long as all the workers use the same lock-file location (and a one-to-one mapping of lock-file names to locked pathnames), it will work fine.
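A minimal usage sketch, assuming procmail's lockfile is installed (by default it retries every 8 seconds until it manages to create the lock file):

#!/bin/bash
lockfile /tmp/.lock_input     # blocks until the lock is acquired
# ... critical section: read and remove the first line of input_file ...
rm -f /tmp/.lock_input        # release the lock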

Chris Johnsen
A: 

Use

line=$(flock "$lockfile" -c "gawk 'NR==1' < $infile; gawk 'NR>1' < $infile > $infile.tmp; mv $infile.tmp $infile")

for accessing the file you want to read from. This uses file locks, though.

gawk 'NR==1' < $infile

prints the first line of the input, and gawk 'NR>1' prints every line after the first.
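Putting it together, TrScript.sh could be reworked around that locked one-liner roughly as follows (the lock-file path is an assumption):

#!/bin/bash
lockfile=/tmp/input_file.lock
infile=input_file
while true; do
  # Atomically claim the first line: print it and rewrite the file
  # without it, all while holding the lock.
  line=$(flock "$lockfile" -c "gawk 'NR==1' < $infile; gawk 'NR>1' < $infile > $infile.tmp; mv $infile.tmp $infile")
  [ -z "$line" ] && break     # empty result: no lines left
  echo "Hello World today is $line"
done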

me