
I need to upgrade a Perl CGI script in which users must complete 3 steps. After they finish each step, the script logs which step the user completed. Having a record of this is important so we can prove to a user that they only finished step one and didn't complete all three steps, for example.

Right now, the script creates one log file for each instance of the CGI script. So if UserA does step 1, then UserB does step 1, then step 2, then step 3, and then UserA finishes step 2 and step 3, the order of the log files would be:

LogFile.UserA.Step1
LogFile.UserB.Step1
LogFile.UserB.Step2
LogFile.UserB.Step3
LogFile.UserA.Step2
LogFile.UserA.Step3

The log files are named with the current timestamp, a random number, and the process PID.

This works fine to prevent the same file from being written to more than once, but the directory quickly accumulates thousands of files (each containing only a few bytes). There is a process to rotate and compress these logs, but it has fallen to me to change the script so it logs to just one file a day, to reduce the number of log files being created.

Basically, the log file will have the current date in the file name, and anytime the CGI script needs to write to the log, it will append to the one log file for that day, regardless of the user or what step they are on.

Nothing will need to be reading the log file - the only thing that will happen to it is an append by the CGI script. The log rotation will run on log files that are 7 days or older.
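Something like this is what I have in mind - just a rough sketch of the naming and append, without any locking yet (the paths are made up):

use strict;
use warnings;
use POSIX qw(strftime);

# One log file per day, named by date (path is just an example)
my $logfile = "/var/log/myapp/steps." . strftime("%Y%m%d", localtime) . ".log";

open my $fh, ">>", $logfile or die "open $logfile: $!";
print {$fh} "UserA: Step 1\n";
close $fh;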

My question is, what is the best way to handle the concurrent appends to this log file? Do I need to lock it before appending? I found this page on Perl Monks that seems to indicate that "when multiple processes are writing to the same file, and all of them have the file opened for appending, data shall not be overwritten."

I've learned that just because it can be done doesn't mean that I should, but in this case, what is the safest, best practice way to do this?

Summary:

  • Concurrent appends to the same file
  • Each append to the file is just one line, less than 50 characters
  • Order does not matter

Thanks!

+2  A: 

"when multiple processes are writing to the same file, and all of them have the file opened for appending, data shall not be overwritten" may be true, but that doesn't mean your data can't come out mangled (one entry inside another). It's not very likely to happen for small amounts of data, but it might.

flock is a reliable and reasonably simple solution to that problem. I would advise you to simply use it.

Leon Timmermans
A: 

You can try to play with file locking, but that will lead you into a world of hurt very quickly. The easier way would be to have a small persistent process or a cron job that scans your log file directory and appends the events to the log file one at a time.

For extra safety, you can make your logging scripts create a new log file every so often (say, every 5 minutes), and make your daemon ignore files that are younger than five minutes.
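Something like this gives the idea (untested sketch; the directory and file names are made up, not anything from the question):

use strict;
use warnings;
use POSIX qw(strftime);

my $spool_dir = "/var/spool/myapp";                    # per-event files land here (assumed)
my $daily_log = "/var/log/myapp/steps."
              . strftime("%Y%m%d", localtime) . ".log";

opendir my $dh, $spool_dir or die "opendir $spool_dir: $!";
for my $file (sort readdir $dh) {
  next if $file =~ /^\./;
  my $path = "$spool_dir/$file";
  next unless -f $path;
  next if -M $path < 5 / (24 * 60);                    # skip files younger than 5 minutes

  open my $in,  "<",  $path      or do { warn "open $path: $!"; next };
  open my $out, ">>", $daily_log or die "open $daily_log: $!";
  while (my $line = <$in>) {
    print {$out} $line;
  }
  close $in;
  close $out or warn "close $daily_log: $!";
  unlink $path or warn "unlink $path: $!";
}
closedir $dh;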

Vlad
A: 

I think that I would run a separate process, e.g. using Net::Daemon or similar, which handles the writing of log entries in a central manner. The CGI script instances would pass the log strings to this daemon through a socket.
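For example, the client side in the CGI script could be as simple as this (the socket path and message format are just placeholders):

use strict;
use warnings;
use IO::Socket::UNIX;

sub send_log_line {
  my ($msg) = @_;

  # Connect to the logging daemon's UNIX-domain socket (path is an assumption)
  my $sock = IO::Socket::UNIX->new(
    Type => SOCK_STREAM(),
    Peer => "/var/run/myapp-logger.sock",
  ) or die "cannot reach log daemon: $!";

  print {$sock} "$msg\n";
  close $sock;
}

send_log_line("UserA: Step 1");

The daemon itself would then be the only process that ever writes to the log file, so no locking is needed on that side.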

Svante
A: 

You have a few options, in growing order of complexity:

1) Just time-and-datestamp every line. When you need to examine the coalesced file, interleave all the input files (see the sketch after this list).

2) Write a script that runs continuously, holds all the filehandles open, and, using select(), finds files with new data and dumps it to the output in the order it was received. This method could become a resource hog, as it would constantly call select, then look for new files, then open the new files, then call select again.

3) Write a script that accepts TCP connections. If you ever end up in a situation where the loggers can have more log files open than a process in your operating system can support at a time, you're back to solution number 1. Honestly, go for number 1.
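For option 1, the merge step can be as simple as this (assuming every line starts with a sortable stamp such as an ISO date):

use strict;
use warnings;

# Merge the per-step log files given on the command line by their leading timestamp
my @lines;
for my $file (@ARGV) {
  open my $fh, "<", $file or die "open $file: $!";
  push @lines, <$fh>;
  close $fh;
}
print sort @lines;   # a plain string sort works because the stamp sorts chronologically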

+4  A: 

Yes, use flock.

An example program is below, beginning with typical front matter:

#! /usr/bin/perl

use warnings;
use strict;

use Fcntl qw/ :flock /;

Then we specify the path to the log and the number of clients that will run:

my $log = "/tmp/my.log";
my $clients = 10;

To log a message, open the file in append mode so all writes automatically go to the end. Then call flock to wait our turn for exclusive access to the log. Once we have it, write the message and close the handle, which automatically releases the lock.

sub log_step {
  my($msg) = @_;

  open my $fh, ">>", $log or die  "$0 [$$]: open: $!";
  flock $fh, LOCK_EX      or die  "$0 [$$]: flock: $!";
  print $fh "$msg\n"      or die  "$0 [$$]: write: $!";
  close $fh               or warn "$0 [$$]: close: $!";
}

Now fork off $clients child processes to go through all three steps with random intervals between:

my %kids;
my $id = "A";
for (1 .. $clients) {
  my $pid = fork;
  die "$0: fork: $!" unless defined $pid;

  if ($pid) {
    ++$kids{$pid};
    print "$0: forked $pid\n";
  }
  else {
    my $user = "User" . $id;
    log_step "$user: Step 1";
    sleep rand 3;
    log_step "$user: Step 2";
    sleep rand 3;
    log_step "$user: Step 3";
    exit 0;
  }

  ++$id;
}

Don't forget to wait on all the children to exit:

print "$0: reaping children...\n";
while (keys %kids) {
  my $pid = waitpid -1, 0;
  last if $pid == -1;

  warn "$0: unexpected kid $pid" unless $kids{$pid};
  delete $kids{$pid};
}

warn "$0: still running: ", join(", " => keys %kids), "\n"
  if keys %kids;

print "$0: done!\n", `cat $log`;

Sample output:

[...]
./prog.pl: reaping children...
./prog.pl: done!
UserA: Step 1
UserB: Step 1
UserC: Step 1
UserC: Step 2
UserC: Step 3
UserD: Step 1
UserE: Step 1
UserF: Step 1
UserG: Step 1
UserH: Step 1
UserI: Step 1
UserJ: Step 1
UserD: Step 2
UserD: Step 3
UserF: Step 2
UserG: Step 2
UserH: Step 2
UserI: Step 2
UserI: Step 3
UserB: Step 2
UserA: Step 2
UserA: Step 3
UserE: Step 2
UserF: Step 3
UserG: Step 3
UserJ: Step 2
UserJ: Step 3
UserE: Step 3
UserH: Step 3
UserB: Step 3

Keep in mind that the order will be different from run to run.

Greg Bacon
gbacon does this right, but something important to remember when adapting his code: you *don't* unlock (`LOCK_UN`) the file -- you close it. That will make sure the data is flushed and *then* unlock it.
hobbs
Thanks gbacon. Order is not important, so that is not an issue. I'm not entirely sure I need to fork in my case. Since this is a CGI script (not FastCGI - it doesn't stay alive), the user will only be able to perform one step during the life of the script - once he completes a step, the script will exit. Then, on the web, when he is working on step 2 and hits submit, step 2 will be logged, and the script exits.
BrianH
@BrianH I was unclear: the forked children simulate multiple concurrent clients. In your CGI program, call a sub similar to `log_step` from my answer in order to log completion of a step by a real user.
Greg Bacon
@hobbs Yes, unlocking with `close` is best, but Perl also tries to help by automatically flushing before locking or unlocking. This behavior has been present since 5.004: http://perldoc.perl.org/perl5004delta.html#flock
Greg Bacon
@gbacon - Oh, okay - I understand what you were doing now. Well thanks for going the extra mile to demonstrate how the concurrent logging worked! :)
BrianH
@BrianH You're welcome! I'm glad it helps.
Greg Bacon
+1  A: 

I would urge you to use Log::Log4perl.
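A minimal sketch of what that might look like (the log file name, logger category, and layout pattern are only placeholders):

use strict;
use warnings;
use Log::Log4perl;

# Configure a single file appender that appends to one log file
my $conf = q(
  log4perl.logger.steps                              = INFO, LOGFILE
  log4perl.appender.LOGFILE                          = Log::Log4perl::Appender::File
  log4perl.appender.LOGFILE.filename                 = /var/log/myapp/steps.log
  log4perl.appender.LOGFILE.mode                     = append
  log4perl.appender.LOGFILE.layout                   = Log::Log4perl::Layout::PatternLayout
  log4perl.appender.LOGFILE.layout.ConversionPattern = %d %m%n
);

Log::Log4perl->init(\$conf);
my $logger = Log::Log4perl->get_logger("steps");
$logger->info("UserA: Step 1");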

hpavc