I have a Perl script that executes a long-running process and watches its command-line output (log messages), some of which span multiple lines. Once it has a full log message, it sends it off to be processed and grabs the next log message.

open(PS_F, "run.bat |") or die $!;

$logMessage = "";

while (<PS_F>) {
    $lineRead = $_;

    if ($lineRead =~ m!(\d{4}-\d{2}-\d{2}\ \d{2}:\d{2}:\d{2})!) {
        # process the previous log message

        $logMessage = $lineRead;
    }
    else {
        $logMessage = $logMessage.$_;
    }
}

close(PS_F);

In its current form, do I have to worry about the line reading and processing "backing up"? For example, if I get a new log message every 1 second and it takes 5 seconds to do all the processing (random numbers I pulled out), do I have to worry that I will miss log messages or have memory issues?

+7  A: 

In general, data written to a pipe by one application is buffered if the next application cannot consume it fast enough. If the buffer fills up, the writing application is blocked (i.e. its calls to write to the output file handle simply stall) until the consumer catches up. On Linux the default pipe capacity has been 65536 bytes since kernel 2.6.11; before that it was a single page (4096 bytes on most architectures).

In this fashion, you can never run out of memory, but you can seriously stall the producer application in the pipeline.
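To see the buffer limit for yourself, here is a minimal sketch that fills a pipe nobody is reading from and counts how many bytes it accepts. It assumes a POSIX-like system where `IO::Handle`'s `blocking(0)` works on pipes (this is not the case for Windows pipes), and `pipe_capacity` is just a name made up for the illustration. The number it reports depends on the OS and kernel.

```perl
use strict;
use warnings;
use IO::Handle;

# Count how many bytes a fresh pipe accepts before a write would block.
# (Hypothetical helper, for illustration only.)
sub pipe_capacity {
    pipe(my $reader, my $writer) or die "Cannot create pipe: $!";

    # In non-blocking mode a full pipe makes syswrite fail with EAGAIN
    # instead of stalling the way a normal (blocking) write would.
    $writer->blocking(0);

    my $chunk = 'x' x 1024;
    my $total = 0;
    while (1) {
        my $written = syswrite($writer, $chunk);
        if (!defined $written) {
            last if $!{EAGAIN} || $!{EWOULDBLOCK};   # buffer is full
            die "Unexpected write error: $!";
        }
        $total += $written;
    }

    close $writer;
    close $reader;
    return $total;
}

printf "Pipe accepted %d bytes before a write would block\n", pipe_capacity();
```

With a normal blocking handle, the producer would simply sit in the write call at that point until the reader drains some data.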

Adam Wright
+4  A: 

No, you will not lose messages. The writing end of the pipe will block if the pipe buffer is full.
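A rough way to convince yourself: the sketch below forks a child that writes far more data than a typical pipe buffer holds, while the parent reads deliberately slowly and counts the lines. It assumes a system where the forking form of `open` (`'-|'` with no command) is available, which generally means not native Windows; `count_lines_from_fast_producer` is a made-up name for this example.

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep);

# Fork a fast producer, read from it slowly, and return how many lines arrived.
# (Hypothetical helper, for illustration only.)
sub count_lines_from_fast_producer {
    my ($line_count) = @_;

    # Forking open: the child's STDOUT becomes the pipe we read from.
    my $pid = open(my $from_child, '-|');
    die "Cannot fork: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: write as fast as possible; about 200 bytes per line.
        print "line $_ ", 'x' x 200, "\n" for 1 .. $line_count;
        exit 0;
    }

    # Parent: a slow consumer that pauses after every 100 lines.
    my $seen = 0;
    while (my $line = <$from_child>) {
        $seen++;
        sleep 0.001 if $seen % 100 == 0;
    }
    close $from_child or die "Producer failed: status $?";
    return $seen;
}

my $sent = 2000;
my $seen = count_lines_from_fast_producer($sent);
print "producer wrote $sent lines, consumer read $seen\n";
```

The child ends up waiting in `print` whenever the buffer is full, so it finishes later than it otherwise would, but every line still reaches the reader.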

nos
+3  A: 

Strictly speaking, this should be a comment: Please consider re-writing your code as

# use lexical filehandle and 3 arg open

open my $PS_F, '-|', 'run.bat'
    or die "Cannot open pipe to 'run.bat': $!";

# explicit initialization not needed
# limit scope

my $logMessage;

while (<$PS_F>) { 
    # you probably meant to anchor the pattern
    # and no need to capture if you are not going to use
    # captured matches
    # there is no need to escape a space, although you might
    # want to use [ ] for clarity

    $logMessage = '' if m!^\d{4}-\d{2}-\d{2}[ ]\d{2}:\d{2}:\d{2}!;
    $logMessage .= $_;
}

close $PS_F
    or die "Cannot close pipe: $!";
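If it helps, here is one way the "process the previous log message" step from the question might be slotted into that loop. This is only a sketch: `for_each_log_message` and the handler callback are names invented for the example, and the sample log text is made up. It reads from an in-memory filehandle so it can be tried without `run.bat`; a pipe handle would be passed in the same way.

```perl
use strict;
use warnings;

# A line starting with a timestamp begins a new log message.
my $TIMESTAMP = qr/^\d{4}-\d{2}-\d{2}[ ]\d{2}:\d{2}:\d{2}/;

# Group lines from $fh into messages and pass each complete one to $handler.
# (Hypothetical helper, for illustration only.)
sub for_each_log_message {
    my ($fh, $handler) = @_;
    my $message = '';

    while (my $line = <$fh>) {
        if ($line =~ $TIMESTAMP) {
            # The previous message is complete once a new timestamp shows up.
            $handler->($message) if length $message;
            $message = $line;
        }
        else {
            # Continuation line; lines seen before any timestamp are
            # collected into a message of their own.
            $message .= $line;
        }
    }

    # Do not forget the last message once the producer closes the pipe.
    $handler->($message) if length $message;
    return;
}

my $sample = join '',
    "2010-01-05 10:00:00 first message\n",
    "    continuation line\n",
    "2010-01-05 10:00:01 second message\n";

open my $in, '<', \$sample or die "Cannot open in-memory handle: $!";
my @messages;
for_each_log_message($in, sub { push @messages, $_[0] });
close $in;

print scalar(@messages), " messages read\n";
```

Note the final call to the handler after the loop: without it, the last message would never be processed, because no further timestamp line arrives to trigger it.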
Sinan Ünür
Instead of clobbering `$_` needlessly, why not just use it? `while(<$PS_F>) { if(/(\d{4}-\d{2}-\d{2}\ \d{2}:\d{2}:\d{2})/) { $logMessage = $_; } else { $logMessage .= $_ } }`
Chris Lutz
And also, why is the regex capturing the match into `$1` if we're not going to use it? The parentheses in the OP's regex are unnecessary, unless he/she is using `$1` elsewhere in the code.
Chris Lutz
Thank you all for your suggestions on writing more idiomatic Perl. I'm still learning the language and any pointers would be appreciated.
Thomas Owens
I really like this use of a CW response to provide suggestions for improvements that are somewhat orthogonal to the original question asked. I'd like to see us use this more!
Ether