Here's a scenario. You have a large number of legacy scripts, all using a common library. Said scripts use the 'print' statement for diagnostic output. No changes are allowed to the scripts - they range far and wide, have their approvals, and have long since left the fruitful valleys of oversight and control.

Now a new need has arrived: logging must be added to the library. This must be done automatically and transparently, without users of the common library needing to change their scripts. Common library methods can simply have logging calls added to them; that's the easy part. The hard part lies in the fact that diagnostic output from these scripts was always displayed using the 'print' statement. This diagnostic output must be stored, but just as importantly, processed.

As an example of this processing, the library should only record the printed lines that contain the words 'warning', 'error', 'notice', or 'attention'. The below Extremely Trivial and Contrived Example Code (tm) would record some of said output:

sub CheckPrintOutput
{
    my @output = @_; # args passed to print eventually find their way here.
    foreach my $value (@output) {
         Log->log($value) if $value =~ /warning|error|notice|attention/i;
    }
}

(I'd like to avoid such issues as 'what should actually be logged', 'print shouldn't be used for diagnostics', 'perl sucks', or 'this example has the flaws x y and z'...this is greatly simplified for brevity and clarity. )

The basic problem comes down to capturing and processing data passed to print (or any perl builtin, along those lines of reasoning). Is it possible? Is there any way to do it cleanly? Are there any logging modules that have hooks to let you do it? Or is it something that should be avoided like the plague, and I should give up on ever capturing and processing the printed output?

Additional: This must run cross-platform - windows and *nix alike. The process of running the scripts must remain the same, as must the output from the script.

Additional additional: An interesting suggestion made in the comments of codelogic's answer:

You can subclass http://perldoc.perl.org/IO/Handle.html and create your own file handle which will do the logging work. – Kamil Kisiel

This might do it, with two caveats:

1) I'd need a way to export this functionality to anyone who uses the common library. It would have to apply automatically to STDOUT and probably STDERR too.

2) the IO::Handle documentation says that you can't subclass it, and my attempts so far have been fruitless. Is there anything special needed to make subclassing IO::Handle work? The standard approach of "use base 'IO::Handle'" and then overriding the new/print methods seems to do nothing.

Final edit: Looks like IO::Handle is a dead end, but Tie::Handle may do it. Thanks for all the suggestions; they're all really good. I'm going to give the Tie::Handle route a try. If it causes problems I'll be back!

Addendum: Note that after working with this a bit, I found that Tie::Handle will work, if you don't do anything tricky. If you use any of the features of IO::Handle with your tied STDOUT or STDERR, it's basically a crapshoot to get them working reliably - I could not find a way to get the autoflush method of IO::Handle to work on my tied handle. If I enabled autoflush before I tied the handle it would work. If that works for you, the Tie::Handle route may be acceptable.

+7  A: 

You can use Perl's select to redirect STDOUT.

open my $fh, '>', 'log.txt' or die "Cannot open log.txt: $!";
print "test1\n";
my $current_fh = select $fh;
print "test2\n";
select $current_fh;
print "test3\n";

The file handle could be anything, even a pipe to another process that post processes your log messages.
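As a rough illustration of the "could be anything" point, here is a minimal sketch that selects an in-memory filehandle, so the printed text lands in a scalar where it can be inspected. The variable names and sample messages are made up for the example. Note the trade-off: nothing reaches the real STDOUT until the captured text is replayed at the end.

```perl
use strict;
use warnings;

my $captured = '';

# Open a filehandle that writes into the scalar instead of a file.
open my $mem_fh, '>', \$captured or die "Cannot open in-memory handle: $!";

# Make it the default handle for print, remembering the previous one.
my $previous_fh = select $mem_fh;
print "error: something broke\n";
print "all fine here\n";
select $previous_fh;
close $mem_fh;

# Process the captured text: keep only the lines worth logging.
my @flagged = grep { /warning|error|notice|attention/i } split /^/, $captured;

# Replay everything so the visible output is unchanged.
print $captured;
```

This only catches print calls that rely on the default handle; an explicit "print STDOUT ..." still goes straight to the real STDOUT.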

PerlIO::tee in the PerlIO::Util module seems to allow you to 'tee' the output of a file handle to multiple destinations (e.g. log processor and STDOUT).

codelogic
select is all well and good, but we need to process that data, too. Is there a file handle type that gives hooks to evaluate and do something with the data passed to it?
Robert P
Also, we cannot modify the current behavior of these scripts, so the output must remain on STDOUT as well.
Robert P
You can open a filehandle as a pipe to another process. Do the logging stuff in there, then you can print to stdout from inside it.
Ant P.
You can have the piped process redirect its STDIN to STDOUT in addition to processing it, that way current behavior will not be affected. I'm not aware of any modules that abstract this functionality.
codelogic
Interesting idea... would that be with something like fork()? There's another requirement too...this must run across platforms: windows and *nix alike. As I understand it, fork doesn't really work in windows. Or are you talking about a different piping mechanism?
Robert P
A pipe is not a fork. It creates a new process and provides you with a handle to its STDIN. Read more about it @ http://www.troubleshooters.com/codecorn/littperl/perlfile.htm#Piping
codelogic
You can subclass http://perldoc.perl.org/IO/Handle.html and create your own file handle which will do the logging work.
Kamil Kisiel
re: codelogic - I meant would you create the pipe and then start the other process with fork. If not fork, what would you do to start the process? qx? re: Kamil: Interesting. This would then require that they include that subclass of IO::Handle, right? Could it be re-exported automatically?
Robert P
@Robert: No, you don't have to manually fork. Perl takes care of creating the other process and establishing the connection between the file handle and the other process' STDIN. It's non-blocking.
codelogic
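For anyone following the pipe discussion above, this is roughly what it could look like. It is only a sketch under a few assumptions: the child is another perl started through the list form of a piped open (which, as far as I know, needs Perl 5.22 or later on Windows), the log destination is a temporary file invented for the example, and the filter program is a throwaway one-liner rather than a real log processor.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical log destination for this sketch.
my ($tmp_fh, $log_path) = tempfile(UNLINK => 1);
close $tmp_fh;

# Child program: echo every line to its own STDOUT and append
# the interesting ones to the log file named in its first argument.
my $filter = <<'END_FILTER';
BEGIN { open(LOG, q{>>}, shift @ARGV) or die qq{log: $!}; $| = 1 }
print;
print LOG $_ if /warning|error|notice|attention/i;
END_FILTER

# Perl starts the child and hands back a handle to its STDIN.
open(my $pipe, '|-', $^X, '-n', '-e', $filter, $log_path)
    or die "Cannot start filter: $!";

my $previous_fh = select $pipe;
$| = 1;                         # do not let output sit in the pipe buffer
print "error: disk is full\n";
print "routine message\n";
select $previous_fh;

# close() waits for the child, so the log is complete afterwards.
close $pipe or die "Filter exited abnormally: $?";
```

Both lines still show up on the console because the child echoes its input, and only the matching line ends up in the log file.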
A: 

You could run the script from a wrapper script that captures the original script's stdout and writes the output somewhere sensible.

Greg Hewgill
Unfortunately, this violates the 'script must run the same as before' requirement. The user must not have to do anything different to get this logging information.
Robert P
It doesn't violate anything. Nothing says that the thing that executes when you type 'perl' is the real perl.
brian d foy
True enough from a general implementation standpoint; a wrapper script is one possible solution. However, asking the users to run another script just to run their script is not a solution that will work in this scenario - the command line, perl distro, and console output must remain the same.
Robert P
And it must ONLY affect scripts that use this particular library. If I changed what the shell thought 'perl' was, it would affect other scripts that have no need for the common library. That, too, is not an acceptable solution for this application.
Robert P
But depending on yours, it might be. :)
Robert P
+6  A: 

Lots of choices. Use select() to change the filehandle that print defaults to. Or tie STDOUT. Or reopen it. Or apply an IO layer to it.
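To make the "reopen it" option a bit more concrete, here is a small sketch that reopens a localized STDOUT onto an in-memory scalar. The variable names are invented for the example. Unlike select(), this also catches prints that name STDOUT explicitly, but the output only exists inside this perl process, so anything the script launches as a child will not write into the scalar.

```perl
use strict;
use warnings;

my $buffer = '';

{
    # Localize the glob so the real STDOUT comes back at the end of the block.
    local *STDOUT;
    open STDOUT, '>', \$buffer or die "Cannot reopen STDOUT: $!";

    print "implicit error\n";           # default handle is still STDOUT
    print STDOUT "explicit error\n";    # named handle is caught as well

    close STDOUT;
}

# Back on the real STDOUT: process the captured text, then replay it.
my @flagged = grep { /warning|error|notice|attention/i } split /^/, $buffer;
print $buffer;
```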

ysth
+16  A: 

There are a number of built-ins that you can override (see perlsub). However, print is one of the built-ins that doesn't work this way. The difficulties of overriding print are detailed in this PerlMonks thread.

However, you can

  1. Create a package
  2. Tie a handle
  3. Select this handle.

Now, a couple of people have given the basic framework, but it works out kind of like this:

package IO::Override;
use base qw<Tie::Handle>;
use Symbol qw<geniosym>;

sub TIEHANDLE { return bless geniosym, __PACKAGE__ }

sub PRINT { 
    shift;
    # You can do pretty much anything you want here. 
    # And it's printing to what was STDOUT at the start.
    # 
    print $OLD_STDOUT join( '', 'NOTICE: ', @_ );
}

tie *PRINTOUT, 'IO::Override';
our $OLD_STDOUT = select( *PRINTOUT );

You can override printf in the same manner:

sub PRINTF { 
    shift;
    # You can do pretty much anything you want here. 
    # And it's printing to what was STDOUT at the start.
    # 
    my $format = shift;
    print $OLD_STDOUT join( '', 'NOTICE: ', sprintf( $format, @_ ));
}

See Tie::Handle for everything you can override in STDOUT's behavior.
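Putting the pieces together for the logging case in the question, a sketch might look like the following. This is a variation on the code above, not a drop-in replacement: it ties STDOUT itself instead of selecting a separate handle, it forwards output to a duplicate of the original STDOUT, and the package name My::LoggingHandle and the @LOGGED array (standing in for a real logger) are made up for the example.

```perl
package My::LoggingHandle;    # hypothetical package name
use strict;
use warnings;
use parent 'Tie::Handle';

our @LOGGED;                  # stand-in for a real log sink

# TIEHANDLE receives the real handle that output should still reach.
sub TIEHANDLE {
    my ($class, $real_fh) = @_;
    return bless { fh => $real_fh }, $class;
}

# Keep only the lines that match the keywords from the question.
sub _record {
    my ($self, $text) = @_;
    push @LOGGED, grep { /warning|error|notice|attention/i } split /^/, $text;
    return;
}

sub PRINT {
    my ($self, @args) = @_;
    $self->_record(join '', @args);
    return print { $self->{fh} } @args;      # pass through unchanged
}

sub PRINTF {
    my ($self, $format, @args) = @_;
    $self->_record(sprintf $format, @args);
    return printf { $self->{fh} } $format, @args;
}

package main;

# Duplicate the real STDOUT first, then tie the STDOUT glob itself.
open my $real_stdout, '>&', \*STDOUT or die "Cannot dup STDOUT: $!";
tie *STDOUT, 'My::LoggingHandle', $real_stdout;

print "starting up\n";
print "Warning: disk is nearly full\n";
printf "%s: %d files skipped\n", 'NOTICE', 3;
```

Each print call is matched on its own, so a line assembled from several separate print calls would be checked piece by piece. STDERR could presumably be handled the same way with a second tie.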

Axeman
Oh, this is looking good. Does this change print's behavior if they use print to write to file handles, or just to STDOUT?
Robert P
Looking at it more. I see - it's tied to a single file handle at a time, right? I will be giving this a whirl shortly and report back.
Robert P
The "file handle" is just a means. Once we've got the flow directed to where we can specify behavior, we can write out to as many real handles as we please, filter it as we want--even make database calls, if that's what we wanted to do.
Axeman
Fantastic, this is exactly what I was hoping for : only affecting STDOUT (and probably STDERR too), transparent to the outside. Thanks!
Robert P
What about re-assigning the STDOUT glob instead of select()'ing it? We've stored it in the module...for all intents and purposes our new tied handle should be STDOUT. It seems kind-of out there, but not beyond workability. Do you know of any major problems with doing something like that?
Robert P
"We've stored it in the module" it == STDOUT.
Robert P