I ran across a very strange line of code in a legacy Perl application. The code here is part of a homegrown RSS reader that does some caching to prevent being blacklisted.

open(CAT, "/usr/bin/cat -v /tmp/cat-cache 2>&1|");

Does it seem likely that the original author ran the results through cat -v to strip out non-printing characters to deal with any number of character sets? Wouldn't this make more sense using a regular expression in Perl itself? Also, I am most perplexed by the pipe on the end.

A:

It looks like "cat -v" displays all the non-printing characters in the file, so you can physically see CRLFs, TABs etc.

The trailing pipe tells Perl's open that this is not a plain file: Perl runs the string as a shell command and opens a pipe from that command's output.

Ron

Ron Savage
'cat -v' leaves tab and line feed alone; it transliterates the other non-printable characters.
Jonathan Leffler
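A minimal sketch of the two points above, assuming a Unix-like system with cat on the PATH. It writes a small temporary file (created with File::Temp, which is not part of the original code) and reads it back through "cat -v", once with the old trailing-pipe form and once with the three-argument '-|' form.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small test file containing a CR, a TAB and a high byte (0xE9).
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} "a\r\tb\xe9\n";
close $tmp or die "close: $!";

# Two-argument open: the trailing "|" makes Perl run the string as a
# shell command and read from its standard output.
open(my $old, "cat -v $path |") or die "cannot start cat: $!";
my $via_trailing_pipe = do { local $/; <$old> };
close $old or die "cat failed: $?";

# Three-argument equivalent: mode '-|' means "read from this command",
# and the list form runs cat directly without a shell.
open(my $new, '-|', 'cat', '-v', $path) or die "cannot start cat: $!";
my $via_list_form = do { local $/; <$new> };
close $new or die "cat failed: $?";

print $via_list_form;
```

On a GNU/Linux system both reads should return "a^M", a literal TAB, then "bM-i": the carriage return and the high byte are transliterated, while the TAB passes through untouched, as the comment above says.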
A:

You may wish to look at the Perl open tutorial (perlopentut).

Basically, a pipe at the end of a "filename" passed to open causes the program named as the file to be executed, with its output fed to Perl. Similarly, you can use a pipe at the start of the "filename" to pipe output out to an external program.
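To illustrate the opposite direction, here is a rough sketch of a pipe at the start of the "filename", plus the three-argument '|-' equivalent. It assumes a Unix-like shell and the standard tr utility; the temporary file only exists so that the upper-cased output can be read back.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my (undef, $path) = tempfile(UNLINK => 1);

# Leading "|": Perl starts the shell command, and whatever we print to
# $out becomes that command's standard input.
open(my $out, "| tr a-z A-Z > $path") or die "cannot start tr: $!";
print {$out} "hello, pipe\n";
close $out or die "tr failed: $?";    # close waits for the child to exit

# Three-argument form of the same idea: mode '|-' means "write to this
# command"; here tr's output simply goes to our own STDOUT.
open(my $out3, '|-', 'tr', 'a-z', 'A-Z') or die "cannot start tr: $!";
print {$out3} "written through a list-form pipe\n";
close $out3 or die "tr failed: $?";

# Read back what the first tr wrote to the temporary file.
open(my $in, '<', $path) or die "cannot read $path: $!";
my $result = do { local $/; <$in> };
close $in;
print $result;
```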

It might make more sense to do this inside the Perl program itself, but the quoted code is more compliant with two of the three prime virtues of a Perl programmer (laziness, impatience and hubris).

Daniel Martin
A: 

The original author was confused at some point or had a setuid cat :)

There is some confusion about what "cat -v" does ...

   -v, --show-nonprinting
              use ^ and M- notation, except for LFD and TAB

... it's not like "strings" (strings - print the strings of printable characters in files) ...

$ strings /bin/rm | head
/lib/ld-linux.so.2
e="?
__gmon_start__
libc.so.6
_IO_stdin_used
fflush
setlocale
mbrtowc
strncmp
optind
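For contrast, a rough Perl approximation of what strings does: it pulls out runs of at least four printable characters and discards everything else, whereas cat -v keeps every byte and only changes how it is spelled. The helper name printable_runs is made up for this sketch, and the real strings has more options than this.

```perl
use strict;
use warnings;

# Return every run of at least $min printable ASCII characters (space
# through ~, plus TAB) found in a byte string, roughly like strings(1).
sub printable_runs {
    my ($bytes, $min) = @_;
    $min //= 4;    # strings defaults to a minimum length of 4
    return $bytes =~ /([\t\x20-\x7e]{$min,})/g;
}

# "ELF" and "ab" are too short to be reported.
my @runs = printable_runs("\x7fELF\x01\x00libc.so.6\x00ab\x00fflush\x00");
print "$_\n" for @runs;
```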
hpavc
A:

Functionally

that code would do something similar to this:

use strict;
use warnings;
use Carp qw(croak);

# Read raw bytes, with no I/O layers translating CRLF or decoding UTF-8
open my $fh, '<:raw', '/tmp/cat-cache' or croak("Can't open /tmp/cat-cache: $!");

sub lessquote {
    my $x    = shift;
    my $meta = shift; # true when quoting the low 7 bits of a byte >= 128
    # Special case for whitespace: plain TAB and LF pass through unchanged
    if ( ( not defined $meta ) && ( $x == 9 or $x == 10 ) ) {
        return chr($x);
    }
    # NUL and M-^@
    if ( $x == 0 ) {
        return "^@";
    }
    # ^A to ^_ as well as M-^A to M-^_
    if ( ( 0 <= $x ) && ( $x <= 31 ) ) {
        return "^" . chr( $x + ord('A') - 1 );
    }
    # DEL, also M-^?
    if ( $x == 127 ) {
        return "^?";
    }
    # The M- family: strip the high bit and quote the rest
    if ( $x >= 128 && $x <= 255 ) {
        return "M-" . lessquote( $x - 128, 1 );
    }
    return chr($x);
}

while ( my $line = <$fh> ) {
    # "." does not match "\n", so line feeds are left alone, as in cat -v
    $line =~ s{(.)}{ lessquote( ord($1) ) }eg;
    print $line;
}
close $fh;

Not identical, but similar.

NB: lessquote appears to match my 'cat -v' output.

As you can see, reproducing it is not entirely trivial and does not map directly onto a plain regular expression. Still, I don't see why they shelled out to 'cat'.
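If you did want to keep it inside Perl, one option (a sketch, not the original author's code) is to precompute the cat -v spelling of all 256 byte values once, following the same rules as lessquote, so that a single substitution can do the rest. It assumes the input is a byte string, for example read with the ':raw' layer; @vis and cat_v are hypothetical names.

```perl
use strict;
use warnings;

# Build the cat -v spelling of every byte value 0..255 once.
my @vis;
for my $byte (0 .. 255) {
    my $c      = $byte;
    my $prefix = '';
    if ($c >= 128) { $prefix = 'M-'; $c -= 128; }    # high bit => "M-"
    my $text =
        ( !$prefix && ( $c == 9 || $c == 10 ) ) ? chr($c)    # plain TAB/LF
      : $c < 32                                 ? '^' . chr( $c + 64 )
      : $c == 127                               ? '^?'
      :                                           chr($c);
    $vis[$byte] = $prefix . $text;
}

# Replace every byte that is not printable ASCII, TAB or LF.
sub cat_v {
    my ($bytes) = @_;
    (my $out = $bytes) =~ s/([^\t\n\x20-\x7e])/$vis[ord $1]/ge;
    return $out;
}

print cat_v("a\r\tb\xe9\x7f\x89\n");
```

The lookup table keeps the regular expression trivial: the character class simply lists everything cat -v leaves alone (printable ASCII, TAB and LF).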

As far as their style goes

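They are shelling out in a bad way: a bareword filehandle plus a two-argument open whose command string goes through the shell is very 1990s style, and it should be avoided. Something like this is better: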

open my $fh, '-|', 'cat', '-v', '/tmp/cat-cache' or Carp::croak("Can't run cat: $!");

Syntax:

open my $FILEHANDLE, $OPENMODE, $FILENAME        or Carp::croak($ERRORMESSAGE);
open my $FILEHANDLE, $OPENMODE, $SHELLCOMMAND    or Carp::croak($ERRORMESSAGE);
open my $FILEHANDLE, $OPENMODE, $PROGRAM, @ARGS  or Carp::croak($ERRORMESSAGE);

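This is the "preferred" notation these days, for a multitude of reasons: a lexical filehandle instead of a global bareword, an explicit mode that cannot be confused with the filename, and, in the list form, no shell interpreting the arguments. Of course, you wouldn't ACTUALLY want to use cat, but I've left it in here for a clear example.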
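Whichever form you use, the error check has to apply to open itself. The low-precedence 'or' does that, while '||' binds to the last argument unless open's arguments are parenthesized. A small demonstration, using a hypothetical path that is assumed not to exist:

```perl
use strict;
use warnings;

my $missing = '/nonexistent/cat-cache';    # hypothetical, assumed absent

# "||" binds to $missing (a true string), so the warn never runs and the
# failed open goes unnoticed unless the return value is checked.
my $ok_bad = open my $fh1, '<', $missing || warn "never reached\n";

# "or" has the lowest precedence, so it tests the result of open itself.
my $caught = 0;
open my $fh2, '<', $missing or $caught = 1;

print "open returned false, but the || check never fired\n" unless $ok_bad;
print "the 'or' check caught the failure: $!\n" if $caught;
```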

Kent Fredric
Personally, I'd prefer `open my $fh, qw(-| cat -v /tmp/cat-cache) or croak(...)` for the sake of less typing and better readability, but it works out to exactly the same thing.
ephemient