ansaurus

Question

What's the most defensive way to loop through lines in a file with Perl?

Answer 1

+13 A:

Because

 while (my $line = <$fh>) { ... }

actually compiles down to

 while (defined( my $line = <$fh> ) ) { ... }

An explicit `defined` may have been necessary in a very old version of perl, but not any more! You can see this by running B::Deparse on your script:

>perl -MO=Deparse
open my $fh, '<', $file or die "Could not open file $file for reading: $!\n";
while ( my $line = <$fh> ) {
  ...
}

^D
die "Could not open file $file for reading: $!\n" unless open my $fh, '<', $file;
while (defined(my $line = <$fh>)) {
    do {
        die 'Unimplemented'
    };
}
- syntax OK

So you're already good to go!
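Here is a minimal self-contained check of this claim. It assumes a perl with in-memory filehandles (5.8 or later). The last line of the data is a bare `0` with no trailing newline, which is exactly the value a plain truth test would reject.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# In-memory "file" whose last line is "0" with no trailing newline.
my $data = "first\nsecond\n0";
open my $fh, '<', \$data or die "Could not open in-memory file: $!\n";

my @lines;
# Perl compiles this condition as: defined(my $line = <$fh>)
while (my $line = <$fh>) {
    chomp $line;
    push @lines, $line;
}
close $fh;

print scalar(@lines), " lines read: @lines\n";
```

If the implicit `defined` were missing, the loop would stop after `second` and `@lines` would hold only two entries.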

Robert P 2010-09-22 22:02:11

PS, I love...absolutely LOVE how `...` is valid syntax in 5.12 and up. LOVE it.

Robert P 2010-09-22 22:02:52

Oh my. I wonder if anyone ever gets bitten in the arse by that implicit `defined`?

zigdon 2010-09-22 22:04:18

If anyone's really writing code that's checking if an un-chomp()ed line read from a file without a line ending evaluates to False that way, they're getting exactly what they deserve. Perl's DWIM attitude usually gets things right.

Wooble 2010-09-22 22:07:39

+1 for `B::Deparse`

Daenyth 2010-09-22 22:10:50

That... is awesome. I forgot about `B::Deparse` entirely. Thanks, Robert!

CanSpice 2010-09-22 22:22:21

No -1, because I agree `B::Deparse` is useful for exploring what Perl is doing when you're stumped, but IMHO using it to "answer" a question like this is not... right. It only tells you what Perl will do in the handful of specific cases that you try it on, it doesn't tell you anything about the general conditions under which this behaviour will occur. Only the language spec can tell you that.

j_random_hacker 2010-09-22 23:02:35

@j_random_hacker, since perl5 does not have a formal language spec, the behavior of the interpreter IS the definitive answer.

Eric Strom 2010-09-23 00:36:59

@Eric: That's true (and a little sad), but ultimately not that important, since Perl does come with standard docs (perlsyn etc.) that do much the same job. I will always prefer an answer that says, "Section X in perlfoo describes when Perl does Y" to one that says, "I tried A, B and C with `B::Deparse` and found Perl does Y for them."

j_random_hacker 2010-09-23 01:27:08

@j_random_hacker: If you ever check out the Perl 5 mailing list, you'll see that the usual response to a question about Perl's behavior under some condition goes something like: "I looked at the test cases, checked the docs, then looked at the C to confirm that these were right." :)

Robert P 2010-09-23 01:30:47

@Robert: Oh dear... Lots to like about Perl, but this is a huge weakness. Think about this: Why don't those people on the Perl 5 mailing list go any further after looking at the C source? Why don't they have to look at the C compiler's source code to figure out what the compiler will do with Perl's source, then look at CPU gate layouts to see what the CPU will do with the binary code? That's right, it's because C compilers and CPUs are designed to a spec! OK, I'm done whining now :)

j_random_hacker 2010-09-23 01:51:18

@j_random_hacker: In perl's defense, perl6 puts an end to that. There's a spec and there are several independent implementations to compare with. The spec is the final word.

Daenyth 2010-09-23 02:38:08

@Daenyth: Thanks, that's really good news!

j_random_hacker 2010-09-23 04:53:40

@j_random_hacker: I would argue that the C hackers already do that for them. :)

Robert P 2010-09-23 16:09:04

-1. Just because Perl does this in this case on your version of Perl does not mean it is true in all cases and all versions. See the test cases I posted, which show it is a potential issue.

drewk 2010-09-23 17:21:13

Are you kidding? Your examples have nothing to do with this construct. Of course there's going to be different behavior when you don't use it in a while loop. Nowhere in the Perl docs does it say there won't be. Calling a function that returns a line is not the same as doing a readline in a loop, nor would any competent programmer assume so. This has been Perl's behavior since at least 5.003 (see CPAN's archived perlop; the section is there) and probably much earlier. That's 1996. That's so old in Perl terms, it predates support on most OSes that exist today, including Windows.

Robert P 2010-09-23 19:47:21

Answer 2

+12 A:

BTW, this is covered in the I/O Operators section of perldoc perlop:

In scalar context, evaluating a filehandle in angle brackets yields the next line from that file (the newline, if any, included), or "undef" at end-of-file or on error. When $/ is set to "undef" (sometimes known as file-slurp mode) and the file is empty, it returns '' the first time, followed by "undef" subsequently.
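Here is a small sketch of the slurp-mode behavior from that paragraph. It is not taken from perlop. It reads an empty in-memory file twice with `$/` undefined, and it assumes perl 5.8 or later for the in-memory handle.

```perl
use strict;
use warnings;

my $empty = '';
open my $fh, '<', \$empty or die "Could not open in-memory file: $!\n";

my ($first, $second);
{
    local $/;            # undefined $/ = slurp mode, restored when the block exits
    $first  = <$fh>;     # '' on the first read of an empty file
    $second = <$fh>;     # undef on the next read
}
close $fh;

printf "first: %s, second: %s\n",
    defined $first  ? "'$first'"  : 'undef',
    defined $second ? "'$second'" : 'undef';
```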

Ordinarily you must assign the returned value to a variable, but there is one situation where an automatic assignment happens. If and only if the input symbol is the only thing inside the conditional of a "while" statement (even if disguised as a "for(;;)" loop), the value is automatically assigned to the global variable $_, destroying whatever was there previously. (This may seem like an odd thing to you, but you'll use the construct in almost every Perl script you write.) The $_ variable is not implicitly localized. You'll have to put a "local $_;" before the loop if you want that to happen.
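Here is a sketch of the advice about `local $_`. It is also not from perlop, and `count_lines` is a hypothetical helper. Without its `local $_;` line, the `while (<$fh>)` loop would overwrite whatever `$_` the caller was using, and finally set it to undef. In this sketch, that `$_` is an element of `@items` aliased by the outer `for`.

```perl
use strict;
use warnings;

my $data = "alpha\nbeta\n";

# Hypothetical helper: counts lines using the implicit-$_ loop form.
sub count_lines {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Could not open in-memory file: $!\n";
    local $_;                    # the caller's $_ is restored when the sub returns
    my $n = 0;
    while (<$fh>) { $n++ }       # assigns each line to $_, then undef at EOF
    close $fh;
    return $n;
}

my @items = ('x', 'y');
my @seen;
for (@items) {                   # $_ is aliased to each element of @items
    my $n = count_lines($data);
    push @seen, "$_:$n";         # still the caller's value thanks to local $_
}
print "@seen\n";
```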

The following lines are equivalent:

 while (defined($_ = <STDIN>)) { print; }
 while ($_ = <STDIN>) { print; }
 while (<STDIN>) { print; }
 for (;<STDIN>;) { print; }
 print while defined($_ = <STDIN>);
 print while ($_ = <STDIN>);
 print while <STDIN>;

This also behaves similarly, but avoids $_ :

 while (my $line = <STDIN>) { print $line }

In these loop constructs, the assigned value (whether assignment is automatic or explicit) is then tested to see whether it is defined. The defined test avoids problems where the line has a string value that would be treated as false by Perl, for example a "" or a "0" with no trailing newline. If you really mean for such values to terminate the loop, they should be tested for explicitly:

 while (($_ = <STDIN>) ne '0') { ... }
 while (<STDIN>) { last unless $_; ... }

In other boolean contexts, "<filehandle>" without an explicit "defined" test or comparison elicits a warning if the "use warnings" pragma or the -w command-line switch (the $^W variable) is in effect.
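Here is a short sketch that contrasts the implicit `defined` test with the explicit `last unless $_;` form quoted above. It uses an in-memory file, which assumes perl 5.8 or later. A `0` followed by a newline is still true. Only a bare `0` at end of file, or an empty string, stops the second loop.

```perl
use strict;
use warnings;

my $data = "1\n0\n2\n0";   # a "0\n" line in the middle, a bare "0" at the end

# Implicit defined(): the loop runs to end of file and keeps every line.
open my $fh, '<', \$data or die "Could not open in-memory file: $!\n";
my @all;
while (<$fh>) {
    chomp;
    push @all, $_;
}
close $fh;

# Explicit truth test: "0\n" is true, so only the final bare "0" ends the loop.
open $fh, '<', \$data or die "Could not open in-memory file: $!\n";
my @until_zero;
while (<$fh>) {
    last unless $_;
    chomp;
    push @until_zero, $_;
}
close $fh;

print "all: @all\nuntil bare 0: @until_zero\n";
```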

Ether 2010-09-22 22:17:59

Good answer so I deleted mine. But the Perl docs are misleading -- they say, "If **and only if** the input symbol is the only thing inside the conditional of a while statement" -- but later contradict the "and only if" part by showing that `while (my $line = <STDIN>)` also behaves the same way. Leaving us wondering about exactly which circumstances this DWIMmery will be performed in.

j_random_hacker 2010-09-22 23:05:17

@j_random: Isn't the "if and only if" part referring to whether $_ is used as the location for the line read from the handle, not whether the `defined` logic is employed?

Ether 2010-09-22 23:15:25

@Ether: You're absolutely right, poor reading comprehension on my part. My apologies. I still think it wouldn't hurt to be explicit about exactly when `defined` is auto-applied. My guess is: if the loop conditional test is `<SOMETHING>` or a scalar assignment with `<SOMETHING>` on the RHS -- is that everything though?
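One way to probe that guess on a given perl is to call `B::Deparse` from code, using its `coderef2text` method, and look for an inserted `defined`. This is a sketch. As j_random_hacker noted above, it only shows what this particular perl does for the cases tried. The subs are only deparsed, never called, so nothing is read from STDIN.

```perl
use strict;
use warnings;
use B::Deparse;

my $deparse = B::Deparse->new;

# Anonymous subs that are deparsed but never run.
my %cases = (
    'while (my $l = <STDIN>)' => sub { while (my $l = <STDIN>) { print $l } },
    'while (<STDIN>)'         => sub { while (<STDIN>) { print } },
    'if (my $l = <STDIN>)'    => sub { if (my $l = <STDIN>) { print $l } },
);

my %adds_defined;
for my $label (sort keys %cases) {
    my $text = $deparse->coderef2text($cases{$label});
    $adds_defined{$label} = $text =~ /\bdefined\b/ ? 1 : 0;
    printf "%-26s %s\n", $label,
        $adds_defined{$label} ? 'defined() added' : 'no defined()';
}
```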

j_random_hacker 2010-09-23 01:37:47

@j_random: yes, I think that would be correct.

Ether 2010-09-23 02:14:56

Answer 3

A:

While it is correct that `while (my $line=<$fh>) { ... }` gets compiled to `while (defined( my $line = <$fh> ) ) { ... }`, consider that there are a variety of situations where a legitimate read of the value "0" is misinterpreted if you do not have an explicit `defined` in the loop or when testing the return value of `<>`.

Here are several examples:

#!/usr/bin/perl
use strict; use warnings;

my $str = join "", map { "$_\n" } -10..10;
$str.="0";
my $sep='=' x 10;
my ($fh, $line);

open $fh, '<', \$str or 
     die "could not open in-memory file: $!";

print "$sep Should print:\n$str\n$sep\n";     

#Failure 1:
print 'while ($line=chomp_ln()) { print "$line\n"; }:',
      "\n";
while ($line=chomp_ln()) { print "$line\n"; } #fails on "0"
rewind();
print "$sep\n";

#Failure 2:
print 'while ($line=trim_ln()) { print "$line\n"; }',"\n";
while ($line=trim_ln()) { print "$line\n"; } #fails on "0"
print "$sep\n";
last_char();

#Failure 3:
# fails on last line of "0" 
print 'if(my $l=<$fh>) { print "$l\n" }', "\n";
if(my $l=<$fh>) { print "$l\n" } 
print "$sep\n";
last_char();

#Failure 4 and no Perl warning:
print 'print "$_\n" if <$fh>;',"\n";
print "$_\n" if <$fh>; #fails to print;
print "$sep\n";
last_char();

#Failure 5
# fails on last line of "0" with no Perl warning
print 'if($line=<$fh>) { print $line; }', "\n";
if($line=<$fh>) { 
    print $line; 
} else {
    print "READ ERROR: That was supposed to be the last line!\n";
}    
print "BUT, line read really was: \"$line\"", "\n\n";

sub chomp_ln {
# if I have "warnings", Perl says:
#    Value of <HANDLE> construct can be "0"; test with defined() 
    if($line=<$fh>) {
        chomp $line ;
        return $line;
    }
    return undef;
}

sub trim_ln {
# if I have "warnings", Perl says:
#    Value of <HANDLE> construct can be "0"; test with defined() 
    if (my $line=<$fh>) {
        $line =~ s/^\s+//;
        $line =~ s/\s+$//;
        return $line;
    }
    return undef;

}

sub rewind {
    seek ($fh, 0, 0) or 
        die "Cannot seek on in-memory file: $!";
}

sub last_char {
    seek($fh, -1, 2) or
       die "Cannot seek on in-memory file: $!";
}

I am not saying these are good forms of Perl! I am saying that they are possible, especially Failures 3, 4 and 5. Note that Failures 4 and 5 fail with no Perl warning. The first two have their own issues...
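For comparison, here is a defensive version of `chomp_ln` that tests the read with `defined`, both inside the helper and in the calling loop. `chomp_ln_defined` is a hypothetical name. The handle is passed in rather than shared as a file-level variable.

```perl
use strict;
use warnings;

my $str = join('', map { "$_\n" } -2 .. 2) . '0';   # last line "0", no newline
open my $fh, '<', \$str or die "could not open in-memory file: $!";

# Hypothetical defensive rewrite of chomp_ln(): check the read with defined().
sub chomp_ln_defined {
    my ($handle) = @_;
    my $line = <$handle>;
    return unless defined $line;   # EOF or error
    chomp $line;
    return $line;
}

my @got;
while (defined(my $line = chomp_ln_defined($fh))) {
    push @got, $line;
}
close $fh;
print join(',', @got), "\n";
```

A bare `return` yields `undef` in the scalar assignment the loop uses. That makes `defined` the only end-of-input signal, so both `0` lines are kept.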

drewk 2010-09-23 02:17:06
