I usually loop through lines in a file using the following code:

open my $fh, '<', $file or die "Could not open file $file for reading: $!\n";
while ( my $line = <$fh> ) {
  ...
}

However, in answering another question, Evan Carroll edited my answer, changing my while statement to:

while ( defined( my $line = <$fh> ) ) {
  ...
}

His rationale was that if you have a line that's just 0 (it would have to be the last line, otherwise it would end in a newline), then my while loop would exit prematurely: $line would be set to "0", and the value of the assignment would also be "0", which evaluates to false. If you check for definedness, you don't run into this problem. It makes perfect sense.
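A minimal sketch of the truthiness rule behind that rationale. The sample values are assumptions chosen for illustration; only the bare string "0" and the empty string are false, while "0" followed by a newline is true:

```perl
use strict;
use warnings;

# Hypothetical sample values: strings a line read might plausibly return.
my @samples = ( "0\n", "0", "", "00", "0.0" );

my %truth;
for my $value (@samples) {
    # Record how Perl treats each string in boolean context.
    $truth{$value} = $value ? 'true' : 'false';

    ( my $shown = $value ) =~ s/\n/\\n/g;    # make the newline visible
    printf "%-6s is %s\n", qq{"$shown"}, $truth{$value};
}
```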

So I tried it. I created a text file whose last line is 0 with no newline after it. I ran it through my loop and the loop did not exit prematurely.
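The experiment can be reproduced without a real file. This is a sketch that assumes an in-memory filehandle is an acceptable stand-in for the text file; the contents and the helper names read_plain and read_defined are made up for illustration:

```perl
use strict;
use warnings;

# Stand-in for the text file: the last line is "0" with no newline.
my $contents = "first\nsecond\n0";

# Loop without an explicit defined().
sub read_plain {
    open my $fh, '<', \$contents
        or die "Could not open in-memory file: $!\n";
    my @lines;
    while ( my $line = <$fh> ) { push @lines, $line }
    return @lines;
}

# Loop with an explicit defined().
sub read_defined {
    open my $fh, '<', \$contents
        or die "Could not open in-memory file: $!\n";
    my @lines;
    while ( defined( my $line = <$fh> ) ) { push @lines, $line }
    return @lines;
}

my @plain    = read_plain();
my @explicit = read_defined();
printf "plain loop read %d lines, explicit defined read %d lines\n",
    scalar @plain, scalar @explicit;
```

Both loops should report the same number of lines, including the final "0".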

I then thought, "Aha, maybe the value isn't actually 0, maybe there's something else there that's screwing things up!" So I used Dump() from Devel::Peek and this is what it gave me:

SV = PV(0x635088) at 0x92f0e8
  REFCNT = 1
  FLAGS = (PADMY,POK,pPOK)
  PV = 0X962600 "0"\0
  CUR = 1
  LEN = 80

That seems to tell me that the value is actually the string "0", as I get a similar result if I call Dump() on a scalar I've explicitly set to "0" (the only difference is in the LEN field -- from the file LEN is 80, whereas from the scalar LEN is 8).

So what's the deal? Why doesn't my while() loop exit prematurely if I pass it a line that's only "0" with no newline? Is Evan's loop actually more defensive, or does Perl do something crazy internally that means you don't need to worry about these things and while() really only exits when you hit eof?

+13  A: 

Because

 while (my $line = <$fh>) { ... }

actually compiles down to

 while (defined( my $line = <$fh> ) ) { ... }

The explicit defined may have been necessary in a very old version of Perl, but not any more! You can see this by running B::Deparse on your script:

>perl -MO=Deparse
open my $fh, '<', $file or die "Could not open file $file for reading: $!\n";
while ( my $line = <$fh> ) {
  ...
}

^D
die "Could not open file $file for reading: $!\n" unless open my $fh, '<', $file;
while (defined(my $line = <$fh>)) {
    do {
        die 'Unimplemented'
    };
}
- syntax OK

So you're already good to go!
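The same check can be done from inside a script rather than from the command line. This is a sketch using B::Deparse's coderef2text method on an anonymous sub; the sub is only deparsed, never called, so nothing is read from STDIN. The exact deparsed text varies between Perl versions, so the code only looks for the word defined:

```perl
use strict;
use warnings;
use B::Deparse;

# A sub containing the loop in question. It is never executed.
my $loop = sub {
    while ( my $line = <STDIN> ) { print $line }
};

# Ask B::Deparse how perl compiled the sub body.
my $text = B::Deparse->new->coderef2text($loop);
print $text, "\n";

print "implicit defined() present\n" if $text =~ /defined\s*\(/;
```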

Robert P
PS, I love...absolutely LOVE how `...` is valid syntax in 5.12 and up. LOVE it.
Robert P
Oh my. I wonder if anyone ever gets bitten in the arse by that implicit `defined`?
zigdon
If anyone's really writing code that's checking if an un-chomp()ed line read from a file without a line ending evaluates to False that way, they're getting exactly what they deserve. Perl's DWIM attitude usually gets things right.
Wooble
+1 for `B::Deparse`
Daenyth
That... is awesome. I forgot about `B::Deparse` entirely. Thanks, Robert!
CanSpice
No -1, because I agree `B::Deparse` is useful for exploring what Perl is doing when you're stumped, but IMHO using it to "answer" a question like this is not... right. It only tells you what Perl will do in the handful of specific cases that you try it on, it doesn't tell you anything about the general conditions under which this behaviour will occur. Only the language spec can tell you that.
j_random_hacker
@j_random_hacker, since perl5 does not have a formal language spec, the behavior of the interpreter IS the definitive answer
Eric Strom
@Eric: That's true (and a little sad), but ultimately not that important, since Perl does come with standard docs (perlsyn etc.) that do much the same job. I will always prefer an answer that says, "Section X in perlfoo describes when Perl does Y" to one that says, "I tried A, B and C with `B::Deparse` and found Perl does Y for them."
j_random_hacker
@j_random_hacker: If you ever check out the Perl 5 mailing list, when discussing Perl's behavior under some condition, the usual response to a language behavior question goes something like "I looked at the test cases, checked the docs, then 'looked at the C to confirm that these were right.'" :)
Robert P
@Robert: Oh dear... Lots to like about Perl, but this is a huge weakness. Think about this: Why don't those people on the Perl 5 mailing list go any further after looking at the C source? Why don't they have to look at the C compiler's source code to figure out what the compiler will do with Perl's source, then look at CPU gate layouts to see what the CPU will do with the binary code? That's right, it's because C compilers and CPUs are designed to a spec! OK, I'm done whining now :)
j_random_hacker
@j_random_hacker: In perl's defense, perl6 puts an end to that. There's a spec and there are several independent implementations to compare with. The spec is the final word
Daenyth
@Daenyth: Thanks, that's really good news!
j_random_hacker
@j_random_hacker: I would argue that the C hackers already do that for them. :)
Robert P
-1. Just because Perl does this in this case on your version of Perl does not mean it is true in all cases and all versions. See the test cases I posted, which show that it is a potential issue.
drewk
Are you kidding? Your examples have nothing to do with this construct. Of course there's going to be different behavior when you don't use it in a while loop. Nowhere in the Perl docs does it say it won't. Calling a function that returns a line is not the same as doing a readline in a loop, nor would any competent programmer assume so. This has been Perl's behavior since at least 5.003 (see CPAN's archived perlop - the section is there) and probably much before. That's 1996. That's so old in Perl terms, it predates support on most OSes that exist today, including Windows.
Robert P
+12  A: 

BTW, this is covered in the I/O Operators section of perldoc perlop:

In scalar context, evaluating a filehandle in angle brackets yields the next line from that file (the newline, if any, included), or "undef" at end-of-file or on error. When $/ is set to "undef" (sometimes known as file-slurp mode) and the file is empty, it returns '' the first time, followed by "undef" subsequently.

Ordinarily you must assign the returned value to a variable, but there is one situation where an automatic assignment happens. If and only if the input symbol is the only thing inside the conditional of a "while" statement (even if disguised as a "for(;;)" loop), the value is automatically assigned to the global variable $_, destroying whatever was there previously. (This may seem like an odd thing to you, but you'll use the construct in almost every Perl script you write.) The $_ variable is not implicitly localized. You'll have to put a "local $_;" before the loop if you want that to happen.

The following lines are equivalent:

while (defined($_ = <STDIN>)) { print; }
while ($_ = <STDIN>) { print; }
while (<STDIN>) { print; }
for (;<STDIN>;) { print; }
print while defined($_ = <STDIN>);
print while ($_ = <STDIN>);
print while <STDIN>;

This also behaves similarly, but avoids $_ :

while (my $line = <STDIN>) { print $line }

In these loop constructs, the assigned value (whether assignment is automatic or explicit) is then tested to see whether it is defined. The defined test avoids problems where the line has a string value that would be treated as false by Perl, for example a "" or a "0" with no trailing newline. If you really mean for such values to terminate the loop, they should be tested for explicitly:

while (($_ = <STDIN>) ne '0') { ... }
while (<STDIN>) { last unless $_; ... }

In other boolean contexts, "<filehandle>" without an explicit "defined" test or comparison elicits a warning if the "use warnings" pragma or the -w command-line switch (the $^W variable) is in effect.
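The point in the quoted passage about $_ not being implicitly localized can be sketched as follows. The input data and the variable names are assumptions for illustration. After a bare while (<$fh>) loop, the final read at end-of-file has assigned undef to the global $_, whereas a "local $_;" inside an enclosing block restores the previous value on exit:

```perl
use strict;
use warnings;

my $data = "one\ntwo\n";    # hypothetical input
open my $fh, '<', \$data
    or die "Could not open in-memory file: $!\n";
my $count = 0;

# Without localization: the loop overwrites the caller's $_.
$_ = 'precious';
while (<$fh>) { $count++ }
my $clobbered = defined $_ ? 'no' : 'yes';

# With "local $_;": the outer value comes back when the block ends.
$_ = 'precious';
seek $fh, 0, 0 or die "Cannot seek on in-memory file: $!\n";
{
    local $_;
    while (<$fh>) { $count++ }
}
my $restored = $_;

print "clobbered without local: $clobbered; value after local block: $restored\n";
```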

Ether
Good answer so I deleted mine. But the Perl docs are misleading -- they say, "If **and only if** the input symbol is the only thing inside the conditional of a while statement" -- but later contradict the "and only if" part by showing that `while (my $line = <STDIN>)` also behaves the same way. Leaving us wondering about exactly which circumstances this DWIMmery will be performed in.
j_random_hacker
@j_random: Isn't the "if and only if" part referring to whether $_ is used as the location for the line read from the handle, not whether the `defined` logic is employed?
Ether
@Ether: You're absolutely right, poor reading comprehension on my part. My apologies. I still think it wouldn't hurt to be explicit about exactly when `defined` is auto-applied. My guess is: if the loop conditional test is `<SOMETHING>` or a scalar assignment with `<SOMETHING>` on the RHS -- is that everything though?
j_random_hacker
@j_random: yes, I think that would be correct.
Ether
A: 

While it is correct that while (my $line=<$fh>) { ... } gets compiled to while (defined( my $line = <$fh> ) ) { ... }, there are a variety of situations in which a legitimate read of the value "0" is misinterpreted if you do not use an explicit defined in the loop or when testing the return value of <>.

Here are several examples:

#!/usr/bin/perl
use strict; use warnings;

my $str = join "", map { "$_\n" } -10..10;
$str.="0";
my $sep='=' x 10;
my ($fh, $line);

open $fh, '<', \$str or 
     die "could not open in-memory file: $!";

print "$sep Should print:\n$str\n$sep\n";     

#Failure 1:
print 'while ($line=chomp_ln()) { print "$line\n"; }:',
      "\n";
while ($line=chomp_ln()) { print "$line\n"; } #fails on "0"
rewind();
print "$sep\n";

#Failure 2:
print 'while ($line=trim_ln()) { print "$line\n"; }',"\n";
while ($line=trim_ln()) { print "$line\n"; } #fails on "0"
print "$sep\n";
last_char();

#Failure 3:
# fails on last line of "0" 
print 'if(my $l=<$fh>) { print "$l\n" }', "\n";
if(my $l=<$fh>) { print "$l\n" } 
print "$sep\n";
last_char();

#Failure 4 and no Perl warning:
print 'print "$_\n" if <$fh>;',"\n";
print "$_\n" if <$fh>; #fails to print;
print "$sep\n";
last_char();

#Failure 5
# fails on last line of "0" with no Perl warning
print 'if($line=<$fh>) { print $line; }', "\n";
if($line=<$fh>) { 
    print $line; 
} else {
    print "READ ERROR: That was supposed to be the last line!\n";
}    
print "BUT, line read really was: \"$line\"", "\n\n";

sub chomp_ln {
# if I have "warnings", Perl says:
#    Value of <HANDLE> construct can be "0"; test with defined() 
    if($line=<$fh>) {
        chomp $line ;
        return $line;
    }
    return undef;
}

sub trim_ln {
# if I have "warnings", Perl says:
#    Value of <HANDLE> construct can be "0"; test with defined() 
    if (my $line=<$fh>) {
        $line =~ s/^\s+//;
        $line =~ s/\s+$//;
        return $line;
    }
    return undef;

}

sub rewind {
    seek ($fh, 0, 0) or 
        die "Cannot seek on in-memory file: $!";
}

sub last_char {
    seek($fh, -1, 2) or
       die "Cannot seek on in-memory file: $!";
}

I am not saying these are good forms of Perl! I am saying that they are possible, especially Failures 3, 4 and 5. Note that numbers 4 and 5 fail with no Perl warning. The first two have their own issues...
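One way to make a helper such as chomp_ln() robust is sketched below. It assumes the helper should signal end-of-file with undef only, and that the caller tests with defined. The name next_chomped and the shortened input range are hypothetical:

```perl
use strict;
use warnings;

# Same shape of input as above, shortened: ends in "0" with no newline.
my $str = join( "", map { "$_\n" } -2 .. 2 ) . "0";
open my $fh, '<', \$str
    or die "could not open in-memory file: $!";

# Hypothetical replacement for chomp_ln(): returns undef only at end-of-file,
# so a chomped "0" is passed through as a normal value.
sub next_chomped {
    my ($handle) = @_;
    my $line = <$handle>;
    return undef unless defined $line;
    chomp $line;
    return $line;
}

# The caller tests definedness explicitly, since the implicit test
# applies only to a readline in the while condition itself.
my @seen;
while ( defined( my $line = next_chomped($fh) ) ) {
    push @seen, $line;
}
print join( ",", @seen ), "\n";
```

With the explicit defined in both places, the loop sees every value, including the "0" in the middle and the unterminated "0" at the end.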

drewk