I have a filehandle FILE in Perl, and I want to iterate over all the lines in the file. Is there a difference between the following?

while (<FILE>) {
    # do something
}

and

foreach (<FILE>) {
    # do something
}
+17  A: 

In scalar context (i.e. while) <FILE> returns each line in turn.

In list context (i.e. foreach) <FILE> returns a list consisting of each line from the file.

You should use the while construct.

See perlop - I/O Operators for more.
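Here is a minimal sketch of the context distinction. It is not part of the original answer. It uses an in-memory filehandle opened on a string in place of FILE, and the sample text and variable names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sample data; an in-memory filehandle stands in for FILE.
my $text = "alpha\nbeta\ngamma\n";
open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";

# Scalar context: readline returns just the next line.
my $first = <$fh>;

# List context: readline returns every remaining line at once.
my @rest = <$fh>;

close($fh);

print "first: $first";
print "rest has ", scalar(@rest), " lines\n";
```

The second read in list context drains the handle. This is exactly what `foreach (<FILE>)` does before its first iteration.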

Edit: j_random_hacker rightly says that

while (<FILE>) { … }

tramples on $_ while foreach does not (foreach localises $_ first). Surely this is the most important behavioural difference!

kmkaplan
-1 until you mention that while (<FILE>) {} tramples on $_ while foreach does not (foreach localises $_ first). Surely this is the *most* important behavioural difference!
j_random_hacker
Thank you! This unintuitive difference is the source of quite a few bugs.
j_random_hacker
+23  A: 

For most purposes, you probably won't notice a difference. However, foreach reads every line of the file into a list (not an array) before going through it line by line, whereas while reads one line at a time. Because foreach uses more memory and does all of the reading up front, it is generally recommended to use while to iterate through the lines of a file.

EDIT (via Schwern): The foreach loop is equivalent to this:

my @lines = <$fh>;
for my $line (@lines) {
    ...
}
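You can check that the foreach form really reads the whole file before the first iteration by calling eof inside the loop. The sketch below does this. An in-memory filehandle stands in for a real file, and the three-line sample text is made up.

```perl
use strict;
use warnings;

# Hypothetical three-line file held in memory.
my $text = "one\ntwo\nthree\n";

# foreach: the whole list is read before the body runs, so the handle
# is already at end-of-file on the first iteration.
open(my $fh_for, '<', \$text) or die $!;
my $for_at_eof;
for my $line (<$fh_for>) {
    $for_at_eof = eof($fh_for) ? 1 : 0;
    last;
}
close($fh_for);

# while: only one line has been read when the body first runs.
open(my $fh_while, '<', \$text) or die $!;
my $while_at_eof;
while (my $line = <$fh_while>) {
    $while_at_eof = eof($fh_while) ? 1 : 0;
    last;
}
close($fh_while);

print "foreach at EOF on first pass: $for_at_eof\n";
print "while at EOF on first pass:   $while_at_eof\n";
```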

It's unfortunate that Perl doesn't optimize this special case as it does with the range operator (1..10).

For example, suppose I read /usr/share/dict/words once with a for loop and once with a while loop, and have each program sleep when it's done. I can then use ps to see how much memory each process is consuming. As a control, I've included a program that opens the file but does nothing with it.

USER       PID %CPU %MEM      VSZ    RSS   TT  STAT STARTED      TIME COMMAND
schwern  73019   0.0  1.6   625552  33688 s000  S     2:47PM   0:00.24 perl -wle open my $fh, shift; for(<$fh>) { 1 } print "Done";  sleep 999 /usr/share/dict/words
schwern  73018   0.0  0.1   601096   1236 s000  S     2:46PM   0:00.09 perl -wle open my $fh, shift; while(<$fh>) { 1 } print "Done";  sleep 999 /usr/share/dict/words
schwern  73081   0.0  0.1   601096   1168 s000  S     2:55PM   0:00.00 perl -wle open my $fh, shift; print "Done";  sleep 999 /usr/share/dict/words

The for program is consuming almost 33 megs of real memory (the RSS column) to store the contents of my 2.4 meg /usr/share/dict/words. The while loop stores only one line at a time and uses just about 70k more than the control program, for line buffering.

Alex Reynolds
Nothing "fake" about the list. List is correct, array is wrong. There's no array there.
ysth
The distinction between arrays and lists is important. Conflating them will lead to errors in your understanding and eventually your code.
daotoad
In Perl6 the difference will be all but nonexistent, because of lazy lists.
Brad Gilbert
I think Brad means the difference between the foreach and the while, not the previous comments. :) Lists are data, and arrays contain lists.
brian d foy
I'm curious if this is still true. I ran a test using a largish file the other day and there seemed to be no difference between for and while. Could file handles be special cased to work around a common mistake? (From experience, I know there was a clear difference 10ish years ago.)
Jon Ericson
@Jon Ericson - I think the difference between now and 10 years ago is just processor speed/power.
Chris Lutz
@ysth - You are correct, and I have added an edit to clarify the distinction. Until now, I have always confused the two structures, although it seems I've never had to suffer consequences for my mistake. Thanks for the feedback.
Alex Reynolds
-1 until you mention that while (<FILE>) {} tramples on $_ while foreach does not (foreach localises $_ first). Surely this is the *most* important behavioural difference!
j_random_hacker
@Jon It is a memory issue, not a speed issue. If you just benchmark total runtime they will come out roughly the same, but memory consumption will be much larger for the for loop. See the expanded example.
Schwern
@j_random_hacker The memory difference is far more important and practical. You shouldn't be relying on $_ for any significant distance of code anyway exactly because lots of things trample on it.
Schwern
@Schwern: If you can't see why code that silently changes a commonly used global variable leads to a maintenance nightmare, I doubt you have worked on a big project. Memory usage is only important in the rare case of truly huge files; otherwise, the OS's VMM will ensure everything works (slowly).
j_random_hacker
+10  A: 

In addition to the previous responses, another benefit of using while is that you can use the $. variable. This is the current line number of the last filehandle accessed (see perldoc perlvar).

while ( my $line = <FILE> ) {
    if ( $line =~ /some_target/ ) {
        print "Found some_target at line $.\n";
    }
}
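The sketch below contrasts the two loops. It uses an in-memory filehandle and made-up sample lines. The while loop reports the real line number of the match. The for loop sees $. only after every line has been read, which is the point raised in the comment below.

```perl
use strict;
use warnings;

# Hypothetical sample data: the target sits on line 2 of 3.
my $text = "first\nsome_target here\nlast\n";

# while: $. is updated as each line is read, so it is accurate.
open(my $fh, '<', \$text) or die $!;
my $while_lineno;
while (my $line = <$fh>) {
    $while_lineno = $. if $line =~ /some_target/;
}
close($fh);

# for: the whole file is read before the loop body runs, so $.
# already holds the final line count when the match is found.
open(my $fh2, '<', \$text) or die $!;
my $for_lineno;
for my $line (<$fh2>) {
    $for_lineno = $. if $line =~ /some_target/;
}
close($fh2);

print "while: $while_lineno\nfor:   $for_lineno\n";
```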
Ovid
Re "accessed", specifically, via: readline/glob (aka <>), eof, tell, sysseek.
ysth
Strictly speaking, you can also access the $. variable using a for loop; but since it expands the list entirely first, you always get the last line number.
brunov
+2  A: 

Update: j random hacker points out in a comment that Perl special cases the falseness test in a while loop when reading from a file handle. I've just verified that reading a false value will not terminate the loop -- at least on modern perls. Sorry for steering you all wrong. After 15 years of writing Perl I'm still a noob. ;)

Everyone above is right: use the while loop because it will be more memory efficient and give you more control.

A funny thing about that while loop though is that it exits when the read is false. Usually that will be end-of-file, but what if it returns an empty string or a 0? Oops! Your program just exited too soon. This can happen on any file handle if the last line in the file doesn't have a newline. It can also happen with custom file objects that have a read method that doesn't treat newlines the same way as regular Perl file objects.

Here's how to fix it. Check for an undefined read, which indicates end-of-file:

while (defined(my $line = <FILE>)) {
    print $line;
}

The foreach loop doesn't have this problem by the way and is correct even though inefficient.
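As the update at the top of this answer says, the explicit defined test is not needed on modern perls. The sketch below shows that both the implicit and the explicit forms keep a final line of "0" that has no trailing newline. The data is hypothetical, and an in-memory filehandle stands in for FILE.

```perl
use strict;
use warnings;

# Hypothetical file whose last line is "0" with no trailing newline:
# the case this answer worries about.
my $text = "1\n0";

# Implicit form: Perl treats this condition as
# defined($_ = <$fh>), so the final "0" is still processed.
open(my $fh, '<', \$text) or die $!;
my @implicit;
while (<$fh>) {
    push @implicit, $_;
}
close($fh);

# Explicit form from the answer: the same test, spelled out.
open(my $fh2, '<', \$text) or die $!;
my @explicit;
while (defined(my $line = <$fh2>)) {
    push @explicit, $line;
}
close($fh2);

print scalar(@implicit), " lines (implicit), ",
      scalar(@explicit), " lines (explicit)\n";
```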

Ken Fox
No! Perl special-cases the form "while (<FILE>) { ... }" to be exactly what you suggested as its replacement: "while (defined($_ = <FILE>)) {}". So a line at the end of the file containing only "0" and no LF character will *not* be ignored. See the section "I/O Operators" in perlop.
j_random_hacker
Sweet! When did this get fixed? There are still many examples in the pod that use the while defined syntax. IIRC Perl used to treat while(<>) differently than while(<FILE>).
Ken Fox
AFAIK it's been that way since Perl 5, but I don't know. And Perl's picky about what forms it will special-case: e.g. "while (<>)", "while ($_ = <FILE>)" and "while (my $x = <FILE>)" get special-cased, but "while ($_ = '' . <FILE>)" doesn't. (Test with a file ending with a "0" and no LF.)
j_random_hacker
Don't worry about feeling like a noob... I've been Perling since 1999 and a month ago learned that the range operator is special-cased for two constant scalars! (E.g. "1 .. 10") :) Yes it sucks that some of the POD docs are so out of date, also googling turns up some bad advice/explanations.
j_random_hacker
I dropped the -1 I gave you but you can still get another +1 from me if you mention that "while (<FILE>)" tramples on $_ while "foreach (<FILE>)" localises $_, avoiding tramplification. This non-obvious difference in behaviour causes quite a few subtle bugs.
j_random_hacker
A: 

*j_random_hacker* mentioned this in the comments to other answers but didn't post it as an answer of its own, even though it's another difference worth mentioning.

The difference is that while (<FILE>) {} overwrites $_, while foreach(<FILE>) {} localizes it. That is:

$_ = 100;
while (<FILE>) {
    # $_ gets each line in turn
    # do something with the file
}
print $_; # yes I know that $_ is unneeded here, but 
          # I'm trying to write clear code for the example

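Will not print 100. The final read at end-of-file assigns undef to $_, so after the loop $_ is undefined and the print outputs nothing (with an "uninitialized value" warning under use warnings).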

However,

$_ = 100;
foreach(<FILE>) {
    # $_ gets each line in turn
    # do something with the file
}
print $_;

Will print out 100. To get the same with a while(<FILE>) {} construct you'd need to do:

$_ = 100;
{
    local $_;
    while (<FILE>) {
        # $_ gets each line in turn
        # do something with the file
    }
}
print $_; # yes I know that $_ is unneeded here, but 
          # I'm trying to write clear code for the example

Now this will print 100.
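The three cases above can be checked in one script. This sketch uses an in-memory filehandle and a made-up two-line file. After the plain while loop, $_ ends up undef, because the last read at end-of-file assigns undef to it.

```perl
use strict;
use warnings;

# Hypothetical two-line file held in memory.
my $text = "line one\nline two\n";

# 1. while (<$fh>) assigns to the global $_; the final read at EOF
#    leaves it undef, so the earlier value is lost.
$_ = 100;
open(my $fh, '<', \$text) or die $!;
while (<$fh>) { }
close($fh);
my $after_while = $_;

# 2. foreach aliases $_ to each element and restores it afterwards.
$_ = 100;
open($fh, '<', \$text) or die $!;
foreach (<$fh>) { }
close($fh);
my $after_foreach = $_;

# 3. while inside a block with local $_ leaves the outer value alone.
$_ = 100;
open($fh, '<', \$text) or die $!;
{
    local $_;
    while (<$fh>) { }
}
close($fh);
my $after_local = $_;

print defined $after_while ? "while: $after_while\n" : "while: undef\n";
print "foreach: $after_foreach\n";
print "local: $after_local\n";
```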

Nathan Fellman
+2  A: 

I added an example dealing with this to the next edition of Effective Perl Programming.

With a while, you can stop processing FILE and still get the unprocessed lines:

while ( <FILE> ) {   # scalar context
    last if ...;
}
my $line = <FILE>;   # still lines left

If you use a foreach, you consume all of the lines in the foreach even if you stop processing them:

foreach ( <FILE> ) { # list context
    last if ...;
}
my $line = <FILE>;   # no lines left!
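Here is a self-contained version of the same idea. The elided condition is replaced by a hypothetical test for a line reading "stop", and an in-memory filehandle stands in for FILE.

```perl
use strict;
use warnings;

# Hypothetical four-line file; both loops stop at the line "stop".
my $text = "a\nstop\nc\nd\n";

open(my $fh, '<', \$text) or die $!;
while (my $line = <$fh>) {          # scalar context: one line per read
    last if $line eq "stop\n";
}
my @left_after_while = <$fh>;       # "c\n" and "d\n" are still unread
close($fh);

open($fh, '<', \$text) or die $!;
foreach my $line (<$fh>) {          # list context: every line read up front
    last if $line eq "stop\n";
}
my @left_after_foreach = <$fh>;     # nothing left to read
close($fh);

print scalar(@left_after_while), " vs ", scalar(@left_after_foreach), "\n";
```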
brian d foy
A: 

Here is an example where foreach will not work but while will do the job:

while (<FILE>) {
   my $line1 = $_;
   if ($line1 =~ /SOMETHING/) {
      my $line2 = <FILE>;    # read ahead one line; undef at end-of-file
      if (defined $line2 && $line2 =~ /SOMETHING ELSE/) {
         print "I found SOMETHING and SOMETHING ELSE in consecutive lines\n";
         exit();
      }
   }
}

You simply cannot do this with foreach. It reads the whole file into a list before entering the loop, so there is no next line left to read inside the loop. I am sure there are workarounds even with foreach (reading into an array comes to mind), but while definitely offers a very straightforward solution.
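The sketch below turns the same read-ahead idea into a reusable subroutine. The name found_consecutive, the patterns, and the sample data are all hypothetical. Unlike the loop above, it slides a two-line window, so a line that fails the second test is still considered as a candidate first line.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if a line matching $first is
# immediately followed by a line matching $second, else 0.
# It reads ahead with <$fh> inside the loop, which a foreach over
# <$fh> could not do because the list would already be exhausted.
sub found_consecutive {
    my ($fh, $first, $second) = @_;
    my $line = <$fh>;
    while (defined $line) {
        my $next = <$fh>;                   # read ahead one line
        return 0 unless defined $next;      # end-of-file: no pair possible
        return 1 if $line =~ $first && $next =~ $second;
        $line = $next;                      # slide the two-line window
    }
    return 0;
}

# Usage with made-up data held in memory.
my $hit_text  = "foo\nSOMETHING\nSOMETHING ELSE\nbar\n";
my $miss_text = "SOMETHING\nfoo\nSOMETHING ELSE\n";

open(my $hit_fh, '<', \$hit_text) or die $!;
my $found = found_consecutive($hit_fh, qr/SOMETHING/, qr/SOMETHING ELSE/);
close($hit_fh);

open(my $miss_fh, '<', \$miss_text) or die $!;
my $not_found = found_consecutive($miss_fh, qr/SOMETHING/, qr/SOMETHING ELSE/);
close($miss_fh);

print "found: $found, not_found: $not_found\n";
```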

A second example is when you have to parse a large (say 3GB) file on a machine with only 2GB of RAM. foreach will simply run out of memory and crash. I learnt this the hard way very early in my Perl programming life.

AP
A: 

A foreach loop is faster than a while loop (which is condition-based).

Von Tech