I have a filehandle FILE in Perl, and I want to iterate over all the lines in the file. Is there a difference between the following?
while (<FILE>) {
# do something
}
and
foreach (<FILE>) {
# do something
}
In scalar context (i.e. while) <FILE> returns each line in turn. In list context (i.e. foreach) <FILE> returns a list consisting of each line from the file.
You should use the while construct.
See perlop - I/O Operators for more.
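For reference, a minimal sketch of the recommended idiom using a modern lexical filehandle (the three-argument open and the input.txt path are illustrative assumptions, not part of the question):
use strict;
use warnings;

open my $fh, '<', 'input.txt' or die "Cannot open input.txt: $!";

# Scalar context: one line is read per iteration.
while (my $line = <$fh>) {
    chomp $line;    # strip the trailing newline
    # do something with $line
}

close $fh;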
Edit: j_random_hacker rightly says that while (<FILE>) { … } tramples on $_, while foreach does not (foreach localises $_ first). Surely this is the most important behavioural difference!
For most purposes, you probably won't notice a difference. However, foreach reads each line into a list (not an array) before going through it line by line, whereas while reads one line at a time. As foreach will use more memory and require processing time upfront, it is generally recommended to use while to iterate through the lines of a file.
EDIT (via Schwern): The foreach loop is equivalent to this:
my @lines = <$fh>;
for my $line (@lines) {
...
}
It's unfortunate that Perl doesn't optimize this special case as it does with the range operator (1..10).
For example, if I read /usr/share/dict/words with a for loop and a while loop and have them sleep when they're done, I can use ps to see how much memory each process is consuming. As a control, I've included a program that opens the file but does nothing with it.
USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
schwern 73019 0.0 1.6 625552 33688 s000 S 2:47PM 0:00.24 perl -wle open my $fh, shift; for(<$fh>) { 1 } print "Done"; sleep 999 /usr/share/dict/words
schwern 73018 0.0 0.1 601096 1236 s000 S 2:46PM 0:00.09 perl -wle open my $fh, shift; while(<$fh>) { 1 } print "Done"; sleep 999 /usr/share/dict/words
schwern 73081 0.0 0.1 601096 1168 s000 S 2:55PM 0:00.00 perl -wle open my $fh, shift; print "Done"; sleep 999 /usr/share/dict/words
The for program is consuming almost 32 megs of real memory (the RSS column) to store the contents of my 2.4 meg /usr/share/dict/words. The while loop only stores one line at a time, consuming just 70k for line buffering.
In addition to the previous responses, another benefit of using while is that you can use the $. variable. This is the current line number of the last filehandle accessed (see perldoc perlvar).
while ( my $line = <FILE> ) {
if ( $line =~ /some_target/ ) {
print "Found some_target at line $.\n";
}
}
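Note that this only pays off with while: in a list-context foreach, the entire file has already been read before the first iteration, so $. holds the final line number throughout the loop. Here's a minimal head-like sketch (the limit of 10 lines and the file path are arbitrary choices for illustration):
use strict;
use warnings;

open my $fh, '<', 'input.txt' or die "Cannot open: $!";

while (my $line = <$fh>) {
    print $line;
    last if $. == 10;    # $. tracks the line just read, so we can stop early
}

close $fh;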
Update: j_random_hacker points out in a comment that Perl special-cases the falseness test in a while loop when reading from a file handle. I've just verified that reading a false value will not terminate the loop, at least on modern perls. Sorry for steering you all wrong. After 15 years of writing Perl I'm still a noob. ;)
Everyone above is right: use the while loop because it will be more memory efficient and give you more control.
A funny thing about that while loop though is that it exits when the read is false. Usually that will be end-of-file, but what if it returns an empty string or a 0? Oops! Your program just exited too soon. This can happen on any file handle if the last line in the file is a lone 0 with no trailing newline. It can also happen with custom file objects whose read method doesn't treat newlines the same way as regular Perl file objects.
Here's how to fix it. Check for an undefined value read which indicates end-of-file:
while (defined(my $line = <FILE>)) {
print $line;
}
The foreach loop doesn't have this problem, by the way, and is correct even though inefficient.
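If you're curious, you can watch Perl insert the implicit defined() mentioned in the update by running the loop through B::Deparse (a sketch; the exact output differs between Perl versions):
$ perl -MO=Deparse -e 'while (<STDIN>) { print }'
while (defined($_ = readline STDIN)) {
    print $_;
}
-e syntax OK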
j_random_hacker mentioned this in the comments to this answer, but didn't actually put it in an answer of its own, even though it's another difference worth mentioning.
The difference is that while (<FILE>) {} overwrites $_, while foreach (<FILE>) {} localizes it. That is:
$_ = 100;
while (<FILE>) {
# $_ gets each line in turn
# do something with the file
}
print $_; # yes I know that $_ is unneeded here, but
# I'm trying to write clear code for the example
This will print out the last line of <FILE>.
However,
$_ = 100;
foreach(<FILE>) {
# $_ gets each line in turn
# do something with the file
}
print $_;
This will print out 100. To get the same result with a while (<FILE>) {} construct, you'd need to do:
$_ = 100;
{
local $_;
while (<FILE>) {
# $_ gets each line in turn
# do something with the file
}
}
print $_; # yes I know that $_ is unneeded here, but
# I'm trying to write clear code for the example
Now this will print 100.
I added an example dealing with this to the next edition of Effective Perl Programming.
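Another way to sidestep the problem entirely (my own suggestion, not from the answer above) is to read into a lexical variable so that $_ is never touched:
$_ = 100;
open my $fh, '<', 'input.txt' or die $!;    # illustrative path

# Reading into a lexical leaves $_ alone; no local() needed.
while (my $line = <$fh>) {
    # do something with $line
}

close $fh;
print $_;    # still prints 100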
With a while, you can stop processing FILE and still get the unprocessed lines:
while( <FILE> ) { # scalar context
last if ...;
}
my $line = <FILE>; # still lines left
If you use a foreach, you consume all of the lines in the foreach even if you stop processing them:
foreach( <FILE> ) { # list context
last if ...;
}
my $line = <FILE>; # no lines left!
Here is an example where foreach will not work but while will do the job:
while (<FILE>) {
    my $line1 = $_;
    if ($line1 =~ /SOMETHING/) {
        my $line2 = <FILE>;    # read the next line from inside the loop
        if (defined $line2 && $line2 =~ /SOMETHING ELSE/) {
            print "I found SOMETHING and SOMETHING ELSE in consecutive lines\n";
            exit();
        }
    }
}
You simply cannot do this with foreach because it will read the whole file into a list before entering the loop, so you won't be able to read the next line inside the loop. I am sure there are workarounds to this problem even with foreach (reading into an array and tracking an index comes to mind, as sketched below), but while definitely offers a very straightforward solution.
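For completeness, here is a sketch of that array-index workaround (my own illustration, with input.txt as a placeholder path):
use strict;
use warnings;

open my $fh, '<', 'input.txt' or die $!;
my @lines = <$fh>;    # reads the whole file into memory
close $fh;

for my $i (0 .. $#lines - 1) {
    if ($lines[$i] =~ /SOMETHING/ && $lines[$i + 1] =~ /SOMETHING ELSE/) {
        print "I found SOMETHING and SOMETHING ELSE in consecutive lines\n";
        last;
    }
}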
A second example is when you have to parse a large (say 3 GB) file on a machine with only 2 GB of RAM. foreach will simply run out of memory and crash. I learnt this the hard way very early in my Perl programming life.