views: 121
answers: 5

Given a start and end line number, what's the fastest way to read a range of lines from a file into a variable?

A: 
# cat x.pl
  #!/usr/bin/perl
  my @lines;
  my $start = 2;
  my $end = 4;
  my $i = 0;
  for( $i=0; $i<$start; $i++ )
  {   
    scalar(<STDIN>);
  }   
  for( ; $i<=$end; $i++ )
  {   
    push @lines, scalar(<STDIN>);
  }   
  print @lines;
# cat xxx 
  1   
  2   
  3   
  4   
  5   
# cat xxx | ./x.pl
  3   
  4   
  5   
#   

Stopping at $end means you avoid reading extra lines at the end that you don't need. As written, the `print @lines` may be copying memory, so printing each line inside the second for-loop might be a better idea. But if you need to "store it" in a variable in Perl, then you may not be able to get around that copy.

Update:

You could do it in one loop with a `next if $. < $start;`, but you need to make sure to reset `$.` manually on eof() if you're iterating over multiple files with `<>`.

eruciform
3-arg for loops are almost never necessary in Perl. They obscure what's really going on and promote off-by-one errors. Use the range op; simple ranges are optimized and don't generate a list. `$start_after = $start - 1; for (1..$start_after)` and `$lines = $end-$start+1; for (1..$lines)`
Schwern
@schwern: doesn't the range operator iterate and create each of those numbers in turn? if it's a large range, it can eat up memory. i've always avoided it for unknown ranges...
eruciform
you know, people, it's polite to put why you think this is such a negative answer. it's correct, it doesn't have some flaws in the other answers, and it shows the whole lifecycle of the example. this negative-bandwagon effect really annoys me on this site sometimes.
eruciform
@eruciform Repeat: simple ranges are optimized and do not generate a list. `for ($start..$end)` will create an iterator rather than a list. Anything more complicated and it will generate a list. Perl programmers really dislike 3-arg for loops, and it's probably better done with `$.` than two `for`s.
Schwern
@schwern: thanks, i didn't know about that. i've never heard about iterators in perl; they seem to be a data type used in many other languages but not there. if that's the case, then a simple range is better. what's the difference between `for my $foo ( 1 .. 2 )` vs `for $foo ( 1 .. 2 )` then? will it create references to the internal values of the range iterator?
eruciform
@eruciform Perl doesn't really have iterators; this is just a performance hack. It's documented in perlop. http://perldoc.perl.org/perlop.html#Range-Operators Both your examples probably copy the value like normal, otherwise you'd get an error when modifying $foo.
Schwern
+1  A: 

The following will load all desired lines of a file into an array variable. It will stop reading the input file as soon as the end line number is reached:

use strict; 
use warnings;

my $start = 3;
my $end   = 6;
my @lines;
while (<>) {
    last if $. > $end;
    push @lines, $_ if $. >= $start;
}
toolic
this doesn't work with multiple files, that's why i warned against using it for <>... try it with a file of 2 lines and a file of 10 lines...
eruciform
Here is a quote from the Question: "from a file". The Question only requires a single file, not multiple files. Since it works for a single file, it answers the question.
toolic
If the number of lines is very large, you might get some performance by preallocating `@lines` so it doesn't have to grow. `$#lines = $end - $start + 1`. Might have to be very large to beat Perl's allocator.
Schwern
A: 

Reading line by line isn't going to be optimal. Fortunately, someone has done the hard work already :) Use Tie::File; it presents the file as an array. http://perldoc.perl.org/Tie/File.html

neal aise
Normally I'd say to go with the module, but Tie::File is orders of magnitude slower over a large file than a simple loop. Even eliminating the overhead of `tie`. See for yourself. http://gist.github.com/471071
Schwern
don't have a system to test this. but i would tune various parameters in the module and test (see the link). and is cmpthese an appropriate way to compare? (this is io bound. i would test this one method at a time, and on a freshly booted system)
neal aise
@techie "orders of magnitude slower" means at least 100x slower (it is actually 50 to 100x slower). This is far beyond a benchmarking glitch.
Schwern
+2  A: 

You can use the flip-flop operator:

my @result;
while (<>) {
    if (($. == 3) .. ($. == 7)) {
        push @result, $_;
    }
}
gonzo
+6  A: 

Use the range operator .. (also known as the flip-flop operator), which offers the following syntactic sugar:

If either operand of scalar .. is a constant expression, that operand is considered true if it is equal (==) to the current input line number (the $. variable).

If you plan to do this for multiple files via <>, be sure to close the implicit ARGV filehandle as described in the perlfunc documentation for the eof operator. (This resets the line count in $..)

The program below collects in the variable $lines lines 3 through 5 of all files named on the command line and prints them at the end.

#! /usr/bin/perl

use warnings;
use strict;

my $lines;
while (<>) {
  $lines .= $_ if 3 .. 5;
}
continue {
  close ARGV if eof;
}

print $lines;

Sample run:

$ ./prog.pl prog.pl prog.c main.hs
use warnings;
use strict;


int main(void)
{
import Data.Function (on)
import Data.List (sortBy)
--import Data.Ord (comparing)
Greg Bacon