I have a text file that is 310MB in size (uncompressed). When I use PerlIO::gzip to open the gzipped file and uncompress it into memory, it easily fills 2GB of RAM before perl runs out of memory.

The file is opened as below:

open FOO, "<:gzip", "file.gz" or die $!;
my @lines = <FOO>;

Obviously, this is a super convenient way to open gzipped files in perl, but it takes up a ridiculous amount of memory! My next step is to uncompress the file to disk, read its lines into @lines, operate on @lines, and compress it back (a sketch of that round trip follows below). Does anyone have any idea why over 7 times as much memory is consumed when opening a zipped file this way? Does anyone have an alternate idea as to how I can uncompress this gzipped file into memory without it taking a ridiculous amount of memory?
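A sketch of that disk round trip, assuming the IO::Uncompress::Gunzip and IO::Compress::Gzip modules that ship with modern perls (the file names are placeholders):

#!/usr/bin/perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);
use IO::Compress::Gzip     qw(gzip   $GzipError);

# Decompress to disk instead of into memory.
gunzip 'file.gz' => 'file' or die "gunzip failed: $GunzipError";

# ... read 'file' into @lines, operate on @lines,
#     write the result back to 'file' ...

# Compress the result again.
gzip 'file' => 'file.gz' or die "gzip failed: $GzipError";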

+20  A: 

You're reading all of the file's content into the @lines array, so of course that pulls the entire uncompressed content into memory. What you probably want instead is to read from the handle line by line, keeping only one line in memory at a time:

open my $foo, '<:gzip', 'file.gz' or die $!;
while (my $line = <$foo>) {   # only one uncompressed line is held at a time
    # process $line here
}
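
If the result needs to be written back out compressed, PerlIO::gzip can layer the output handle as well. A minimal sketch of such a streaming transform (the file names and the uc call are placeholders for your own):

#!/usr/bin/perl
use strict;
use warnings;
use PerlIO::gzip;

open my $in,  '<:gzip', 'file.gz'     or die $!;
open my $out, '>:gzip', 'file.new.gz' or die $!;

while (my $line = <$in>) {
    print {$out} uc $line;   # placeholder per-line transformation
}

close $in;
close $out or die $!;

Memory use stays flat because only the current line is ever decompressed in memory.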
rafl
A: 

With files this big I see only one solution: use external command-line tools to uncompress the file, do your manipulation in Perl, then use the external tools again to compress it back. :)
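
A variant that avoids the temporary file is to pipe through the external tool directly; a sketch assuming gzip is on the PATH and the platform supports the list form of piped open:

#!/usr/bin/perl
use strict;
use warnings;

# Let an external gzip process do the decompression and stream it to us.
open my $in, '-|', 'gzip', '-dc', 'file.gz' or die "cannot start gzip: $!";

while (my $line = <$in>) {
    # process $line here
}

close $in or die "gzip reported failure: $!";

Only one decompressed line lives in memory at a time, so memory use stays flat.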

slav123
It is possible to do all of that within Perl 5 without resorting to external tools. The problem is reading all of the data into memory at once rather than processing it line by line.
Chas. Owens
+14  A: 

When you do:

my @lines = <FOO>;

you are creating an array with as many elements as there are lines in the file. At 100 characters per line, that's about 3.3 million array entries for a file like the 328MB one below. There is overhead associated with each array entry, which means the memory footprint will be much larger than the uncompressed size of the file alone.

You can avoid slurping and process the file line by line. To demonstrate, I created a 328MB test file and gzipped it:

C:\Temp> dir file
2010/10/04  09:18 PM       328,000,000 file
C:\Temp> dir file.gz
2010/10/04  09:19 PM         1,112,975 file.gz

And, indeed,

#!/usr/bin/perl

use strict; use warnings;
use autodie;
use PerlIO::gzip;

open my $foo, '<:gzip', 'file.gz';

while ( my $line = <$foo> ) {
    print ".";   # touch every line without keeping any of them around
}

has no problems.

To get an idea of the memory overhead, note:

#!/usr/bin/perl

use strict; use warnings;
use Devel::Size qw( total_size );

my $x = 'x' x 100;     # one 100-character scalar
my @x = ('x' x 100);   # an array holding one such scalar

printf "Scalar: %d\n", total_size( \$x );
printf "Array:  %d\n", total_size( \@x );

Output:

Scalar: 136
Array:  256
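
A back-of-the-envelope extrapolation, assuming roughly similar per-scalar cost at scale: at 136 bytes per 100-character string, 3.3 million lines come to about 450MB in scalars alone, before the array's own bookkeeping and the allocator's overhead are added. That is how a 310MB file can balloon to several times its size once slurped.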
Sinan Ünür