views: 107

answers: 2

I've slurped in a big file using File::Slurp, but given the size of the file I can see that I must have it in memory twice, or perhaps it's getting inflated by being turned into 16-bit Unicode. How can I best diagnose that sort of problem in Perl?

The file I pulled in is 800MB in size, and my Perl process that's analysing that data has roughly 1.6GB allocated at runtime.

I realise that I may be wrong about the cause of the problem, but I'm not sure of the most efficient way to prove or disprove my theory.

Update:

I have eliminated dodgy character encoding from the list of suspects. It looks like I'm copying the variable at some point; I just can't figure out where.
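In case it helps anyone checking the same thing, this is a sketch of the sort of check I mean; the path is a placeholder and the data is assumed to be in $data:

    use strict;
    use warnings;
    use File::Slurp qw(read_file);

    my $file = '/path/to/big.file';    # placeholder
    my $data = read_file($file, binmode => ':raw');

    # If Perl has turned the UTF8 flag on, the string may be stored as
    # multi-byte characters internally and length() counts characters,
    # so a mismatch against the on-disk size points at encoding inflation.
    print 'UTF8 flag: ', (utf8::is_utf8($data) ? 'on' : 'off'), "\n";
    printf "length: %d, file size on disk: %d\n", length($data), -s $file;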

Update 2:

I have now done some more investigation and found that it's actually just getting the data back from File::Slurp that's causing the problem. I had a look through the documentation and discovered that I can get it to return a scalar_ref, i.e.

my $data = read_file($file, binmode => ':raw', scalar_ref => 1);

Then I don't get the inflation of my memory, which makes some sense and is the most logical thing to do when getting the data in my situation.
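For completeness, this is roughly the shape the code takes with the scalar_ref option; the path and the header peek are just placeholders:

    use strict;
    use warnings;
    use File::Slurp qw(read_file);

    my $file = '/path/to/big.file';    # placeholder

    # scalar_ref => 1 hands back a reference to the buffer, so the data
    # isn't copied again on the way out of read_file.
    my $data_ref = read_file($file, binmode => ':raw', scalar_ref => 1);

    # Work through the reference; assigning $$data_ref to a new scalar
    # would copy it and bring back the doubled footprint.
    printf "read %d bytes\n", length($$data_ref);
    my $header = substr($$data_ref, 0, 16);    # peek at the header without copying the whole buffer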

The information about looking at what variables exist etc. was generally helpful though, thanks.

+4  A: 

Maybe Devel::DumpSizes and/or Devel::Size can help out? I think the former would be more useful in your case.

Devel::DumpSizes - Dump the name and size in bytes (in increasing order) of variables that are available at a given point in a script.

Devel::Size - Perl extension for finding the memory usage of Perl variables
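A minimal sketch of the Devel::Size route, assuming the slurped data sits in $data and the path is a placeholder:

    use strict;
    use warnings;
    use Devel::Size qw(size total_size);
    use File::Slurp qw(read_file);

    my $data = read_file('/path/to/big.file', binmode => ':raw');    # placeholder

    # size() reports the bytes used by the scalar itself; total_size()
    # also follows references, so for a flat string the two should agree.
    printf "size: %d bytes, total_size: %d bytes\n",
        size(\$data), total_size(\$data);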

Htbaa
+4  A: 

Here are some generic resources on memory issues in Perl:

As for your own theory, the simplest way to prove or disprove it would be to write a simple Perl program that does the following (a rough sketch is included after the steps):

  1. Creates a big (100MB) file of plain text, probably by just writing the same string in a loop into a file, or, for binary files, by running the dd command via a system() call.

  2. Reads the file in using standard Perl open() / @a = <>;

  3. Measures memory consumption.

Then repeat steps #2-#3 for your 800MB file.

That will tell you whether the issue is File::Slurp, some weird logic in your program, or some specific content in the file (e.g. non-ASCII data, although I'd be surprised if that turns out to be the reason).
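A rough sketch of that test, assuming a Linux machine where the resident set size can be read from /proc (on other platforms you'd watch top or Task Manager instead):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use File::Slurp qw(read_file);

    # Linux-specific: pull VmRSS (resident set size) out of /proc
    sub rss_kb {
        my ($line) = grep { /^VmRSS:/ } read_file("/proc/$$/status");
        my ($kb)   = $line =~ /(\d+)/;
        return $kb;
    }

    # 1. build a ~100MB plain-text file from a repeated line
    my $test_file = 'slurp_test.txt';
    open my $out, '>', $test_file or die "can't write $test_file: $!";
    print {$out} "just some plain ascii text to pad the file out\n" for 1 .. 2_000_000;
    close $out;

    printf "baseline RSS: %d kB\n", rss_kb();

    # 2. read it in with plain open()/readline
    open my $in, '<', $test_file or die "can't read $test_file: $!";
    my @a = <$in>;
    close $in;
    printf "after open()/readline: %d kB\n", rss_kb();

    # 3. read it in again with File::Slurp for comparison
    my $data = read_file($test_file, binmode => ':raw');
    printf "after File::Slurp read_file: %d kB\n", rss_kb();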

DVK
I do appear to have eliminated dodgy character encoding. A closer look reveals the process starts out with roughly the same memory footprint as the file; then, after doing some checks on the header, it doubles up. I just can't see what's causing that.
Colin Newell