Hi, I have a 14 MB file containing a matrix in raw binary format. I would like to slurp it and get something like an array of arrays, so I can read some values. I was hoping to find some magical Perl module that would, given the size of the matrix, do all the work for me :) But I can't find one, and I suspect I'm just missing a more obvious way of doing it. PDL::IO::FlexRaw is close to what I need, although I'm a bit confused by its warning about strange characters added by F77.

Update: Thanks for the answer. Maybe I didn't explain myself well. The matrix is in a binary file, in raw format, stored as 64-bit floats. The first 8 bytes of the file are the first "cell" of the matrix, (1,1). The next 8 bytes are the second cell, (2,1). It has no header and no footer. I know its dimensions, so I can tell the module "I have a row for every 64000 bytes".

I'm looking at Tie::MmapArray, but I don't know if I can make it work. Maybe I'd be better off using lseek() back and forth to find the 8 bytes I need and then unpack() them?

Does anybody know the best way of doing that?

TIA, -- Diego.

A: 
mirod
A: 

Without knowing the structure of your file, how could any library hope to read it? If it's some kind of standardized matrix binary format, then you could try searching CPAN for that. Otherwise, I'm guessing you'll have to do the work yourself.

Assuming it's not a sparse matrix, it's probably just a matter of reading in the dimensions, and then reading in appropriately sized blocks.
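For a headerless file of fixed-size values like the one described in the question's update, an "appropriately sized block" can be as small as a single 8-byte cell. Here is a minimal sketch of that idea, assuming the doubles are in the machine's native byte order and that (2,1) directly follows (1,1) as the question states. The names read_cell and $width are made up for illustration and do not come from any module; the demo writes its own tiny file so it can run as is.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: fetch one cell from a headerless file of native
# 64-bit doubles. $i and $j are 1-based and $i runs fastest; $width is the
# number of values per row (8000 for the question's 64000-byte rows).
sub read_cell {
    my ($fh, $i, $j, $width) = @_;
    my $offset = 8 * ( ($i - 1) + ($j - 1) * $width );
    seek($fh, $offset, 0) or die "seek failed: $!";
    my $got = read($fh, my $bytes, 8);
    die "short read at offset $offset" unless defined $got && $got == 8;
    return unpack('d', $bytes);
}

# Demo: write the values 1..6 as a matrix with 3 values per row.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} pack('d*', 1 .. 6);
close $tmp or die "close failed: $!";

open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";
my $value = read_cell($fh, 2, 1, 3);   # second value of the first row
my $other = read_cell($fh, 3, 2, 3);   # last value of the second row
close $fh;
```

Only 8 bytes are read per lookup, so memory use stays tiny, at the cost of one seek and one read per cell.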

+1  A: 

Unless you're tight on memory, just read the whole file in.

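my $size = -s $filename;
open(my $fh, '<:raw', $filename) or die "Cannot open $filename: $!";
my $got = read($fh, my $buffer, $size);
die "Short read on $filename" unless defined $got && $got == $size;
close($fh);
my @floats = unpack("d*", $buffer);   # native 64-bit doubles
# (i,j) is 1-based and i runs fastest, so (2,1) is the second value in the file
my $float2x1 = $floats[ (2-1) + (1-1)*$width ];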

That should access (2,1). (I didn't test it, though...)

EDIT:

Ok, low memory version:

use Sys::Mmap;
new Sys::Mmap $buffer, -s $filename, $filename or die $!;
$float2x1 = unpack("d", substr($buffer,8*( (2-1) + (1-1)*$width ),8));

Just needs Sys::Mmap from CPAN.
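If installing a CPAN module is not an option, a middle ground is to slurp the file but leave it packed, and decode cells only when they are needed. Unpacking everything into a Perl array is what costs memory, because every element becomes a full Perl scalar that takes considerably more than 8 bytes; the packed string stays at roughly the size of the file. A rough sketch under the same layout assumptions as above (native-endian doubles, first index running fastest); slurp_raw and cell are hypothetical helper names, and the demo creates its own small file:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read the whole file into one scalar, undecoded.
sub slurp_raw {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";
    local $/;                      # slurp mode: no record separator
    my $bytes = <$fh>;
    close $fh;
    return $bytes;
}

# Hypothetical helper: decode a single 1-based (i,j) cell on demand.
# The buffer is passed by reference so it is never copied.
sub cell {
    my ($bufref, $i, $j, $width) = @_;
    my $offset = 8 * ( ($i - 1) + ($j - 1) * $width );
    return unpack('d', substr($$bufref, $offset, 8));
}

# Demo: six doubles laid out with 3 values per row.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} pack('d*', 0.5, 1.5, 2.5, 3.5, 4.5, 5.5);
close $tmp or die "close failed: $!";

my $buffer  = slurp_raw($path);
my $cell_21 = cell(\$buffer, 2, 1, 3);
my $cell_12 = cell(\$buffer, 1, 2, 3);
```

With the real file, $width would be 8000 (64000 bytes per row divided by 8 bytes per value).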

Jay Kominek
Hi, thanks. That code makes perl use 1 GB of memory. That's a bit too much, although it is a way of doing it. -- Diego
I've edited the answer to add a way of doing it which doesn't slurp up unnecessary memory.
Jay Kominek