Let's say I have a binary file that is formatted like this:

[unsigned int(length of text)][text][unsigned int(length of text)][text][unsigned int(length of text)][text]

and that pattern just keeps repeating for the rest of the file. How do I read each unsigned int in Perl and print it out, followed by its text block? Again, this is a binary file, not a plain text file.

+1  A: 

You'll need to use the unpack function on the data.

Check out: http://www.perlmonks.org/?node_id=224666

This should get you headed in the right direction (assuming a 32-bit unsigned int):

#!/usr/bin/perl

use strict;
use warnings;

my $strBuf = "perl rocks";
## "a*" writes exactly length($strBuf) bytes, so the length prefix matches the text
my $packed = pack("I a*", length($strBuf), $strBuf);
{
    open(my $binFile, '>', "test.bin") || die("Error opening file: $!\n");
    binmode $binFile;
    print $binFile $packed;
    close $binFile;
}


open(my $binFile, '<', "test.bin") || die("Error opening file: $!\n");
binmode $binFile;

my $buffer;
read($binFile, $buffer, 4);  ## Read out unsigned int binary data
my $length    = unpack("I", $buffer);  ## Unpack the data

read($binFile, $buffer, $length);  ## Read $length bytes of text
my $string = unpack("a$length", $buffer);   ## "a" keeps every byte; "Z" would stop at the first NUL

print "Len: $length  String: $string\n";
exit;
RC
Your code assumes an `unsigned int` in C is 4 bytes, which is not guaranteed to be the case (as I see you know). A better approach to avoid this mixup is to read in the entire file and _then_ process it, so that your code will work fine if it ever runs on a 16-bit platform where `unsigned int` is two bytes.
Chris Lutz
That's why I stated I was assuming 32-bit. I agree that reading into memory is a good and arguably better solution, but we do not know the size of the file being processed or the memory available on the machine. Both solutions have pitfalls.
RC
Chris, how does reading the entire file into memory avoid using the wrong integer size?
Rob Kennedy
@chris, @rob: I would also like an explanation of that.
A: 

In addition to using unpack, as RC points out, you will almost certainly want to use read or sysread to read data from the file.
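A minimal sketch of that idea, assuming a hypothetical helper named read_exact (it is not a Perl built-in). read can return fewer bytes than requested, for example at end of file, so the helper loops until the requested count has arrived and dies if the record is cut short. The in-memory filehandle and the 32-bit little-endian length are assumptions made only for the demonstration.

```perl
use strict;
use warnings;

# Hypothetical helper: read exactly $want bytes from $fh.
# Returns the bytes, or undef on a clean EOF before any byte was read;
# dies on a read error or when the input ends mid-record.
sub read_exact {
    my ($fh, $want) = @_;
    my $buf = '';
    while (length($buf) < $want) {
        # The 4th argument (offset) appends to what is already in $buf.
        my $got = read($fh, $buf, $want - length($buf), length($buf));
        die "read failed: $!\n" unless defined $got;
        last if $got == 0;    # end of file
    }
    return undef if length($buf) == 0 && $want > 0;
    die "truncated input: wanted $want bytes, got " . length($buf) . "\n"
        if length($buf) < $want;
    return $buf;
}

# Demonstration on an in-memory file holding one record:
# an assumed 4-byte little-endian length followed by the text.
my $data = pack('V a*', 5, 'hello');
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!\n";
binmode $fh;

my $len  = unpack('V', read_exact($fh, 4));
my $text = read_exact($fh, $len);
print "$len\t$text\n";
```

sysread could be swapped in for read in the same loop; the main thing is not to mix the two on one filehandle, since read is buffered and sysread is not.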

daotoad
He has edited his answer.
Brad Gilbert
A: 

There is not really enough information here to solve this problem completely.

What is needed is the exact format of the length field and of the text field. Is the int 2 bytes, 4 bytes or 8 bytes? (All are possible.) Also, is it little-endian or big-endian?

Given this information, you then access the first integer using the read function, and convert it to a number using bit operations or the unpack function.
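To illustrate why the byte order matters, here is a small sketch that decodes the same four bytes both ways and also rebuilds the little-endian value with bit operations. The sample bytes are made up for the example, and a 4-byte field is only an assumption.

```perl
use strict;
use warnings;

# Four raw bytes as they might appear at the start of the file (made-up sample).
my $raw = "\x0A\x00\x00\x00";

# Same bytes, two interpretations:
my $le = unpack('V', $raw);    # 32-bit unsigned, little-endian
my $be = unpack('N', $raw);    # 32-bit unsigned, big-endian

# The little-endian value rebuilt by hand with bit operations.
my @b = map { ord } split //, $raw;
my $manual = $b[0] | ($b[1] << 8) | ($b[2] << 16) | ($b[3] << 24);

printf "little-endian: %d\nbig-endian: %d\nmanual: %d\n", $le, $be, $manual;
```

Picking the wrong template does not raise an error; it just yields a wildly wrong length, which is usually the first symptom of an endianness mismatch.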

The next issue is the exact format of the text string. Is it ASCII, EBCDIC or a UTF format? Knowing this, you can work out how many bytes the string occupies and use one or more read operations to obtain the raw string, which you may then have to convert into a more manageable form.
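As a sketch of that conversion step, assuming the length prefix counts bytes and the text is UTF-8 (neither is stated in the question), the core Encode module can turn the raw bytes into a character string:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumption: the length prefix counts bytes and the text is UTF-8.
my $raw  = "caf\xC3\xA9";            # raw bytes as read from the file
my $text = decode('UTF-8', $raw);    # character string after decoding

# The byte count and the character count differ for non-ASCII text.
printf "bytes: %d, characters: %d\n", length($raw), length($text);
```

If the prefix counted characters rather than bytes, the number of bytes to read could not be known up front for a variable-width encoding, which is one more reason the exact format needs to be pinned down.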

One other thing -- you'll need to open the file in binary mode; otherwise you may not obtain the results you expect.

David Harris
I assume it's his platform's default `unsigned int`, which has its own `unpack` code (`I`), so you can rely on platform dependencies like this. And you could be lazy and just read in the whole file, then do the processing once you've read it.
Chris Lutz
+2  A: 

Here is a small working example.

#!/usr/bin/perl

use strict;
use warnings;

my $INT_SIZE = 2;
my $filename = 'somefile.bin';

open my $fh, '<', $filename or die "Couldn't open file $filename: $!\n";

binmode $fh;

while ( read $fh, my $packed_length, $INT_SIZE ) {

    my $text = '';
    my $length = unpack 'v', $packed_length;

    read $fh, $text, $length;

    print $length, "\t", $text, "\n";
}

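Change $INT_SIZE and the unpack template to suit the size and endianness of the length field: 'v' (16-bit little-endian), 'n' (16-bit big-endian), 'V' (32-bit little-endian) or 'N' (32-bit big-endian). See the unpack manpage (perldoc -f unpack) for more details.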
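One way to keep the size and the template from drifting apart is to derive the size from the template itself. The following is only a sketch under assumptions: the 'V' template, the sample records and the in-memory filehandle stand in for the real file, and the short-read checks are an addition to the example above.

```perl
use strict;
use warnings;

# Assumed layout: 32-bit little-endian length prefix; change the template to suit.
my $template = 'V';
my $int_size = length(pack($template, 0));    # 2 for 'v'/'n', 4 for 'V'/'N'

# Sample data standing in for the real file: two length-prefixed records.
my $data = join '', map { pack("$template a*", length($_), $_) } ('perl', 'rocks!');
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!\n";
binmode $fh;

my @records;
while (read($fh, my $packed_length, $int_size)) {
    die "Truncated length field\n" if length($packed_length) != $int_size;
    my $length = unpack $template, $packed_length;

    # Make sure the whole text block arrived before using it.
    my $got = read($fh, my $text, $length);
    die "Truncated text block\n" unless defined $got && $got == $length;

    push @records, [$length, $text];
    print $length, "\t", $text, "\n";
}
```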

jmcnamara