Hi, I have a text file made up of fixed-length records, all on one line with no line breaks in between. What's the best way to process it in Perl? Thanks!

+2  A: 

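unpack() may be of use here. You describe the record layout in a template (for example 'A10' for a 10-character, space-padded text field, or 'c', 'C' or 'W' if you want the numeric value of individual characters) and it unpacks the record into a list. See the pack documentation (perldoc -f pack) for the template options.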
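
As a rough sketch of how unpack could split such data into fields: the layout below (a 10-character name, a 3-character age and a 7-character city) and the sample data are invented for illustration and are not taken from the question.

```perl
use strict;
use warnings;

# Hypothetical record layout (an assumption, not from the question):
# name = 10 chars, age = 3 chars, city = 7 chars, 20 chars in total.
my $template   = 'A10 A3 A7';
my $record_len = 20;

# Two sample records glued together with no line breaks.
my $data = sprintf('%-10s%-3s%-7s', 'Alice', '30', 'Paris')
         . sprintf('%-10s%-3s%-7s', 'Bob',   '41', 'Oslo');

my @records;
for (my $pos = 0; $pos < length($data); $pos += $record_len) {
    # 'A' fields come back with trailing spaces stripped.
    my @fields = unpack $template, substr($data, $pos, $record_len);
    push @records, \@fields;
}

print join('|', @$_), "\n" for @records;
```

Using 'a' instead of 'A' in the template would keep the padding, which may matter if trailing spaces are significant in your data.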

Brian Agnew
+5  A: 

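Use the read FILEHANDLE,SCALAR,LENGTH function to read one block at a time into a buffer...

use constant LEN => 60;    # record length in bytes
my $buf;                   # $fh is an already-opened filehandle
while (!eof $fh) {
    my $len = read $fh, $buf, LEN;
    die "read failed: $!" unless defined $len;
    die "short read" if $len < LEN;
    # processing...
}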

... and process the buffer using regular expressions, unpack, or however you like.
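
A self-contained sketch of the same loop, for trying it out without a real file. The 4-byte record length, the sample data and the in-memory filehandle are assumptions made for the demo; the "processing" step is just a regex that grabs the first character of each record.

```perl
use strict;
use warnings;

use constant LEN => 4;    # hypothetical record length for the demo

# In-memory filehandle standing in for the real file (demo assumption).
my $data = 'AAAABBBBCCCC';
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
binmode $fh;

my @records;
my $buf;
while (!eof $fh) {
    my $len = read $fh, $buf, LEN;
    die "read failed: $!" unless defined $len;
    die "short read"      if $len < LEN;

    # Example processing: capture the first character with a regex.
    my ($first) = $buf =~ /\A(.)/s;
    push @records, "$first:$buf";
}
close $fh;

print "$_\n" for @records;
```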

hillu
+8  A: 

First, let's open the file and make sure it's in binary mode:

open my $fh, '<', 'file.name' or die "Cannot open file.name: $!";
binmode $fh;

Now, set the input record separator ($/) to a reference to the length of your records (let's assume 120 bytes per record):

local $/ = \120;

Now, let's read the records:

while (my $record = <$fh>) {

And now, if you want to get data out of it, you have to write an unpack template that matches your record layout:

  my @elements = unpack("......", $record);

Now you can process @elements and close the while loop:

  ...
}

Whole "program":

open my $fh, '<', 'file.name' or die "Cannot open file.name: $!";
binmode $fh;
local $/ = \120;
while (my $record = <$fh>) {
  my @elements = unpack("......", $record);
  ...
}
close $fh;
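
A runnable variant of the above for experimenting, sketched under a few assumptions: the data comes from an in-memory filehandle rather than file.name, the records are 4 bytes long instead of 120, and the length check is an extra added for the demo (with $/ set to a record length, the last read can return fewer bytes if the file size is not a multiple of it).

```perl
use strict;
use warnings;

my $record_len = 4;                 # hypothetical length for the demo
my $data       = 'rec1rec2rec3';    # three records, no separators

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
binmode $fh;

my @records;
{
    # Limit the change to $/ to this block so other reads are unaffected.
    local $/ = \$record_len;
    while (my $record = <$fh>) {
        # A trailing partial record comes back shorter than requested.
        die "short record" if length($record) < $record_len;
        push @records, $record;
    }
}
close $fh;

print "$_\n" for @records;
```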
depesz
+1 although I think `sysread` is more transparent.
Sinan Ünür
Why transparent? Perhaps you mean that $/ = \number is less well known. That's true. But on the other hand it is very handy, as you use the filehandle just like always.
depesz
+1 nice! I wasn't aware that one could use `$/` this way.
hillu
sysread is more transparent because you know you are not reading a line but a fixed number of bytes. When you aren't processing lines, acting like you are makes the problem harder. A lot of binary formats don't have a consistent byte length for the objects throughout the format, so you often read a different number of bytes for each piece.
brian d foy
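
For comparison, a minimal sketch of the sysread approach mentioned in these comments. The temporary file, the 4-byte record length and the sample data are assumptions for the demo; a real file is used because sysread works on the underlying file descriptor, so it should not be mixed with buffered reads (read, <>, eof) on the same handle.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my $record_len = 4;    # hypothetical record length for the demo

# Write some sample fixed-length data to a temporary file.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} 'AAAABBBBCCCC';
close $tmp_fh or die "Cannot close $path: $!";

open my $fh, '<:raw', $path or die "Cannot open $path: $!";

my @records;
while (1) {
    my $buf;
    my $n = sysread $fh, $buf, $record_len;
    die "sysread failed: $!" unless defined $n;
    last if $n == 0;    # 0 bytes means end of file
    # On pipes or sockets a short count can also occur before EOF;
    # for this regular file it would mean a truncated record.
    die "short read" if $n < $record_len;
    push @records, $buf;
}
close $fh;

print "$_\n" for @records;
```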
@brian d foy: Sure. And if we were dealing with variable-length records, I would write it another way. But since this is clearly fixed-length, using $/ and the standard <> seems easier, at least for me.
depesz