Hi, I have a text file composed of fixed-length records, all on one line with no line breaks in between. What's the best way to process it in Perl? Thanks!
views: 173
answers: 3
+2
A:
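unpack() may be of use here. You can specify a list of characters (using 'c', 'C' or 'W') and it'll unpack them into a list automatically. Note that these formats return numeric character values; to extract fixed-width text fields, use 'A' or 'a' with a length instead. See the pack documentation for the options to use.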
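As a minimal sketch of the unpack approach: the 20-byte layout below (a 10-byte name, a 4-byte code and a 6-byte amount, all space-padded ASCII) is purely an assumption for illustration, as are the variable names. The real template depends on your record format.

```perl
use strict;
use warnings;

# Hypothetical 20-byte record layout, assumed for illustration only:
# 10-byte name, 4-byte code, 6-byte amount, all space-padded ASCII.
my $template = 'A10 A4 A6';
my $record   = 'Alice     X001000042';

# 'A' extracts a fixed-width string and strips trailing spaces.
my ($name, $code, $amount) = unpack $template, $record;

print "$name|$code|$amount\n";    # prints "Alice|X001|000042"
```

Use 'a' instead of 'A' if trailing spaces or NUL bytes in a field are significant and must be kept.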
Brian Agnew
2009-08-06 14:10:09
+5
A:
Use the `read FILEHANDLE,SCALAR,LENGTH` function to read a block at a time into a buffer...
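use constant LEN => 60;    # record length in bytes
my $buf;
while (!eof $fh) {
    my $len = read $fh, $buf, LEN;
    die "read failed: $!" unless defined $len;    # read returns undef on error
    die "short read" if $len < LEN;
    # processing...
}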
... and process the buffer using regular expressions, `unpack`, or however you like.
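For a self-contained version of this read loop that can be run as-is, here is a sketch that reads from an in-memory filehandle. The sample data and the tiny 4-byte record length are assumptions made only to keep the demo short; swap in your real file and record length.

```perl
use strict;
use warnings;

use constant LEN => 4;    # assumed record length, kept tiny for the demo

# Hypothetical sample data: three 4-byte records with no separators.
my $data = 'AAAABBBBCCCC';
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
binmode $fh;

my @records;
my $buf;
while (!eof $fh) {
    my $len = read $fh, $buf, LEN;
    die "read failed: $!" unless defined $len;
    die "short read: got $len of " . LEN . " bytes" if $len < LEN;
    push @records, $buf;    # real processing would go here
}
close $fh;

print join(',', @records), "\n";    # prints "AAAA,BBBB,CCCC"
```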
hillu
2009-08-06 14:24:23
+8
A:
First, let's open the file and make sure it's in binary mode:
open my $fh, '<', 'file.name' or die "Cannot open file.name: $!";
binmode $fh;
Now, set the input record separator to a reference to the length of your records (let's assume 120 bytes per record):
local $/ = \120;
Now, let's read the records:
while (my $record = <$fh>) {
And now, if you want to get data out of it, you have to write an unpack template:
my @elements = unpack("......", $record);
Now you can process @elements, and finish the while () {} loop:
...
}
Whole "program":
open my $fh, '<', 'file.name' or die "Cannot open file.name: $!";
binmode $fh;
local $/ = \120;
while (my $record = <$fh>) {
my @elements = unpack("......", $record);
...
}
close $fh;
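One thing to watch with this approach: when `$/` is a reference to an integer, the last read can return fewer bytes than requested if the file length is not a multiple of the record length. A rough sketch of checking for that, using an in-memory filehandle and a 5-byte record length that are assumptions for the demo only:

```perl
use strict;
use warnings;

# Hypothetical data: two full 5-byte records plus a truncated third one.
my $data = 'aaaaabbbbbcc';
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
binmode $fh;

my (@complete, @partial);
{
    local $/ = \5;    # read up to 5 bytes per <$fh> (assumed record length)
    while (my $record = <$fh>) {
        if (length($record) == 5) {
            push @complete, $record;
        }
        else {
            push @partial, $record;    # trailing short record
        }
    }
}
close $fh;

printf "%d complete, %d partial\n", scalar @complete, scalar @partial;
```

Wrapping `local $/` in a bare block keeps the change from leaking into other code that still expects line-based reads.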
depesz
2009-08-06 14:42:00
+1 although I think `sysread` is more transparent.
Sinan Ünür
2009-08-06 17:23:32
Why transparent? Perhaps you mean that $/ = \number is less well known. That's true. But on the other hand, it is very handy, as you use the filehandle just like always.
depesz
2009-08-06 17:41:17
+1 nice! I wasn't aware that one could use `$/` this way.
hillu
2009-08-06 20:28:53
sysread is more transparent because you know you are not reading a line but a fixed number of bytes. When you aren't processing lines, acting like you are makes the problem harder. A lot of binary formats don't have a consistent byte length for the objects throughout the format, so you often read a different number of bytes for each piece.
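To illustrate the variable-length case, here is a rough sketch. The format (a 1-byte length followed by that many payload bytes) and the `read_exact` helper are hypothetical, invented for this example. It uses a temporary file because sysread needs a real file descriptor.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical format, assumed for illustration: each record is a
# 1-byte length followed by that many payload bytes.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} pack('C/a* C/a*', 'abc', 'hello');
close $tmp or die "Cannot close $path: $!";

open my $fh, '<:raw', $path or die "Cannot open $path: $!";

# Hypothetical helper: read exactly $want bytes, looping because
# sysread may return fewer bytes than requested.
sub read_exact {
    my ($handle, $want) = @_;
    my $buf = '';
    while (length($buf) < $want) {
        my $got = sysread $handle, $buf, $want - length($buf), length($buf);
        die "sysread failed: $!" unless defined $got;
        return if $got == 0 && $buf eq '';    # clean end of file
        die "truncated record" if $got == 0;
    }
    return $buf;
}

my @payloads;
while (defined(my $header = read_exact($fh, 1))) {
    my $size    = unpack 'C', $header;        # payload length in bytes
    my $payload = read_exact($fh, $size);
    die "truncated record" unless defined $payload;
    push @payloads, $payload;
}
close $fh;

print join(',', @payloads), "\n";    # prints "abc,hello"
```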
brian d foy
2009-08-07 03:30:40
@brian d foy: Sure. And if we were dealing with variable-length records, I would write it another way. But since this is clearly fixed-length, using $/ and the standard <> seems easier. At least for me.
depesz
2009-08-07 06:38:33