What's the best way to read a fixed-length record in Perl? I know that to read a file like:

ABCDE 302
DEFGC 876

I can do

while (<FILE>) {
   my $key = substr($_, 0, 5);
   my $value = substr($_, 6, 3);   # value starts at offset 6: 5-character key plus one space
}

but isn't there a way to do this with read/unpack?

+16  A: 
my($key, $value) = unpack "A5 A3";    # Original, but slightly dubious

We both need to check out the options at the unpack manual page (and, more particularly, the pack manual page).

Since the A pack operator removes trailing blanks, your example can be encoded as:

my($key, $value) = unpack "A6A3";

Alternatively (this is Perl, so TMTOWTDI):

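my($key, $blank, $value) = unpack "A5a1A3";

The 1 is optional but systematic and symmetric. One advantage of this form is that you can validate that $blank eq " ". Note the lowercase a1 for the separator: A1 would strip the blank and leave $blank as an empty string.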
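
As a small illustration of the difference between the A and a formats when unpacking, here is a sketch using the sample record from the question. The variable names are made up for the example; unpack is given the string explicitly rather than relying on $_.

```perl
use strict;
use warnings;

my $record = "ABCDE 302";

# "A" strips trailing spaces from each extracted field.
my ($key_A, $blank_A, $value_A) = unpack "A5 A1 A3", $record;

# "a" returns the data unchanged, so the separator survives.
my ($key_a, $blank_a, $value_a) = unpack "A5 a1 A3", $record;

printf "A1 gives '%s', a1 gives '%s'\n", $blank_A, $blank_a;

# With a1, a malformed record can be flagged explicitly.
warn "unexpected separator in '$record'\n" unless $blank_a eq " ";
```

If the separator does not need checking, the x format (as in "A5 x1 A3") skips the byte without returning it.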

Jonathan Leffler
A: 

Regardless of whether your records and fields are fixed-length, if the fields are separated by uniform delimiters (such as a space or comma), you can use the split function more easily than unpack.

my ($field1, $field2) = split / /;

Look up the documentation for split. There are useful variations on the argument list and on the format of the delimiter pattern.
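
A few of those variations, sketched on made-up strings (only the first one comes from the question's data). This assumes the fields themselves never contain the delimiter.

```perl
use strict;
use warnings;

my $line = "ABCDE 302\n";
chomp $line;    # drop the newline so it does not end up in the last field

# A single-space pattern splits on exactly one space character.
my ($key, $value) = split / /, $line;

# The special string ' ' splits on runs of whitespace and skips leading whitespace.
my @fields = split ' ', "  ABCDE    302 ";

# A third argument limits the number of fields, leaving the remainder intact.
my ($first, $rest) = split / /, "ABCDE 302 extra words", 2;

print "$key|$value\n";
print join("|", @fields), "\n";
print "$first|$rest\n";
```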

Barry Brown
If any field value is shorter than its fixed width (although that isn't the case in his example), the string will also be split on the padding spaces, which is wrong. If the field values all have identical lengths, then you are correct: there is no difference between delimited and fixed-width.
Adam Bellaire
It's not a matter of field length. If fields can have significant whitespace, you can't split on whitespace. That's one of the points of fixed-length fields. :)
brian d foy
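
To make the two comments above concrete, here is a sketch with two hypothetical records laid out as a 5-character key, one separator, and a 3-character value. In the first the key is padded; in the second the key itself contains spaces.

```perl
use strict;
use warnings;

my $padded   = "AB    302";   # key "AB" padded to 5 characters
my $internal = "A B C 302";   # key "A B C" contains significant spaces

# split sees every space as a delimiter, so the field count no longer matches the layout.
my @padded_split   = split / /, $padded;
my @internal_split = split / /, $internal;

# unpack works by position; x1 skips the separator byte.
my @padded_unpack   = unpack "A5 x1 A3", $padded;
my @internal_unpack = unpack "A5 x1 A3", $internal;

printf "split:  %d and %d fields\n", scalar(@padded_split), scalar(@internal_split);
printf "unpack: [%s] and [%s]\n", join("|", @padded_unpack), join("|", @internal_unpack);
```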
+3  A: 

Assume 10 character records of two five character fields per record:

open(my $fh, "<", $filename) or die "Cannot open $filename: $!";
while (read($fh, my $buf, 10)) {
  my ($field1, $field2) = unpack("A5 A5", $buf);
  # ... do something with data ...
}
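
For trying this out without a data file, here is a self-contained sketch of the same loop. It reads from an in-memory filehandle holding invented sample data, and it adds checks for read errors and truncated final records, which are an extra assumption about what you would want.

```perl
use strict;
use warnings;

# In-memory stand-in for a real file: three 10-byte records, no line terminators.
my $data = "ABCDE12345" . "FGH  678  " . "IJKLM90123";
open(my $fh, "<", \$data) or die "Cannot open in-memory file: $!";

my $record_length = 10;
my @records;
while (1) {
    my $got = read($fh, my $buf, $record_length);
    die "Read error: $!" unless defined $got;    # read returns undef on error
    last if $got == 0;                           # 0 means end of file
    die "Short record ($got bytes)" if $got < $record_length;
    push @records, [ unpack("A5 A5", $buf) ];
}
close $fh;

print join(":", @$_), "\n" for @records;
```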
Michael Cramer
+7  A: 

Update: For the definitive answer, see Jonathan Leffler's answer.

I wouldn't use this for just two fields (I'd use pack/unpack directly), but for 20 or 50 or so fields I like to use Parse::FixedLength (but I'm biased). Here it is applied to your example. (Update: you can also use $/ and <> as an alternative to read($fh, $buf, $buf_length); see below.)

use Parse::FixedLength;

my $pfl = Parse::FixedLength->new([qw(
  key:5
  blank:1
  value:3
)]);
# Assuming trailing newline
# (or add newline to format above and remove "+ 1" below)
my $data_length = $pfl->length() + 1;

{
  local $/ = \$data_length;
  while(<FILE>) {
    my $data = $pfl->parse($_);
    print "$data->{key}:$data->{value}\n";
    # or
    print $data->key(), ":", $data->value(), "\n";
  }
}
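
The $/ trick does not depend on the module. A core-only sketch of the same idea, assuming the question's layout (5-character key, one blank, 3-character value, newline) and using an in-memory filehandle in place of FILE:

```perl
use strict;
use warnings;

# In-memory stand-in for a file of 10-byte records (9 data bytes plus newline).
my $data = "ABCDE 302\nDEFGC 876\n";
open(my $fh, "<", \$data) or die "Cannot open in-memory file: $!";

my %value_for;
{
    # A reference to an integer makes <$fh> return fixed-size chunks instead of lines.
    local $/ = \10;
    while (my $record = <$fh>) {
        # x1 skips the blank; the trailing newline is simply never unpacked.
        my ($key, $value) = unpack "A5 x1 A3", $record;
        $value_for{$key} = $value;
    }
}
close $fh;

print "$_ => $value_for{$_}\n" for sort keys %value_for;
```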

There are some similar modules that make pack/unpack more "friendly" (See the "See Also" section of Parse::FixedLength).

Update: Wow, this was meant to be an alternative answer, not the official answer. Since it is what it is, I should include some of Jonathan Leffler's more straightforward code, which is likely how you should usually do it (see the pack/unpack docs and Jonathan Leffler's answer):

$_ = "ABCDE 302";
my($key, $blank, $value) = unpack "A5A1A3";
runrig
A: 

"What's the best way to read a fixed length record in Perl"

Is it truly profoundly fixed-length? Do the rules change? Are there other constraints or anticonstraints?

Perhaps the best way is to master regular expressions. By "best" I mean it is the "code that most follows the way humans think".

So you have something that looks like this: $stuff = "cat dog";

For now assume YOU ABSOLUTELY KNOW it is three letters, a space, and three letters.

So it's just this:

# $1 and $2 are only meaningful if the match succeeded
if ($stuff =~ /([a-z]{3}) ([a-z]{3})/) {
    $first_word_found = $1;
    $second_word_found = $2;
}

Nothing could be more natural. Observe that you have total control over the format of the stuff.

In the example, I said "it must be lower case", hence "[a-z]". But of course you could "specify" it any way you want; anything that can possibly be described, you can do in a regular expression.

If you know that the gap in the middle might be perhaps more than one space, or other types of whitespace or something else, you can easily account for that in the regex.
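
For instance, a sketch that allows any run of whitespace in the gap. The sample strings and the anchors (^ and $) are additions for illustration, not part of the pattern above.

```perl
use strict;
use warnings;

my @samples = ("cat dog", "cat   dog", "cat\tdog", "Cat dog");

my %parsed;
for my $stuff (@samples) {
    # Three lowercase letters, one or more whitespace characters, three lowercase letters.
    if (my ($first, $second) = $stuff =~ /^([a-z]{3})\s+([a-z]{3})$/) {
        $parsed{$stuff} = [$first, $second];
    }
    else {
        print "no match: '$stuff'\n";
    }
}
printf "%d of %d samples matched\n", scalar(keys %parsed), scalar(@samples);
```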

The regex defines the laws of the universe for how the stuff in question "should be" ("according to you").

Alternatively, if you want fewer and fewer rules, so that the format of "stuff" is more and more flexible, a regex is exactly how you "state" that.

I have found that in the very real world, this type of thing is almost always necessary, because inevitably you are checking that something obeys some set of rules that have been described in the human world, you're changing those rules over time, you're adding new fields (or whatever the case is), and so on.

It might work for you!

Joe Blow