In Perl, pack and unpack have two templates for converting bytes to/from hex:

h    A hex string (low nybble first).
H    A hex string (high nybble first).

This is best clarified with an example:

use 5.010; # so I can use say
my $buf = "\x12\x34\x56\x78";

say unpack('H*', $buf); # prints 12345678
say unpack('h*', $buf); # prints 21436587

As you can see, H is what people generally mean when they think about converting bytes to/from hexadecimal. So what's the purpose of h? Larry must have thought someone might use it, or he wouldn't have bothered to include it.

Can you give a real-world example where you'd actually want to use h instead of H with pack or unpack? I'm looking for a specific example; if you know of a machine that organized its bytes like that, what was it, and can you link to some documentation on it?

I can think of examples where you could use h, such as serializing some data when you don't really care what the format is, as long as you can read it back, but H would be just as useful for that. I'm looking for an example where h is more useful than H.

+2  A: 

I imagine this being useful when transferring data to or reading data from a machine with different endianness. If some process expects to receive data the way it would normally represent it in memory, then you had better send your data just that way.

rafl
I don't think endianness comes into it, because this is nybbles _within_ a byte. The bytes are still processed in the same order.
cjm
Mixed- or middle-endianness also happens to exist.
rafl
Can you give a specific example of such a machine?
cjm
According to Wikipedia, the PDP-11 is an example. No one would care about it these days, really, but still. There are apparently also ways to end up with data like that on some ARM machines. Also, you seem to assume that the data you're working with is always byte-aligned.
rafl
@cjm: When people say "endianness" they're usually talking about byte order but in reality the topic is broader and can include nybble and even bit ordering as well. Ultimately the internal representation is hardware-dependent and can be whatever crazy scheme the designer came up with.
Michael Carman
A: 

The distinction between the two just has to do with whether you are working with big-endian or little-endian data. Sometimes you have no control over the source or destination of your data, so the H and h flags to pack are there to give you the option. V and N are there for the same reason.
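
To make the comparison concrete, here is a minimal sketch that contrasts the two pairs of templates on the same four bytes used in the question. It only illustrates what each template does to the data: h and H reorder the two hex digits inside each byte, while N and V reorder the bytes inside a 32-bit integer. The variable names are made up for the illustration.

```perl
use strict;
use warnings;

my $buf = "\x12\x34\x56\x78";

# h and H differ only in the order of the two hex digits inside each byte.
my $high_first = unpack('H*', $buf);    # "12345678"
my $low_first  = unpack('h*', $buf);    # "21436587"

# N and V differ in the order of the bytes inside a 32-bit integer.
my $big_endian    = unpack('N', $buf);  # 0x12345678
my $little_endian = unpack('V', $buf);  # 0x78563412

printf "H*: %s\nh*: %s\n", $high_first, $low_first;
printf "N:  0x%08X\nV:  0x%08X\n", $big_endian, $little_endian;

# Swapping the two digits of every pair in the H* string yields the h* string.
(my $swapped = $high_first) =~ s/(.)(.)/$2$1/g;
print "swapped H* equals h*\n" if $swapped eq $low_first;
```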

Eric Strom
I don't think endianness comes into it, because this is nybbles _within_ a byte. The bytes are still processed in the same order.
cjm
As mentioned by `rafl`, these are there for the once-in-a-blue-moon edge cases where you have to deal with "funny" data: think legacy systems and esoteric, poorly documented binary file formats.
Eric Strom
+7  A: 

Recall that in the bad ol' days of MS-DOS, certain OS functions were controlled by setting the high and low nibbles of a register and performing an interrupt xx. For example, Int 21 accessed many file functions. You would set the high nibble to the drive number (who will have more than 15 drives??) and the low nibble to the requested function on that drive, etc.
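
As a rough illustration of that kind of nibble-packed byte, here is a minimal sketch. The layout (drive number in the high nibble, function code in the low nibble) and the values are hypothetical and chosen only for the example; this is not a real MS-DOS calling convention. It shows that H2 reads the byte high nibble first and h2 reads it low nibble first.

```perl
use strict;
use warnings;

# Hypothetical layout: high nibble = drive number, low nibble = function code.
my ($drive, $function) = (0x2, 0xA);
my $register = pack('C', ($drive << 4) | $function);    # one byte, 0x2A

# 'H2' yields the high nibble first, 'h2' the low nibble first.
my ($drive_hex, $function_hex) = split //, unpack('H2', $register);
my $low_first = unpack('h2', $register);

print "drive=$drive_hex function=$function_hex\n";
print "low nibble first: $low_first\n";

# Packing the low-nibble-first string with 'h2' rebuilds the same byte.
my $rebuilt = pack('h2', $low_first);
print "rebuilt byte matches\n" if $rebuilt eq $register;
```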

Here is some old CPAN code that uses pack as you describe to set the registers to perform an MS-DOS system call.

Blech!!! I don't miss MS-DOS at all...

--Edit

Here is specific source code: download Perl 5.00402 for DOS HERE and unzip it.

In the files Opcode.pm and Opcode.pl you see the use of unpack("h*",$_[0]); here:

sub opset_to_hex ($) {
    return "(invalid opset)" unless verify_opset($_[0]);
    unpack("h*",$_[0]);
}

I did not follow the code all the way through, but my suspicion is this is to recover info from an MS-DOS system call...

In perlport for Perl 5.8.8, you have these suggested tests for endianness of the target:

Different CPUs store integers and floating point numbers in different orders (called endianness) and widths (32-bit and 64-bit being the most common today). This affects your programs when they attempt to transfer numbers in binary format from one CPU architecture to another, usually either “live” via network connection, or by storing the numbers to secondary storage such as a disk file or tape.

Conflicting storage orders make utter mess out of the numbers. If a little-endian host (Intel, VAX) stores 0x12345678 (305419896 in decimal), a big-endian host (Motorola, Sparc, PA) reads it as 0x78563412 (2018915346 in decimal). Alpha and MIPS can be either: Digital/Compaq used/uses them in little-endian mode; SGI/Cray uses them in big-endian mode. To avoid this problem in network (socket) connections use the pack and unpack formats n and N, the “network” orders. These are guaranteed to be portable.

As of Perl 5.10.0, you can also use the > and < modifiers to force big- or little-endian byte-order. This is useful if you want to store signed integers or 64-bit integers, for example.
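
The fixed-order formats and the modifiers described above can be sketched as follows. This is a minimal example rather than anything taken from perlport; the variable names are invented, and H* is used only to display the resulting bytes in storage order.

```perl
use strict;
use warnings;
use 5.010;    # for say and the < and > modifiers

my $n = 0x12345678;

my $network = unpack('H*', pack('N',  $n));   # big-endian ("network") order
my $vax     = unpack('H*', pack('V',  $n));   # little-endian ("VAX") order
my $big     = unpack('H*', pack('l>', $n));   # same bytes as N
my $little  = unpack('H*', pack('l<', $n));   # same bytes as V

# The modifiers also cover signed values, which N and V do not.
my $signed = unpack('l>', pack('l>', -2));

say "N:  $network";
say "V:  $vax";
say "l>: $big";
say "l<: $little";
say "signed round trip: $signed";
```

These results do not depend on the host's native byte order, which is the point of using the fixed-order formats.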

You can explore the endianness of your platform by unpacking a data structure packed in native format such as:

   print unpack("h*", pack("s2", 1, 2)), "\n";
   # '10002000' on e.g. Intel x86 or Alpha 21064 in little-endian mode
   # '00100020' on e.g. Motorola 68040

If you need to distinguish between endian architectures you could use either of the variables set like so:

   $is_big_endian    = unpack("h*", pack("s", 1)) =~ /01/;
   $is_little_endian = unpack("h*", pack("s", 1)) =~ /^1/;

Differing widths can cause truncation even between platforms of equal endianness. The platform of shorter width loses the upper parts of the number. There is no good solution for this problem except to avoid transferring or storing raw binary numbers.

One can circumnavigate both these problems in two ways. Either transfer and store numbers always in text format, instead of raw binary, or else consider using modules like Data::Dumper (included in the standard distribution as of Perl 5.005) and Storable (included as of perl 5.8). Keeping all data as text significantly simplifies matters.
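
Both approaches can be sketched in a few lines. This is only an illustration under assumed data: the %record hash and its field names are hypothetical. The text form writes plain decimal strings, and the Storable form uses nfreeze, which serializes in network byte order, together with thaw.

```perl
use strict;
use warnings;
use Storable qw(nfreeze thaw);

# Hypothetical record, used only for this illustration.
my %record = (id => 305419896, offset => -2);

# Text form: decimal strings carry no byte order or integer width.
my $line = join ',', map { sprintf '%s=%d', $_, $record{$_} } sort keys %record;
my %from_text = map { split /=/, $_, 2 } split /,/, $line;

# Storable form: nfreeze serializes in network byte order.
my $frozen = nfreeze(\%record);
my $copy   = thaw($frozen);

print "text line: $line\n";
print "thawed id: $copy->{id}, offset: $copy->{offset}\n";
```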

The v-strings are portable only up to v2147483647 (0x7FFFFFFF), that's how far EBCDIC, or more precisely UTF-EBCDIC will go.

It seems that unpack("h*",...) is used more often than pack("h*",...). I did note that return qq'unpack("F", pack("h*", "$hex"))'; is used in Deparse.pm, and IO-Compress uses pack("h*",...) in Perl 5.12.
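
The expression quoted from Deparse.pm can be tried in isolation. The sketch below is my own reconstruction of the round trip, not the module's code: it dumps the native bytes of a floating-point value as a hex string with h* and rebuilds the number from that string. The test value is arbitrary.

```perl
use strict;
use warnings;

my $value = 0.1 + 0.2;    # arbitrary test value with a long decimal expansion

# Dump the native floating-point bytes as hex, low nybble first.
my $hex = unpack('h*', pack('F', $value));

# Rebuild the number from the hex string, as in the quoted expression.
my $restored = unpack('F', pack('h*', $hex));

print "hex: $hex\n";
print "round trip is exact\n" if $restored == $value;
```

H* would round-trip just as well here, provided the same template is used in both directions, so this shows h in use rather than a case where h is required.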

If you want further examples, here is a Google Code Search list. You can see pack|unpack("h*"...) is fairly rare and mostly to do with determining platform endianness.

drewk
That code uses `pack`, but it doesn't use either `h` or `H` with it, only `s` and `c`.
cjm