In Perl, is it appropriate to use a string as a byte array containing 8-bit data? All the documentation I can find on this subject focuses on 7-bit strings.

For instance, if I read some data from a binary file into $data

my $data;

open FILE, "<", $filepath or die "Cannot open $filepath: $!";
binmode FILE;
read FILE, $data, 1024;

and I want to get the first byte out, is substr($data, 0, 1) appropriate? (Again, assuming it is 8-bit data.)

I come from a mostly C background, and I am used to passing a char pointer to a read() function. My problem might be that I don't understand what the underlying representation of a string is in Perl.

+1  A: 

You probably want to use sysopen and sysread if you want to read bytes from a binary file.

See also perlopentut.

Whether this is appropriate or necessary depends on what exactly you are trying to do.

#!/usr/bin/perl -l

use strict; use warnings;
use autodie;

use Fcntl;

sysopen my $bin, 'test.png', O_RDONLY;
sysread $bin, my $header, 4;

print map { sprintf '%02x', ord($_) } split //, $header;

Output:

C:\Temp> t
89504e47
Sinan Ünür
+4  A: 

The bundled documentation for the read command, reproduced here, provides a lot of information that is relevant to your question.

read FILEHANDLE,SCALAR,LENGTH,OFFSET

read FILEHANDLE,SCALAR,LENGTH

Attempts to read LENGTH characters of data into variable SCALAR from the specified FILEHANDLE. Returns the number of characters actually read, 0 at end of file, or undef if there was an error (in the latter case $! is also set). SCALAR will be grown or shrunk so that the last character actually read is the last character of the scalar after the read.

An OFFSET may be specified to place the read data at some place in the string other than the beginning. A negative OFFSET specifies placement at that many characters counting backwards from the end of the string. A positive OFFSET greater than the length of SCALAR results in the string being padded to the required size with "\0" bytes before the result of the read is appended.

The call is actually implemented in terms of either Perl's or system's fread() call. To get a true read(2) system call, see "sysread".

Note the characters: depending on the status of the filehandle, either (8-bit) bytes or characters are read. By default all filehandles operate on bytes, but for example if the filehandle has been opened with the ":utf8" I/O layer (see "open", and the "open" pragma, open), the I/O will operate on UTF-8 encoded Unicode characters, not bytes. Similarly for the ":encoding" pragma: in that case pretty much any characters can be read.
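
To make the byte/character distinction concrete, here is a minimal sketch that reads the same two-byte file through two different layers. The helper name read_with_layer and the temporary file are assumptions made for illustration; they are not part of the documentation quoted above.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a throwaway file holding the two bytes C3 A9,
# which is the UTF-8 encoding of U+00E9.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} "\xc3\xa9";
close $tmp_fh or die "Cannot close $path: $!";

# Hypothetical helper: open $file with the given I/O layer and
# read up to 1024 units (bytes or characters, depending on the layer).
sub read_with_layer {
    my ($file, $layer) = @_;
    open my $fh, "<$layer", $file or die "Cannot open $file: $!";
    my $got = read $fh, my $buf, 1024;
    die "read failed: $!" unless defined $got;
    close $fh;
    return $buf;
}

my $bytes = read_with_layer($path, ':raw');              # no decoding
my $chars = read_with_layer($path, ':encoding(UTF-8)');  # decoded on read

printf "raw layer:      length %d\n", length($bytes);   # 2 bytes
printf "encoding layer: length %d\n", length($chars);   # 1 character
```

With the raw layer every element of the string is one byte of the file, which is the behavior the question is after.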

mobrule
My nature being very pedantic, when I read this in the documentation I found `character` ambiguous. I was unclear whether it means a unit of data (i.e., one byte) or a unit of a string (dependent on encoding).
Mike
Calling `binmode FILE, ":raw"` or `binmode FILE, ":bytes"` will put your filehandle in "bytes" mode, regardless of your default I/O layer (say, if you declared `use open ':utf8'`).
mobrule
+2  A: 

See perldoc -f pack and perldoc -f unpack for how to treat strings as byte arrays.
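
As a rough illustration of that approach, the sketch below unpacks a string into a list of byte values and packs it back. The sample data (the eight-byte PNG signature) and the variable names are chosen only for the example.

```perl
use strict;
use warnings;

# Sample 8-bit data: the PNG file signature.
my $data = "\x89PNG\x0d\x0a\x1a\x0a";

# 'C*' unpacks every byte as an unsigned 8-bit integer.
my @bytes = unpack 'C*', $data;
print join(' ', map { sprintf '%02x', $_ } @bytes), "\n";
# prints: 89 50 4e 47 0d 0a 1a 0a

# pack with the same template is the inverse operation.
my $copy = pack 'C*', @bytes;
print $copy eq $data ? "round trip ok\n" : "mismatch\n";

# Multi-byte fields work the same way: 'N' is a 32-bit
# big-endian unsigned integer.
my ($length) = unpack 'N', "\x00\x00\x00\x0d";
print "$length\n";   # prints: 13
```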

Ether
+1  A: 

Whatever you do, please don't use bareword filehandles.
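
For comparison, here is one way the snippet from the question might look with a lexical filehandle. This is a sketch, not the only correct form; the temporary file stands in for the asker's $filepath so that the example runs on its own.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Stand-in for the asker's binary file: four arbitrary bytes.
my ($tmp_fh, $filepath) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} "\x00\x01\x02\xff";
close $tmp_fh or die "Cannot close $filepath: $!";

# Lexical filehandle ($fh) instead of the bareword FILE.
open my $fh, '<', $filepath or die "Cannot open $filepath: $!";
binmode $fh;

my $count = read $fh, my $data, 1024;
die "read failed: $!" unless defined $count;
close $fh;

printf "read %d bytes, last byte 0x%02x\n", $count, ord(substr($data, -1));
# prints: read 4 bytes, last byte 0xff
```

The handle is scoped like any other variable and is closed automatically when it goes out of scope, which is the usual argument against barewords.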

Andy Lester
A: 

It might help more if you tell us what you are trying to do with the byte array. There are various ways to work with binary data, and each lends itself to a different set of tools.

Do you want to convert the data into a Perl array? If so, pack and unpack are a good start. split could also come in handy.

Do you want to access individual elements of the string without unpacking it? If so, substr is fast and will do the trick for 8-bit data. If you want other bit depths, take a look at the vec function, which treats a string as a bit vector.
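
A short sketch of both functions on a byte string follows. The sample data is the PNG signature, picked only for illustration; note that offsets are zero-based in both substr and vec.

```perl
use strict;
use warnings;

my $data = "\x89PNG\x0d\x0a\x1a\x0a";

# substr: offset 0 is the first byte.
my $first = substr $data, 0, 1;
printf "%02x\n", ord($first);           # prints: 89

# vec(STRING, OFFSET, BITS): with BITS = 8, element 1 is the second byte.
my $second = vec($data, 1, 8);
printf "%02x\n", $second;               # prints: 50

# With BITS = 16, element 0 covers bytes 0 and 1, read big-endian.
my $word = vec($data, 0, 16);
printf "%04x\n", $word;                 # prints: 8950

# vec is also an lvalue, so a byte can be overwritten in place.
vec($data, 0, 8) = 0x00;
printf "%02x\n", ord(substr($data, 0, 1));   # prints: 00
```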

Do you want to scan the string and convert certain bytes to other bytes? Then the s/// or tr/// constructs might be useful.
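
Both constructs accept hex escapes, so they can be pointed at specific byte values. The following is a minimal sketch with made-up sample data; the variable names are illustrative only.

```perl
use strict;
use warnings;

my $data = "\x00\x01\xff\x00\x41";

# tr/// with an empty replacement list only counts matches.
my $nuls = ($data =~ tr/\x00//);
print "$nuls\n";                        # prints: 2

# Replace every NUL byte with a space (0x20) in a copy.
(my $clean = $data) =~ tr/\x00/\x20/;
print join(' ', map { sprintf '%02x', ord($_) } split //, $clean), "\n";
# prints: 20 01 ff 20 41

# s/// can delete or rewrite byte sequences, here every 0xff byte.
(my $stripped = $data) =~ s/\xff//g;
print length($stripped), "\n";          # prints: 4
```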

Eric Strom