Thanks to everyone in advance.

I'd like to access the nth byte of a binary scalar. For example you could get all the file data in one scalar variable...

Imagine that the binary data is collected into scalar...

open(SOURCE, "<", "wl.jpg"); 
my $thisByteData = undef; 
while(<SOURCE>){$thisByteData .= $_;} 
close SOURCE;

$thisByteData is raw binary data. When I use length($thisByteData) I get the byte count back, so Perl does know how big it is. My question is how can I access the Nth byte?

Side note: My function is going to receive this binary scalar, and it's inside my function that I want to access the Nth byte. The help regarding how to collect this data is appreciated, but it's not what I'm looking for. Whichever way the other programmer wants to collect the binary data is up to them; my job is to get the Nth byte when it's passed to me :)

Again thanks so much for the help to all!


Thanks to @muteW who has gotten me further than ever. I guess I'm not understanding unpack(...) correctly.

print(unpack("N1", $thisByteData));
print(unpack("x N1", $thisByteData));
print(unpack("x0 N1", $thisByteData));

Is returning the following:

4292411360
3640647680
4292411360

I would assume those 3 lines would all access the same (first) byte. Using no "x", just an "x", and "x$pos" gives unexpected results.

I also tried this...

print(unpack("x0 N1", $thisByteData));
print(unpack("x1 N1", $thisByteData));
print(unpack("x2 N1", $thisByteData));

Which returns... the same thing as the last test...

4292411360
3640647680
4292411360

I'm definitely missing something about how unpack works.
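A minimal sketch of what the first three calls do, for anyone following along: "N" reads a 32-bit big-endian unsigned integer (four bytes, not one), and each "x" skips one byte forward. The byte string here is an assumption, chosen to look like the start of a JPEG header; it is not read from wl.jpg.

```perl
use strict;
use warnings;

# Assumed sample data: six bytes resembling the start of a JPEG file.
my $bytes = "\xFF\xD8\xFF\xE0\x00\x10";

# "N" consumes four bytes and combines them big-endian into one integer.
my $first_four = unpack("N", $bytes);     # bytes 0..3: FF D8 FF E0

# "x" skips one byte, so "x N" reads bytes 1..4: D8 FF E0 00.
my $next_four = unpack("x N", $bytes);

# "x0" skips zero bytes, so it behaves like plain "N".
my $same = unpack("x0 N", $bytes);

printf "%u %u %u\n", $first_four, $next_four, $same;
# prints "4292411360 3640647680 4292411360"
```

So the large numbers are four bytes read at once rather than a sign issue on a single byte.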


If I do this...

print(oct("0x". unpack("x0 H2", $thisByteData)));
print(oct("0x". unpack("x1 H2", $thisByteData)));
print(oct("0x". unpack("x2 H2", $thisByteData)));

I get what I was expecting...

255
216
255

Can't unpack give this to me itself without having to use oct()?


As a side note: I think I'm getting the two's complement of these byte integers when using "x$pos N1". I'm expecting these as the first 3 bytes.

255
216
255

Thanks again for the help to all.


Special thanks to @brian d foy and @muteW ... I now know how to access the Nth byte of my binary scalar using unpack(...). I have a new problem to solve now, which isn't related to this question. Again thanks for all the help guys!

This gave me the desired result...

print(unpack("x0 C1", $thisByteData));
print(unpack("x1 C1", $thisByteData));
print(unpack("x2 C1", $thisByteData));

unpack(...) has a ton of options so I recommend that anyone else who reads this read the pack/unpack documentation to get the byte data result of their choice. I also didn't try using the Tie options @brian mentioned, I wanted to keep the code as simple as possible.
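For the case described in the question, where the scalar is passed into a function, a small sketch could look like the following. The name nth_byte and the sample bytes are hypothetical, not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the byte at zero-based offset $pos as an
# integer 0..255, or undef if $pos is outside the data.
sub nth_byte {
    my ($data, $pos) = @_;
    return undef if $pos < 0 || $pos >= length $data;
    # "x$pos" skips $pos bytes; "C" reads one unsigned 8-bit value.
    return unpack("x$pos C", $data);
}

# Assumed sample data standing in for the contents of a JPEG file.
my $sample = "\xFF\xD8\xFF\xE0";
print join(" ", map { nth_byte($sample, $_) } 0 .. 3), "\n";
# prints "255 216 255 224"
```

This version copies the scalar on each call; for large inputs, passing a reference to the data instead would avoid that copy.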

+2  A: 

The Perl built-in variable $/ (or $INPUT_RECORD_SEPARATOR if you `use English`) controls Perl's idea of a "line". By default it is set to "\n", so lines are separated by newline characters, but you can change this to any other string. Or change it to a reference to a number:

$/ = \1;
while(<FILE>) {
  # read file
}

Setting it to a reference to a number will tell Perl that a "line" is that number of bytes.
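A runnable sketch of that idea, assuming an in-memory filehandle in place of a real file and made-up sample bytes. The data deliberately contains a newline byte (0x0A) to show that it is no longer treated as a separator.

```perl
use strict;
use warnings;

# Assumed sample data; an in-memory filehandle stands in for a real file.
my $buffer = "\xFF\xD8\x0A\xE0";
open(my $fh, '<', \$buffer) or die "Cannot open in-memory handle: $!";
binmode($fh);

my @values;
{
    local $/ = \1;                  # each "line" is now exactly one byte
    while (my $byte = <$fh>) {
        push @values, ord $byte;    # numeric value of that byte
    }
}
close $fh;

print "@values\n";   # prints "255 216 10 224"
```

Using local keeps the change to $/ confined to the block, so other code that reads lines is not affected.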

Now, what exactly are you trying to do? There are probably a number of modules that will do what you're trying to do, and possibly more efficiently. If you're just trying to learn how to do it, go ahead, but if you have a specific task in mind, consider not reinventing the wheel (unless you want to).

EDIT: Thanks to jrockway in the comments...

If you have Unicode data, this may read one character rather than one byte. If that happens, you should be able to `use bytes;` to turn off the automatic byte-to-character translation.

Now, you say you want to read the data all at once and then pass it to a function. Let's do this:

my $data;
{
  local $/;
  $data = <FILE>;
}

Or this:

my $data = join("", <FILE>);

Or some will suggest the File::Slurp module, but I think it's a bit overkill. However, let's get an entire file into an array of bytes:

use bytes;

...

my @data = split(//, join("", <FILE>));

And then we have an array of bytes that we can pass to a function. Like?
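As a rough sketch of passing such an array to a function: the example below assumes an in-memory filehandle instead of FILE, uses binmode on the handle, and adds unpack("C*", ...) as an alternative that yields numeric byte values rather than one-byte strings. The helper byte_at is hypothetical.

```perl
use strict;
use warnings;

# Assumed sample data; an in-memory filehandle stands in for FILE.
my $buffer = "\xFF\xD8\xFF\xE0";
open(my $fh, '<', \$buffer) or die "Cannot open in-memory handle: $!";
binmode($fh);
my $data = do { local $/; <$fh> };   # slurp the whole handle
close $fh;

my @chars  = split(//, $data);       # one-byte strings, as in the answer
my @values = unpack("C*", $data);    # the same bytes as integers 0..255

# Hypothetical consumer: takes an array reference and a zero-based index.
sub byte_at {
    my ($bytes, $n) = @_;
    return $bytes->[$n];
}

print scalar(@chars), " bytes, third is ", byte_at(\@values, 2), "\n";
# prints "4 bytes, third is 255"
```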

Chris Lutz
Does this assumption hold true when perlio layers decode bytes to characters?
jrockway
I'd like to slurp the file up like I did and pass all that byte data to another function. Inside that function I want to get access to each byte one by one.
rakhavan
@jrockway - Probably not, but you can `use bytes;` to turn that off. Good point.
Chris Lutz
Instead of inserting an "EDIT:" line, how about fixing the entire answer? You don't need to show every version of your answer in the finished version. And, don't forget about binmode. :)
brian d foy
@Chris Lutz: Don't use bytes for that, use binmode. There are really no good use cases for use bytes.
ysth
Actually, if you're trying to get an array of bytes, `use bytes; $/ = \1; @array = <$file_handle>;` would work better than `split m'', ...`
Brad Gilbert
@Brad Gilbert: Don't use bytes for that, use binmode. There are really no good use cases for use bytes.
ysth
+1  A: 

Without knowing much more about what you're trying to do with your data, something like this will iterate over the bytes in the file:

open(SOURCE, "wl.jpg");
my $byte;
while(read SOURCE, $byte, 1) {
    # Do something with the contents of $byte
}
close SOURCE;
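A self-contained variant of the loop above, as a sketch only: it assumes a temporary file with made-up contents in place of wl.jpg, uses the three-argument form of open with a lexical filehandle, and calls binmode so that bytes such as 0x0D 0x0A are not translated on platforms that distinguish text and binary files.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small assumed sample file so the sketch is self-contained.
my ($out, $path) = tempfile(UNLINK => 1);
binmode($out);
print {$out} "\xFF\xD8\x0D\x0A\xE0";
close $out or die "Cannot close $path: $!";

open(my $source, '<', $path) or die "Cannot open $path: $!";
binmode($source);    # no newline translation, no character decoding

my @values;
my $byte;
while (read($source, $byte, 1)) {    # read returns 0 at end of file
    push @values, ord $byte;
}
close $source;

print "@values\n";   # prints "255 216 13 10 224"
```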

Be careful with the concatenation used in your example; you may end up with newline conversions, which is definitely not what you want to happen while reading binary files. (It's also inefficient to continually expand the scalar while reading it.) This is the idiomatic way to schlep an entire file into a Perl scalar:

open(SOURCE, "<", "wl.jpg");
local $/ = undef;
my $big_binary_data = <SOURCE>;
close SOURCE;
Commodore Jaeger
Great point about the extra spaces, my code is just an example of getting all the bytes in one variable so I can pass them all at once... Now if I could use the while(read... like you have above and create an array full of my bytes, but I was wondering if perl has a way to access byte data like an array.
rakhavan
Add a binmode to that and you should be good to go.
brian d foy
+3  A: 

I think the correct answer involves pack/unpack, but this might also work:

use bytes;
# The /s modifier lets . match newline bytes (0x0A) as well
while( $bytestring =~ /(.)/gs ){
   my $byte = $1;
   ...
}

"use bytes" ensures that you never see characters -- but if you have a character string and are processing it as bytes, you are doing something wrong. Perl's internal character encoding is undefined, so the data you see in the string under "use bytes" is nearly meaningless.

jrockway
+8  A: 

If you have the data in a string and you want to get to a certain byte, use substr, as long as you are treating the string as bytes to start with.
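A minimal sketch of the substr approach, assuming made-up sample bytes and a zero-based offset. ord is used here to turn the one-byte string into its numeric value.

```perl
use strict;
use warnings;

my $data = "\xFF\xD8\xFF\xE0";   # assumed sample data
my $n    = 1;                    # zero-based offset of the byte we want

my $byte_string = substr($data, $n, 1);   # a one-byte string
my $byte_value  = ord $byte_string;       # its value as an integer

printf "%d 0x%02X\n", $byte_value, $byte_value;   # prints "216 0xD8"
```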

However, you can read it directly from the file without all this string nonsense people have been filling your head with. :) Open the file with sysopen and the right options, use sysseek (rather than seek, which the Perl documentation warns against mixing with sysread) to put yourself where you want, and read what you need with sysread.

You skip all the workarounds for the stuff that open and readline are trying to do for you. If you're just going to turn off all of their features, don't even use them.
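A sketch of that file-based approach, under a few assumptions: a temporary file with made-up contents stands in for the real file, and sysseek is used for positioning because perldoc advises against combining the buffered seek with sysread.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDONLY SEEK_SET);
use File::Temp qw(tempfile);

# Write a small assumed sample file so the sketch is self-contained.
my ($out, $path) = tempfile(UNLINK => 1);
binmode($out);
print {$out} "\xFF\xD8\xFF\xE0";
close $out or die "Cannot close $path: $!";

my $n = 3;   # zero-based offset of the byte we want

sysopen(my $fh, $path, O_RDONLY) or die "Cannot sysopen $path: $!";
binmode($fh);   # matters on platforms that translate line endings

# sysseek returns the new position ("0 but true" for offset 0) or undef.
sysseek($fh, $n, SEEK_SET) or die "Cannot seek to $n: $!";

my $got = sysread($fh, my $byte, 1);
die "Cannot read: $!" unless defined $got;
die "Offset $n is past the end of the file\n" if $got == 0;
close $fh;

my $value = ord $byte;
print "$value\n";   # prints "224"
```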

brian d foy
@brian, since I tried and can't access the scalar like an array... I was going to use substr, then I told myself, hey this isn't a "string". The other half of the coin here is that I need to process this data as fast as possible... is substr the fastest way to get to the Nth character?
rakhavan
@brian, also is this the ONLY way to access the Nth byte of the scalar?
rakhavan
If it were me, I'd do what I recommended. As for only, I bet I can come up with at least five completely different ways to do it. I'd still only move around the file with seek as I said.
brian d foy
There is a reason for the acronym: http://en.wikipedia.org/wiki/TMTOWTDI
Brad Gilbert
You can access the scalar as an array through the Tie mechanism. I showed an example of this in Mastering Perl when I talk about a long DNA string.
brian d foy
Look at my updated response. To get unsigned char output instead of Hex simply substitute 'H' with 'C'.
muteW
+2  A: 

Since you already have the file contents in $thisByteData you could use pack/unpack to access the n-th byte.

sub getNthByte {
  my ($pos) = @_;
  return unpack("x$pos b1", $thisByteData);
}

#x$pos - skips forward over $pos bytes
#b1    - returns only the first (lowest-order) bit of the next byte as a bit string; use b8 for all eight bits

Read through the pack documentation to get a sense of the parameters you can use in the template to get different return values.

EDIT - Your comment below shows that you are missing the high-order nybble ('f') of the first byte. I am not sure why this is happening, but here is an alternative method that works. In the meantime I'll have a further look into unpack's behavior.

sub getNthByte {
  my ($pos) = @_;
  return unpack("x[$pos]H2", $binData);
}

(my $hex = unpack("H*", $binData)) =~ s/(..)/$1 /g;
#To convert the entire data in one go

Using this, the output for the first four bytes is 0xff 0xd8 0xff 0xe0, which matches the documentation.
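Put together as a runnable sketch, with the data passed in as an argument. The name nth_byte_hex and the sample bytes are assumptions made for illustration; hex() is used at the end to turn the two hex digits into a number.

```perl
use strict;
use warnings;

my $bin_data = "\xFF\xD8\xFF\xE0";   # assumed sample data

# Hypothetical helper: the byte at zero-based offset $pos as two hex digits.
sub nth_byte_hex {
    my ($data, $pos) = @_;
    # "x[$pos]" skips $pos bytes; "H2" reads two hex digits, high nybble first.
    return unpack("x[$pos] H2", $data);
}

my $one = nth_byte_hex($bin_data, 1);     # "d8"
my $value = hex $one;                     # 216

# Convert the entire data in one go, one space after each byte.
(my $hex = unpack("H*", $bin_data)) =~ s/(..)/$1 /g;

print "$one $value [$hex]\n";   # prints "d8 216 [ff d8 ff e0 ]"
```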

muteW
This has gotten me very far! My only problem right now is that it seem to be skipping bytes in the beginning of my input.
rakhavan
What do you get in position 0/1/2? How many bytes is it skipping and if possible could you post the hex values of the skipped bytes here.
muteW
That's funny! I'm missing the first 3 bytes. unpack("x0 h1", $thisByteData) = f; unpack("x1 h1", $thisByteData) = 8; unpack("x2 h1", $thisByteData) = f; unpack("x0 H1", $thisByteData) = f; unpack("x1 H1", $thisByteData) = d; unpack("x2 H1", $thisByteData) = f. Is this the output you were asking for?
rakhavan
No you aren't. According to this http://www.onicos.com/staff/iz/formats/jpeg.html the first three bytes are 0xff 0xd8 0xff in big-endian format. Check your output - you are getting the same output split into nybbles (due to the usage of h and H).
muteW
I also "seem" to be missing the first 3 bytes when I pass a string to this function... do you think this is by chance?
rakhavan
It would be a lot more helpful if you could detail the current code/output in the OP so that more people can have a look at it.
muteW