Hey,

I'm trying to read unsigned integers from a file (stored as consecutive bytes) and convert them to Integers. I've tried this:

file = File.new(filename, "r")
num = file.read(2).unpack("S") # read an unsigned short
puts num # value is less than expected

What am I doing wrong here?

Cheers,

Pete

A: 

In what format are the numbers stored in the file? Are they in hex? Your code looks correct to me.

According to the VM spec: "Multibyte data items are always stored in big-endian order, where the high bytes come first."
Peter
+1  A: 

When dealing with binary data you need to be sure you're opening the file in binary mode if you're on Windows. This goes for both reading and writing.

open(filename, "rb") do |file|
  num = file.read(2).unpack("S")
  puts num
end

There may also be issues with "endian" encoding depending on the source platform. For instance, big-endian machines such as PowerPC-based systems (old Macs, IBM Power servers, PS3 clusters) or Sun SPARC servers store multibyte values in the opposite byte order from x86.

Can you post an example of how it's "less"? Usually there's an obvious pattern to the data.

For example, if you want 0x1234 but you get 0x3412 it's an endian problem.
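
To make the 0x1234 vs. 0x3412 case concrete, here is a minimal sketch that decodes the same two bytes both ways. The variable names are only illustrative; "n" and "v" are the standard unpack directives for a 16-bit unsigned integer in big-endian and little-endian order respectively.

```ruby
# The two bytes 0x12 0x34, as they might appear in a file.
bytes = "\x12\x34".b

big    = bytes.unpack("n").first # "n": 16-bit unsigned, big-endian
little = bytes.unpack("v").first # "v": 16-bit unsigned, little-endian

puts format("big-endian: 0x%04X, little-endian: 0x%04X", big, little)
# prints "big-endian: 0x1234, little-endian: 0x3412"
```

If the value you get matches the "wrong" line here, the data was probably written with the other byte order.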

tadman
I'm trying to read the magic number of a Java .class file. My code produces 202 as the magic number, whereas it should be 3405691582 (0xCAFEBABE). That didn't change when I used "rb".
Peter
Also, I'm on Linux; do I still need to worry about opening the file in binary mode?
Peter
It's still good form to explicitly use binary mode on Unix. It doesn't hurt (it's just a no-op), but (a) it makes your code clearer and (b) it saves you tons of debugging if someone ever runs your code on Windows.
Jörg W Mittag
+3  A: 

You're not reading enough bytes. As you say in the comment on tadman's answer, you get 202 instead of 3405691582.

Notice that the first 2 bytes of 0xCAFEBABE is 0xCA = 202

If you really want all 8 bytes in a single number, then you need to read more than the unsigned short

Try:

num = file.read(8).unpack("L_")

The underscore tells unpack to use the platform's native long size. This assumes that the native long is 8 bytes, which is definitely not guaranteed.
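
As a rough illustration of why the native size matters, this sketch prints how many bytes a few unsigned-integer directives occupy on the machine running it. The variable names are made up for the example; the result for "L_" depends on the platform's C long, so no fixed output is shown.

```ruby
# Sizes in bytes of a few unsigned-integer pack directives.
fixed32 = [0].pack("L").bytesize  # "L": always 32 bits, native byte order
native  = [0].pack("L_").bytesize # "L_": platform's C unsigned long (4 or 8)
network = [0].pack("N").bytesize  # "N": always 32 bits, big-endian

puts "L=#{fixed32} L_=#{native} N=#{network}"
```

Only the directives with a fixed width and a fixed byte order ("N", "n", "V", "v") behave the same everywhere.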

bobDevil
I tried it, and I get 3199925962 instead (which still isn't right!). Also, is there a cross-platform way of implementing this?
Peter
first byte is 0xCA, first two is 0xCAFE
rampion
`3199925962 = 0xBEBAFECA`, so it looks like you're having a byte order problem. For crossplatformness, I usually rely on network byte order, rather than host byte order.
rampion
0xCA = two 4-bit nybbles = 1 8-bit byte, as rampion points out. Getting 202 is the same as file.read[0] though, the ASCII value of the first character, which might be the problem.
tadman
Oops. Yes, I confused myself when counting the first two hex characters vs. two bytes. My general concept was correct however, now you just need to deal with byte order.
bobDevil
A: 

There are a couple of libraries that help with parsing binary data in Ruby. They let you declare the data format in a simple, high-level declarative DSL and then figure out all the packing, unpacking, bit-twiddling, shifting and endian conversions by themselves.

I have never used any of these, but here are two examples (there are more, but I don't know them):

Jörg W Mittag
A: 

Ok, I got it to work:

num = file.read(8).unpack("N")
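
A slightly more defensive sketch of the same idea: "N" only consumes four bytes, so reading four is enough, and unpack returns an array, so the number is its first element. The helper name read_u32_be is hypothetical, and a StringIO holding the first bytes of a .class file stands in for a file opened with "rb".

```ruby
require "stringio"

# Hypothetical helper (not from the thread): reads one 4-byte,
# big-endian unsigned integer from any IO-like object.
def read_u32_be(io)
  bytes = io.read(4)
  raise EOFError, "expected 4 bytes" if bytes.nil? || bytes.bytesize < 4
  bytes.unpack("N").first # "N": 32-bit unsigned, network (big-endian) order
end

# Stand-in for File.open(filename, "rb"): magic number plus version bytes.
io = StringIO.new("\xCA\xFE\xBA\xBE\x00\x00\x00\x34".b)
magic = read_u32_be(io)

puts format("0x%08X", magic) # prints "0xCAFEBABE"
```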

Thanks for all of your help.

Peter