I know the file structure. Suppose it is this:

[3-byte integer],[1-byte unsigned integer],[4-byte unsigned integer]

The file contains a sequence of such records.

What is the most elegant way to parse such a file in Java?

Presumably we can define a byte[] array of the overall record length and fill it from an InputStream, but how do we then convert its sub-elements into correct integer values?

First, byte values in Java are signed, and we need unsigned values in our case. Second, are there useful methods that convert a sub-array of bytes, say bytes 1 through 4, into a correct integer value?

I know that Perl has pack and unpack functions that let you describe a string of bytes with a template; for example, "VV" means two unsigned long int values. You define such a template and pass it to pack or unpack along with the bytes to be packed or unpacked. Is there anything similar in Java, the Apache libraries, etc.?

+1  A: 

You may take a look at this sample BinaryReader class which is based on the DataInputStream class.

Darin Dimitrov
+1  A: 

You should be able to do this using a DataInputStream. It's been a while since I've done much development like this, but the trick I seem to remember is that if there's an impedance mismatch between your input format and the language's data types, you need to construct the values byte by byte. In this case it looks like you'll need to do that, because the record has oddly sized fields.

For example, to read the first record you might do something like this (I'm using a, b, and c for the fields of the record):

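DataInputStream dis = ...

// readUnsignedByte() returns 0..255. readByte() is signed and would
// sign-extend, corrupting the bits already shifted into place.
int a = 0;
a = dis.readUnsignedByte();
a = a << 8;
a = a | dis.readUnsignedByte();
a = a << 8;
a = a | dis.readUnsignedByte();

short b = 0;
b = (short) dis.readUnsignedByte();

long c = 0;
c = dis.readUnsignedByte();
c = c << 8;
c = c | dis.readUnsignedByte();
c = c << 8;
c = c | dis.readUnsignedByte();
c = c << 8;
c = c | dis.readUnsignedByte();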

Obviously, this code could be tightened up by combining some of the statements, but you get the general idea. Notice that for each field I use a primitive larger than the field itself: Java's integer types are signed, so an unsigned value would otherwise overflow into the sign bit. For reference, in Java:

  • byte = 8 bits, 1 byte
  • short = 16 bits, 2 bytes
  • int = 32 bits, 4 bytes
  • long = 64 bits, 8 bytes
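
Putting the pieces together, here is a minimal sketch of reading the whole file record by record. It assumes big-endian byte order and treats the 3-byte field as unsigned; the names RecordReader, Rec, readAll and parse are made up for illustration and are not part of any library.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class RecordReader {
    /** Hypothetical holder for one 8-byte record (3-byte, 1-byte and 4-byte fields). */
    static final class Rec {
        final int a;   // 3-byte value, assumed unsigned: 0..16777215
        final int b;   // 1-byte unsigned value: 0..255
        final long c;  // 4-byte unsigned value: 0..4294967295
        Rec(int a, int b, long c) { this.a = a; this.b = b; this.c = c; }
    }

    /** Reads big-endian records until the stream ends on a record boundary. */
    static List<Rec> readAll(DataInputStream dis) throws IOException {
        List<Rec> out = new ArrayList<>();
        while (true) {
            int first = dis.read();              // -1 means a clean end of stream
            if (first < 0) return out;
            // A record cut short makes readUnsignedByte()/readInt() throw EOFException.
            int a = first << 16 | dis.readUnsignedByte() << 8 | dis.readUnsignedByte();
            int b = dis.readUnsignedByte();
            long c = dis.readInt() & 0xFFFFFFFFL; // mask removes the sign extension
            out.add(new Rec(a, b, c));
        }
    }

    /** Convenience wrapper for in-memory data; rethrows I/O errors unchecked. */
    static List<Rec> parse(byte[] data) {
        try (DataInputStream dis = new DataInputStream(new ByteArrayInputStream(data))) {
            return readAll(dis);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = { 0x01, 0x02, 0x03, (byte) 0xFF,
                        (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFE };
        Rec r = parse(data).get(0);
        System.out.println(r.a + " " + r.b + " " + r.c); // 66051 255 4294967294
    }
}
```

For a real file you would probably wrap a FileInputStream in a BufferedInputStream before handing it to the DataInputStream, since this loop reads one byte at a time.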
Bryan Kyle
+1  A: 

Like @Bryan Kyle's example, but shorter. I like shorter, but that doesn't mean clearer; you decide. ;) Note: readByte() is signed and will give unexpected results if not masked with 0xFF.
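
To see why the mask matters, here is a small standalone sketch (the class and method names are invented for the demonstration). A byte with its top bit set is negative, so widening it to int fills the upper 24 bits with ones, and OR-ing that in overwrites whatever was shifted there.

```java
public class SignExtension {
    /** Combines two bytes without masking; the low byte's sign bits leak upward. */
    static int unmasked(byte hi, byte lo) {
        return hi << 8 | lo;
    }

    /** Combines two bytes after masking each down to its unsigned 0..255 value. */
    static int masked(byte hi, byte lo) {
        return (hi & 0xFF) << 8 | (lo & 0xFF);
    }

    public static void main(String[] args) {
        byte hi = 0x01, lo = (byte) 0x80;
        System.out.println(unmasked(hi, lo)); // -128 (0xFFFFFF80)
        System.out.println(masked(hi, lo));   // 384  (0x00000180)
    }
}
```

InputStream.read() and DataInputStream.readUnsignedByte() already return values in the range 0..255, which is why the DataInputStream version needs no mask while the ByteBuffer version does.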

DataInputStream dis = ... 

// assuming BIG_ENDIAN format
int a = dis.read() << 16 | dis.read() << 8 | dis.read(); 
short b = (short) dis.read(); 
long c = dis.readInt() & 0xFFFFFFFFL; 

or

ByteBuffer bb = ...
bb.position(a_random_position);
int a = (bb.get() & 0xFF) << 16 | (bb.get() & 0xFF) << 8 | (bb.get() & 0xFF);
short b = (short) (bb.get() & 0xFF);
long c = bb.getInt() & 0xFFFFFFFFL; // ByteBuffer has getInt(), not readInt()
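
If the byte order is not known to be big-endian, ByteBuffer.order() lets getInt() handle either case, although the 3-byte field still has to be assembled by hand. The following is only a sketch under that assumption; BufferRecords, getUnsigned24 and decode are hypothetical names, and the 3-byte field is again treated as unsigned.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class BufferRecords {
    /** Reads an unsigned 3-byte integer, honouring the buffer's byte order. */
    static int getUnsigned24(ByteBuffer bb) {
        int b0 = bb.get() & 0xFF, b1 = bb.get() & 0xFF, b2 = bb.get() & 0xFF;
        return bb.order() == ByteOrder.BIG_ENDIAN
                ? b0 << 16 | b1 << 8 | b2
                : b2 << 16 | b1 << 8 | b0;
    }

    /** Decodes every complete 8-byte record in data into an {a, b, c} triple. */
    static long[][] decode(byte[] data, ByteOrder order) {
        ByteBuffer bb = ByteBuffer.wrap(data).order(order);
        long[][] out = new long[data.length / 8][];
        for (int i = 0; i < out.length; i++) {
            long a = getUnsigned24(bb);
            long b = bb.get() & 0xFF;
            long c = bb.getInt() & 0xFFFFFFFFL; // getInt() follows bb.order()
            out[i] = new long[] { a, b, c };
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] data = { 0x01, 0x02, 0x03, (byte) 0xFF,
                        (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFE };
        long[] be = decode(data, ByteOrder.BIG_ENDIAN)[0];
        long[] le = decode(data, ByteOrder.LITTLE_ENDIAN)[0];
        System.out.println(be[0] + " " + be[1] + " " + be[2]); // 66051 255 4294967294
        System.out.println(le[0] + " " + le[1] + " " + le[2]); // 197121 255 4278190079
    }
}
```

For large files, a buffer obtained from FileChannel.map() can be used in place of ByteBuffer.wrap(), so the file does not have to be copied into a byte[] first.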
Peter Lawrey