I have to read a binary file in a legacy format with Java.

In a nutshell the file has a header consisting of several integers, bytes and fixed-length char arrays, followed by a list of records which also consist of integers and chars.

In any other language I would create structs (C/C++) or records (Pascal/Delphi) which are byte-by-byte representations of the header and the record. Then I'd read sizeof(header) bytes into a header variable and do the same for the records.

Something like this: (Delphi)

type
  THeader = record
    Version: Integer;
    Type: Byte;
    BeginOfData: Integer;
    ID: array[0..15] of Char;
  end;

...

procedure ReadData(S: TStream);
var
  Header: THeader;
begin
  S.ReadBuffer(Header, SizeOf(THeader));
  ...
end;

What is the best way to do something similar with Java? Do I have to read every single value on its own or is there any other way to do this kind of "block-read"?

+2  A: 

I guess FileInputStream lets you read in bytes. So you could open the file with a FileInputStream and read sizeof(header) bytes. I am assuming that the header has a fixed format and size. I don't see that mentioned in the initial post, but I assume it is the case, since it would get much more complex if the header had optional fields and varying sizes.

Once you have the bytes, you can have a header class to which you assign the contents of the buffer you have already read. Then parse the records in a similar fashion.
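A minimal sketch of that idea, not a definitive implementation. It assumes the Delphi record is packed (no padding, 4 + 1 + 4 + 16 = 25 bytes), that integers are little-endian, and that the ID holds single-byte characters; none of this is stated in the question, and a non-packed record would have a different size. The class name LegacyHeader is hypothetical.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical value class mirroring the THeader record from the question.
final class LegacyHeader {
    // Assumed packed layout: Version (4) + Type (1) + BeginOfData (4) + ID (16).
    static final int SIZE = 25;

    final int version;
    final byte type;
    final int beginOfData;
    final String id;

    private LegacyHeader(int version, byte type, int beginOfData, String id) {
        this.version = version;
        this.type = type;
        this.beginOfData = beginOfData;
        this.id = id;
    }

    // Reads the whole header block in one call, then decodes the fields from it.
    static LegacyHeader read(InputStream in) throws IOException {
        byte[] block = new byte[SIZE];
        // readFully blocks until SIZE bytes arrive or throws EOFException.
        new DataInputStream(in).readFully(block);

        // Assumption: the file was written little-endian.
        ByteBuffer buf = ByteBuffer.wrap(block).order(ByteOrder.LITTLE_ENDIAN);
        int version = buf.getInt();
        byte type = buf.get();
        int beginOfData = buf.getInt();

        byte[] rawId = new byte[16];
        buf.get(rawId);
        // trim() drops leading/trailing characters <= U+0020, including NUL padding.
        String id = new String(rawId, StandardCharsets.ISO_8859_1).trim();

        return new LegacyHeader(version, type, beginOfData, id);
    }
}
```

Calling LegacyHeader.read(stream) once at the start of the file gives the "block read" feel of the Delphi version while still decoding each field explicitly.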

Arvind
+7  A: 

You could use the DataInputStream class as follows:

DataInputStream in = new DataInputStream(new BufferedInputStream(
                         new FileInputStream("filename")));
int x = in.readInt();
double y = in.readDouble();

etc.

Once you get these values you can do with them as you please. Look up the java.io.DataInputStream class in the API for more info.

Vincent Ramdhanie
That's what I feared, but since someone pointed out portability issues with the general approach of reading whole structs/records, I think it's a good thing it cannot be done in Java.
DR
A: 

In the past I used DataInputStream to read data of arbitrary types in a specified order. DataInputStream always reads multi-byte values as big-endian, so it will not allow you to easily account for big-endian/little-endian issues.

As of 1.4 the java.nio.Buffer family might be the way to go, but it seems that your code might actually be more complicated. These classes do have support for handling endian issues.
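To make the endian point concrete, here is a small sketch that decodes the same four bytes both ways. The class and method names are hypothetical; the library calls (DataInputStream.readInt, ByteBuffer.order, Integer.reverseBytes) are standard.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper class contrasting the two byte orders.
final class EndianDemo {
    private EndianDemo() {}

    // DataInputStream has no byte-order setting: readInt() is always big-endian.
    static int readBigEndian(byte[] fourBytes) throws IOException {
        return new DataInputStream(new ByteArrayInputStream(fourBytes)).readInt();
    }

    // ByteBuffer lets you choose the byte order before reading.
    static int readLittleEndian(byte[] fourBytes) {
        return ByteBuffer.wrap(fourBytes).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    // Workaround when staying with DataInputStream: swap the bytes afterwards.
    static int readLittleEndianViaStream(byte[] fourBytes) throws IOException {
        return Integer.reverseBytes(readBigEndian(fourBytes));
    }
}
```

For the bytes 01 00 00 00, the big-endian read yields 16777216 while both little-endian reads yield 1.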

Darron
+8  A: 

To my knowledge, Java forces you to read a file as bytes rather than being able to block read. If you were serializing Java objects, it'd be a different story.

The other examples shown use the DataInputStream class with a File, but you can also use a shortcut: The RandomAccessFile class:

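RandomAccessFile in = new RandomAccessFile("filename", "r");
// Note: readInt() is big-endian. A file written by Delphi on x86 is
// little-endian, so Integer.reverseBytes() may be needed on each int.
int version = in.readInt();
byte type = in.readByte();
int beginOfData = in.readInt();
byte[] tempId = new byte[16];   // the array must be allocated before reading into it
in.readFully(tempId);           // unlike read(), always fills all 16 bytes or throws
String id = new String(tempId);
in.close();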

Note that you could turn the response objects into a class, if that would make it easier.

R. Bemrose
This answer doesn't have any votes (yet), but it contains all I wanted to know. Thank you.
DR
This also assumes that you wanted to turn the char array into a String, which may not be the case.
R. Bemrose
A: 

A while ago I found this article on using reflection and parsing to read binary data. In this case, the author is using reflection to read the java binary .class files. But if you are reading the data into a class file, it may be of some help.

Thomas Jones-Low
+5  A: 

I may have misunderstood you, but it seems to me you're creating in-memory structures you hope will be a byte-per-byte accurate representation of what you want to read from hard-disk, then copy the whole stuff onto memory and manipulate thence?

If that's indeed the case, you're playing a very dangerous game. At least in C, the standard doesn't enforce things like padding or alignment of struct members, not to mention big/little endianness or parity bits. So even if your code happens to run, it's very non-portable and risky: you depend on the compiler's creator not changing their mind in future versions.

Better to create an automaton that both validates the structure as it is read (byte by byte) from disk and fills an in-memory structure if it is indeed OK. You may lose some milliseconds (not as many as it may seem, since modern OSes do a lot of disk read caching), but you gain platform and compiler independence. Plus, your code will be easy to port to another language.
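A rough sketch of validating field by field, under stated assumptions: the header layout is the packed 25-byte little-endian one from the question, and the rules themselves (a version range, BeginOfData pointing inside the file) are hypothetical, since the real format's constraints are not given. HeaderValidator and MAX_VERSION are made-up names.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Optional;

// Hypothetical validator; the concrete rules are illustrative assumptions.
final class HeaderValidator {
    static final int HEADER_SIZE = 25; // assumed packed size: 4 + 1 + 4 + 16
    static final int MAX_VERSION = 2;  // hypothetical highest supported version

    private HeaderValidator() {}

    // Returns a description of the first problem found, or empty if none.
    static Optional<String> firstProblem(byte[] file) {
        if (file.length < HEADER_SIZE) {
            return Optional.of("file is shorter than the header");
        }
        ByteBuffer buf = ByteBuffer.wrap(file).order(ByteOrder.LITTLE_ENDIAN);

        int version = buf.getInt();
        if (version < 1 || version > MAX_VERSION) {
            return Optional.of("unsupported version " + version);
        }

        buf.get(); // Type: skipped, no constraint assumed here

        int beginOfData = buf.getInt();
        if (beginOfData < HEADER_SIZE || beginOfData > file.length) {
            return Optional.of("BeginOfData out of range: " + beginOfData);
        }
        return Optional.empty();
    }
}
```

Each check runs before the next field is trusted, so a corrupt file is rejected with a specific message instead of producing a silently wrong in-memory structure.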

Post Edit: In a way I sympathize with you. In the good-ol' days of DOS/Win3.11, I once created a C program to read BMP files. And used exactly the same technique. Everything was nice until I tried to compile it for Windows - oops!! Int was now 32 bits long, rather than 16! When I tried to compile on Linux, discovered gcc had very different rules for bit fields allocation than Microsoft C (6.0!). I had to resort to macro tricks to make it portable...

Joe Pineda
Yes, you are 100% right. The original file is created by a Delphi application and there are some language features which assist in preventing common problems. (Padding and alignment can be controlled for example) But I will think about portability... Thanks.
DR
A: 

Here is a link on reading bytes using a ByteBuffer (Java NIO):

http://exampledepot.com/egs/java.nio/ReadChannel.html
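In case the link is unavailable, here is a minimal sketch of reading a block through a FileChannel into a ByteBuffer. It is not taken from the linked page. The class and method names are hypothetical, the little-endian order is an assumption about the file, and FileChannel.open requires Java 7 or later.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for reading a fixed-size block from a file.
final class ChannelReader {
    private ChannelReader() {}

    // Reads exactly `length` bytes starting at `position` and returns a
    // buffer that is ready for get...() calls.
    static ByteBuffer readBlock(Path file, long position, int length) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // Assumption: the legacy file stores integers little-endian.
            ByteBuffer buf = ByteBuffer.allocate(length).order(ByteOrder.LITTLE_ENDIAN);
            channel.position(position);
            // A single read() may return fewer bytes than requested, so loop.
            while (buf.hasRemaining()) {
                if (channel.read(buf) < 0) {
                    throw new EOFException("file ended before " + length + " bytes were read");
                }
            }
            buf.flip(); // switch the buffer from filling to reading
            return buf;
        }
    }
}
```

The returned buffer can then be decoded with getInt(), get() and so on, in the order the fields appear in the file.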

Javamann
+1  A: 

As other people mention, DataInputStream and Buffers are probably the low-level APIs you are after for dealing with binary data in Java.

However you probably want something like Construct (wiki page has good examples too: http://en.wikipedia.org/wiki/Construct_(python_library)), but for Java.

I don't know of any (Java versions) off hand, but taking that approach (declaratively specifying the struct in code) would probably be the right way to go. With a suitable fluent interface in Java it would probably be quite similar to a DSL.

EDIT: bit of googling reveals this:

http://javolution.org/api/javolution/io/Struct.html

Which might be the kind of thing you are looking for. I have no idea whether it works or is any good, but it looks like a sensible place to start.

John Montgomery
+1  A: 

I've written up a technique to do this sort of thing in Java, similar to the old C-like idiom of reading bit-fields. Note that it is just a start, but it could be expanded upon.

here

+2  A: 

If you use Preon, then all you have to do is this:

public class Header {
    @BoundNumber int version;
    @BoundNumber byte type;
    @BoundNumber int beginOfData;
    @BoundString(size="16") String id;
}

Once you have this, you create a Codec using a single line:

Codec<Header> codec = Codecs.create(Header.class);

And you use the Codec like this:

Header header = Codecs.decode(codec, file);

Wilfred Springer
+1  A: 

I would create an object that wraps around a ByteBuffer representation of the data and provide getters to read directly from the buffer. In this way, you avoid copying data from the buffer to primitive types. Furthermore, you could use a MappedByteBuffer to get the byte buffer. If your binary data is complex, you can model it using classes and give each class a sliced version of your buffer.

class SomeHeader {
    private final ByteBuffer buf;
    SomeHeader( ByteBuffer fileBuffer){
       // you may need to set limits accordingly before
       // fileBuffer.limit(...)
       this.buf = fileBuffer.slice();
       // you may need to skip the sliced region
       // fileBuffer.position(endPos)
    }
    public short getVersion(){
        return buf.getShort(POSITION_OF_VERSION_IN_BUFFER);
    }
}

Also useful are the methods for reading unsigned values from byte buffers.
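ByteBuffer itself only returns signed values, so such methods are usually small helpers of your own. A minimal sketch follows; the class and method names are hypothetical, and only get, getShort and getInt are actual ByteBuffer calls.

```java
import java.nio.ByteBuffer;

// Hypothetical helpers that widen signed reads into unsigned values.
final class UnsignedReads {
    private UnsignedReads() {}

    // Unsigned 8-bit value (0..255) at an absolute index.
    static int u8(ByteBuffer buf, int index) {
        return buf.get(index) & 0xFF;
    }

    // Unsigned 16-bit value (0..65535), using the buffer's current byte order.
    static int u16(ByteBuffer buf, int index) {
        return buf.getShort(index) & 0xFFFF;
    }

    // Unsigned 32-bit value; needs a long because int cannot hold it.
    static long u32(ByteBuffer buf, int index) {
        return buf.getInt(index) & 0xFFFFFFFFL;
    }
}
```

Because these use absolute indexes, they do not move the buffer's position, which fits the getter style of the SomeHeader class above.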

HTH

anonymous