Currently, I'm saving and loading some data in C/C++ structs to files by using fread()/fwrite(). This works just fine when working within this one C app (I can recompile whenever the structure changes to update the sizeof() arguments to fread()/fwrite()), but how can I load this file in other programs without knowing in advance the sizeof()s of the C struct?

In particular, I have written a separate Java app that visualizes the data contained in that C struct binary file, but I'd like a general solution for reading that binary file. (Instead of me having to manually put the sizeof()s into the Java app source whenever the C structure changes...)

I'm thinking of serializing to text or XML of some sort, but I'm not sure where to start with that (how to serialize in C, then how to deserialize in Java and possibly other languages in the future), and if that is advisable here where one member of the struct is a float array that can go upwards of ~50 MB in binary format (and I have hundreds of these data files to read and write).

The C structure is simple (no severe nesting or pointer references) and looks like the following:

struct MyStructure {
    char *title;
    int id;
    int param1;
    int param2;
    float *data;
};

The parts that are liable to change the most are the param integers.

What are my options here?

+3  A: 

You could use Java's DataInput/DataOutput format that is well described in the javadoc.

Maurice Perry
A: 

One possibility is creating small XML files with title, ID, params, etc, and then a reference (by filename) to where the float data is contained. Assuming there's nothing special about the float data, and that Java and C are using the same floating point format, you can read that file in with readFloat() of a DataInputStream.
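A minimal sketch of the float-reading half, in case it helps. The class name FloatFileReader and the idea that the element count comes from the small XML file are my assumptions, not part of any fixed format. One thing to keep in mind: DataInputStream.readFloat() always reads big-endian IEEE 754 values, so this only works as-is if the C side writes its floats in that byte order.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;

final class FloatFileReader {
    // Reads `count` big-endian 32-bit floats from the stream.
    // `count` is assumed to be known already (e.g. taken from the metadata file).
    static float[] readFloats(InputStream in, int count) throws IOException {
        DataInputStream data = new DataInputStream(new BufferedInputStream(in));
        float[] out = new float[count];
        for (int i = 0; i < count; i++) {
            out[i] = data.readFloat();
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the data file: three floats written in big-endian order.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream writer = new DataOutputStream(bytes);
        writer.writeFloat(1.5f);
        writer.writeFloat(-2.0f);
        writer.writeFloat(3.25f);

        float[] values = readFloats(new ByteArrayInputStream(bytes.toByteArray()), 3);
        for (float v : values) {
            System.out.println(v);
        }
    }
}
```

For a real file you would pass a FileInputStream instead of the in-memory stream used here.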

Eddie
+3  A: 

If you have control of both code bases, you should consider using Protocol Buffers.

HTH, Kent

Kent Boogaart
Agreed (but since I run a C# version of protocol-buffers, I might be biased). For info, the main C (not C++) version is, AFAIK, http://code.google.com/p/protobuf-c/
Marc Gravell
+1  A: 

If your structure isn't going to change (much), and your data is in a pretty consistent format, you could just write the values out to a CSV file, or some other plain format.

This can be easily read in Java, and you won't have to worry about serializing to XML. Sometimes going simple is the easiest route.

Andy White
+1  A: 

Take a look at Resin's Hessian/Burlap services. You may not want the whole service, just part of the API and an understanding of the wire protocol.

Josh
A: 

I like the CSV and "Protocol Buffers" answers (though, at a glance, the protocol buffer thing might be very similar to YAML for all I know).

If you need tightly packed records for high volume data, you might consider this:

Create a textual file header describing the current file structure: record sizes (types????) and field names / sizes. Read and parse the header, then use low level binary I/O operations to load up each record's fields, er, object's properties or whatever we are calling it this year.

This gives you the ability to change the structure a bit and have it be self-describing, while still allowing you to pack a high volume in a smaller space than XML would allow.
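Roughly what I have in mind on the reading side, as a sketch only. The header syntax (one ASCII line of name:type pairs), the two type names int32 and float32, the little-endian record layout, and the class name SelfDescribingReader are all made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

final class SelfDescribingReader {
    // Hypothetical format: one ASCII header line such as
    // "id:int32 param1:int32 scale:float32\n", followed by a single
    // little-endian binary record with the fields in header order.
    static Map<String, Number> readRecord(byte[] file) {
        int newline = 0;
        while (file[newline] != '\n') {
            newline++;
        }
        String header = new String(file, 0, newline, StandardCharsets.US_ASCII);
        ByteBuffer body = ByteBuffer.wrap(file, newline + 1, file.length - newline - 1)
                                    .order(ByteOrder.LITTLE_ENDIAN);

        Map<String, Number> record = new LinkedHashMap<>();
        for (String field : header.trim().split("\\s+")) {
            String[] nameAndType = field.split(":");
            switch (nameAndType[1]) {
                case "int32":
                    record.put(nameAndType[0], body.getInt());
                    break;
                case "float32":
                    record.put(nameAndType[0], body.getFloat());
                    break;
                default:
                    throw new IllegalArgumentException("unknown type " + nameAndType[1]);
            }
        }
        return record;
    }

    public static void main(String[] args) {
        // Build a sample "file" in memory: text header, then the packed record.
        byte[] header = "id:int32 param1:int32 scale:float32\n".getBytes(StandardCharsets.US_ASCII);
        ByteBuffer sample = ByteBuffer.allocate(header.length + 12).order(ByteOrder.LITTLE_ENDIAN);
        sample.put(header).putInt(7).putInt(42).putFloat(0.5f);
        System.out.println(readRecord(sample.array()));
    }
}
```

Adding a param then means adding one entry to the header on the C side; the reader picks it up without being recompiled.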

TMTOWTDI, I guess.

Roboprog
For info, protocol buffers is an open standard (created by Google) for tightly packed (binary), interoperable data. It *is* low level, with additional efficiencies like base-128 variable-length integer packing. Just without the header.
Marc Gravell
+2  A: 

Take a look at JSON: http://www.json.org. If you are going to or from JavaScript it's a big help. I don't know how good the Java support is, though.

Jay
A: 

If:

  • your data is essentially a big array of floats;
  • you are able to test the writing/reading procedure in all the likely environments (=combinations of machines/OS/C compiler) that each end will be running on;
  • performance is important.

then I would probably just keep writing the data from C in the way that you are doing (maybe with a slight amendment -- see below) and turn the problem into how you read that data from Java.

To read the data back in from Java, use a ByteBuffer. Essentially, pull in slabs of bytes from your data, wrap a ByteBuffer around them, and then use the get(), getFloat(), getInt() etc methods. The NIO package also has "wrapper" buffers, e.g. FloatBuffer, which from tests I've done appear to be about 20% faster for reading large numbers of the same type.

Now, one thing you'll have to be careful about is byte ordering. From Java, you need to call order(ByteOrder.LITTLE_ENDIAN) or order(ByteOrder.BIG_ENDIAN) on your buffer before you start reading the data. To decide which to use, I'd recommend that at the very start of the stream, you write some known 16-bit value (e.g. 255 = 0x00ff). Then from Java, pull out these two bytes and check the order (0xff, 0x00 or 0x00, 0xff) to see whether you have little or big endian.
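Here is a minimal sketch of that marker check plus a bulk read through a FloatBuffer. The layout (a two-byte marker followed by nothing but floats) and the class name EndianAwareReader are assumptions for the example; your real file will have the other struct fields in there too.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

final class EndianAwareReader {
    // Assumed layout: a 2-byte marker holding the 16-bit value 0x00ff,
    // then a run of 32-bit floats in the writer's native byte order.
    static ByteOrder detectOrder(byte[] raw) {
        int first = raw[0] & 0xff;
        int second = raw[1] & 0xff;
        if (first == 0xff && second == 0x00) {
            return ByteOrder.LITTLE_ENDIAN; // low byte was written first
        }
        if (first == 0x00 && second == 0xff) {
            return ByteOrder.BIG_ENDIAN;    // high byte was written first
        }
        throw new IllegalArgumentException("unrecognised byte-order marker");
    }

    static float[] readFloats(byte[] raw) {
        ByteOrder order = detectOrder(raw);
        // Skip the marker, then view the remaining bytes as floats.
        ByteBuffer buf = ByteBuffer.wrap(raw, 2, raw.length - 2).order(order);
        FloatBuffer floats = buf.asFloatBuffer();
        float[] out = new float[floats.remaining()];
        floats.get(out);
        return out;
    }

    public static void main(String[] args) {
        // Little-endian sample built in memory, standing in for the C output.
        ByteBuffer sample = ByteBuffer.allocate(2 + 3 * Float.BYTES).order(ByteOrder.LITTLE_ENDIAN);
        sample.putShort((short) 0x00ff).putFloat(1.5f).putFloat(-2.0f).putFloat(3.25f);
        for (float f : readFloats(sample.array())) {
            System.out.println(f);
        }
    }
}
```

For the 50 MB files you would fill the byte array (or a mapped buffer) from a FileChannel rather than building it in memory as the demo does.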

Neil Coffey