The title is a bit confusing, so I will explain a bit more using examples. Note that I am parsing a file format. Say we have a structure like this:

struct example
{
   typeA a;
   typeB b;
   typeX x;
   typeY y;
   typeZ z;
};

So far, so good. The problem is that typeX, typeY and typeZ can vary in size: depending on flags in the file header (metadata), each can be either two or four bytes wide. There are also about 40 more structures like this, each using typeX, typeY and typeZ: some use all three, some just one or two. Finally, most of the structures are optional, so a file might use just four or five of them, or 20 or 30.

I would like to know if anyone has an idea how to store such a varying set of data. I thought about using templates, but I don't know if that is the right way.

EDIT: to clarify further: memory is not a big issue, so I can probably afford to waste a bit of space. If typeX is four bytes, then it is four bytes in all structures. However, the sizes are not synced, so typeX can be 4 bytes while typeZ is 2. Most structures can occur multiple times, so there can be 50 example1 structures, 10 example2 structures, etc.

+1  A: 

One factor that you haven't mentioned, since you are parsing a file, is whether the software is intended to be CPU-agnostic. Some CPUs are little-endian, which means that an item of data is stored with the least-significant byte first and the most-significant byte last. Other CPUs are big-endian and the byte order is the other way around. When you are parsing a file you have to take this into account if the file might have been written using a CPU with the opposite endianness. The robust way to do this is to define the endianness that the file format requires, and then read the file one byte at a time and construct the data using appropriate shift operators, e.g.

intVal = ((uint32_t)buffer[0] << 24) | ((uint32_t)buffer[1] << 16) | ((uint32_t)buffer[2] << 8) | (uint32_t)buffer[3];  // buffer holds unsigned bytes

So you see that reading the file directly into a struct is probably not a good idea.

You should really treat the file as a stream (which is what it is) and define stream operations to transfer data into your internal memory structures one item at a time.

If you accept this, then the file format becomes decoupled from your internal memory structures. You can then store the data internally however you like. In your case it sounds like an ideal application for polymorphism. You can declare a subclass for each variant of typeX/Y/Z. Or you could use a single class and let it allocate a variable amount of memory in its constructor. It all depends on your circumstances. The key is to decouple the file format from your internal memory structures.

Ian Goldby
I have a class for reading the file which solves all the endian-related issues for me.
PeterK
My worry was that you were trying to map a struct directly onto the file. As long as you are not doing this, then I don't think it matters how you answer your question. Since memory is not a problem, and if *all* occurrences of typeX *in a particular file* are the same size (?), I would just make it the maximum size always. The fact that the data from the file might not need the entire size in memory becomes irrelevant. Don't make it harder for yourself than you have to!
Ian Goldby
@Ian Goldby: good point, but I am still curious whether anyone can come up with something nice that won't waste space.
PeterK
+2  A: 

The issue for me isn't so much dealing with allocating some space; in concept we can do this:

byte *pA = new byte[the size this time];

but rather what you do with these typeA objects. What does

pA->getValue()

return? Is the intent that it's always, say, a 32 bit numeric? Or do we really have

pA->get16bitValue()

in some cases and

pA->get32bitValue()

in others?

I'd be seeking a way to encapsulate that difference first, and the way of doing that very much depends on how you use the values. The storage problem is probably solvable with a bit of dynamic allocation. It's the actual behaviour of typeA that I see as tricky.
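
One possible shape for that encapsulation, assuming the values can always be viewed as unsigned numbers of at most 32 bits, is a container that stores each value in 2 or 4 bytes but exposes a single getter. The class name IndexArray and its members are invented for this sketch; it is one option, not the only way to do it.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical container: stores values in `width` bytes each (2 or 4),
// but always hands them back as 32-bit unsigned numbers.
class IndexArray {
public:
    explicit IndexArray(std::size_t width) : width_(width)
    {
        if (width != 2 && width != 4)
            throw std::invalid_argument("width must be 2 or 4");
    }

    void push_back(std::uint32_t value)
    {
        if (width_ == 2 && value > 0xFFFFu)
            throw std::out_of_range("value does not fit in 2 bytes");
        // Internal byte order is private to this class (least-significant
        // byte first here); it is unrelated to the file's byte order.
        for (std::size_t i = 0; i < width_; ++i)
            bytes_.push_back(static_cast<std::uint8_t>(value >> (8 * i)));
    }

    std::uint32_t get(std::size_t index) const
    {
        if (index >= size())
            throw std::out_of_range("index out of range");
        std::uint32_t value = 0;
        for (std::size_t i = 0; i < width_; ++i)
            value |= static_cast<std::uint32_t>(bytes_[index * width_ + i]) << (8 * i);
        return value;
    }

    std::size_t size() const { return bytes_.size() / width_; }
    std::size_t byteSize() const { return bytes_.size(); }

private:
    std::size_t width_;
    std::vector<std::uint8_t> bytes_;
};
```

Callers only ever see get() returning a 32-bit value, so the 16-bit versus 32-bit question stays inside the class, and the storage cost follows the width chosen at construction.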

djna
Good point. The value is always an unsigned number. It is always an index into another structure, which can have more than 64K entries (hence the requirement for a 2- or 4-byte size). So it seems I can always treat the number as an unsigned 32-bit number.
PeterK