Structs seem like a useful way to parse a binary blob of data (i.e. a file or network packet). This is fine and dandy until you have variable-size arrays in the blob. For instance:

struct nodeheader{
        int flags;
        int data_size;
        char data[];
};

This allows me to find the last data character:

nodeheader b;
cout << b.data[b.data_size-1];

Problem being, I want to have multiple variable length arrays:

struct nodeheader{
    int friend_size;
    int data_size;
    char data[];
    char friend[];
};

I'm not manually allocating these structures. I have a file like so:

char file_data[1024];
nodeheader* node = (nodeheader*)&(file_data[10]);

I'm trying to parse a binary file (more specifically, a class file). I've written an implementation in Java (which was my class assignment); now I'm doing a personal version in C++ and was hoping to get away without having to write 100 lines of code. Any ideas?

Thanks, Stefan

+2  A: 

You cannot have multiple variable-sized arrays. How would the compiler know at compile time where friend[] is located? The location of friend depends on the size of data[], and the size of data is unknown at compile time.

Mecki
A great point, and I understand it. I guess my question is still: is there a good way to do this? There are tons of binary files out there, and it's a pain to write hundreds of lines of code that are simply an implementation of the header.
Stefan Mai
Actually, since structs have padding, you can only use it to parse packed binary data if you tell the compiler to not use padding. In GCC you do this by using __attribute__((packed)); just search for this on Google.
Mecki
Regarding how to do it, please ask a new question (so users with similar problems can find the replies), and I'll be pleased to present you with ready-to-use code. Just provide some sample data and how it should look once it is parsed.
Mecki
+1  A: 

You can't - at least not in the simple way that you're attempting. The unsized array at the end of a structure is basically an offset to the end of the structure, with no built-in way to find the end.

All the fields are converted to numeric offsets at compile time, so they need to be calculable at that time.
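
To make the compile-time point concrete, here is a minimal sketch. The names fixed_header and friend_offset are hypothetical and only serve the illustration: offsets of ordinary members are constants the compiler can check, while the start of a second trailing array can only be computed at run time from the first array's length.

```cpp
#include <cstddef>

// Hypothetical fixed-size header, used only for illustration.
struct fixed_header {
    int flags;
    int data_size;
};

// Offsets of ordinary members are constants known to the compiler.
static_assert(offsetof(fixed_header, flags) == 0,
              "the first member sits at offset 0");
static_assert(offsetof(fixed_header, data_size) >= sizeof(int),
              "the second member follows the first");

// The start of a second trailing array is not a constant: it depends on
// how long the first trailing array is, which is only known at run time.
inline std::size_t friend_offset(std::size_t header_size, std::size_t data_size) {
    return header_size + data_size;
}
```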

Douglas Leeder
+3  A: 

This is a very dangerous construct, and I'd advise against it. You can only include a variable-length array in a struct when it is the LAST element, and when you do so, you have to make sure you allocate enough memory, e.g.:

nodeheader *nh = (nodeheader *)malloc(sizeof(nodeheader) + max_data_size);

What you want to do is just use regular dynamically allocated arrays:

struct nodeheader
{
  char *data;
  size_t data_size;
  char *friend_data;  // 'friend' is a C++ keyword, so it can't be used as a name
  size_t friend_size;
};

nodeheader AllocNodeHeader(size_t data_size, size_t friend_size)
{
  nodeheader nh;
  nh.data = (char *)malloc(data_size);  // check for NULL return
  nh.data_size = data_size;
  nh.friend_data = (char *)malloc(friend_size);  // check for NULL return
  nh.friend_size = friend_size;

  return nh;
}

void FreeNodeHeader(nodeheader *nh)
{
  free(nh->data);
  nh->data = NULL;
  free(nh->friend_data);
  nh->friend_data = NULL;
}
Adam Rosenfield
A: 

(Was 'Use std::vector')

Edit:

On reading feedback, I suppose I should expand my answer. You can effectively fit two variable length arrays in your structure as follows, and the storage will be freed for you automatically when file_data goes out of scope:

struct nodeheader {
    std::vector<unsigned char> data;
    std::vector<unsigned char> friend_buf; // 'friend' is a keyword!
    // etc...
};

nodeheader file_data;

Now file_data.data.size(), etc. gives you the length, and &file_data.data[0] gives you a raw pointer to the data if you need it.

You'll have to fill file_data from the file piecemeal - read the length of each buffer, call resize() on the destination vector, then read in the data. (There are ways to do this slightly more efficiently. In the context of disk file I/O, I'm assuming it doesn't matter.)

Incidentally OP's technique is incorrect even for his 'fine and dandy' cases, e.g. with only one VLA at the end.

char file_data[1024];
nodeheader* node = (nodeheader*)&(file_data[10]);

There's no guarantee that file_data is properly aligned for the nodeheader type. Prefer to obtain file_data by malloc() - which is guaranteed to return a pointer aligned for any type - or else (better) declare the buffer to be of the correct type in the first place:

struct biggestnodeheader {
    int flags;
    int data_size;
    char data[ENOUGH_SPACE_FOR_LARGEST_HEADER_I_EVER_NEED];
};

biggestnodeheader file_data;
// etc...
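
A further option, not part of the suggestions above, is to leave the byte buffer alone and copy the fixed-size fields out with memcpy, which has no alignment requirement on its source. This is only a sketch; fixed_fields and load_fixed_fields are hypothetical names, and the layout (two native ints, no padding between them) is an assumption.

```cpp
#include <cstring>

// Hypothetical struct holding only the fixed-size fields of the header.
struct fixed_fields {
    int flags;
    int data_size;
};

// Copies the fixed fields out of a byte buffer at any alignment.
// The caller must guarantee that sizeof(fixed_fields) bytes are readable.
inline fixed_fields load_fixed_fields(const char* bytes) {
    fixed_fields f;
    std::memcpy(&f, bytes, sizeof f);
    return f;
}
```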
fizzer
I agree that "use `std::vector<>`" is often the correct answer to questions regarding binary data handling, but could you please elaborate on how it would make the questioner's life any easier in his particular case?
Johann Gerell
A: 

For what you are doing you need an encoder/decoder for the format. The decoder takes the raw data and fills out your structure (in your case allocating space for the copy of each section of the data), and the encoder writes raw binary.
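
As a rough sketch of such a pair, assuming a layout that is my guess rather than anything stated in the question (two native-endian 32-bit lengths, friend_size first, then the data bytes, then the friend bytes). The names node, encode and decode are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// In-memory form: each variable-length section owns its own copy.
struct node {
    std::vector<char> data;
    std::vector<char> friend_data;
};

// Encoder: writes the two lengths, then the data bytes, then the friend bytes.
inline std::vector<char> encode(const node& n) {
    const std::int32_t friend_size = static_cast<std::int32_t>(n.friend_data.size());
    const std::int32_t data_size = static_cast<std::int32_t>(n.data.size());
    std::vector<char> out(sizeof friend_size + sizeof data_size);
    std::memcpy(out.data(), &friend_size, sizeof friend_size);
    std::memcpy(out.data() + sizeof friend_size, &data_size, sizeof data_size);
    out.insert(out.end(), n.data.begin(), n.data.end());
    out.insert(out.end(), n.friend_data.begin(), n.friend_data.end());
    return out;
}

// Decoder: validates the lengths against the buffer, then copies each section.
inline node decode(const std::vector<char>& raw) {
    std::int32_t friend_size = 0;
    std::int32_t data_size = 0;
    const std::size_t header = sizeof friend_size + sizeof data_size;
    if (raw.size() < header) throw std::runtime_error("truncated header");
    std::memcpy(&friend_size, raw.data(), sizeof friend_size);
    std::memcpy(&data_size, raw.data() + sizeof friend_size, sizeof data_size);
    if (friend_size < 0 || data_size < 0) throw std::runtime_error("negative length");
    const std::size_t ds = static_cast<std::size_t>(data_size);
    const std::size_t fs = static_cast<std::size_t>(friend_size);
    const std::size_t payload = raw.size() - header;
    if (payload < ds || payload - ds < fs) throw std::runtime_error("truncated payload");
    node n;
    const char* p = raw.data() + header;
    n.data.assign(p, p + ds);
    n.friend_data.assign(p + ds, p + ds + fs);
    return n;
}
```

Copying into vectors costs an allocation per section, but the decoded node no longer depends on the lifetime or alignment of the raw buffer.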

Greg Rogers
A: 

The answers so far are seriously over-complicating a simple problem. Mecki is right about why it can't be done the way you are trying to do it; however, you can do something very similar:

struct nodeheader
{
    int friend_size;
    int data_size;
};

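struct nodefile
{
    nodeheader *header;
    char *data;
    char *friend_data;  // 'friend' is a C++ keyword, so it can't be used as a name
};

char file_data[1024];

// .. file in file_data ..

nodefile file;
file.header = (nodeheader *)&file_data[10];
file.data = (char *)&file.header[1];  // payload starts right after the fixed header
file.friend_data = &file.data[file.header->data_size];  // friend bytes follow the data bytes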
Jim Buck