In general, what is the best way of storing binary data in C++? The options, as far as I can tell, pretty much boil down to using strings or vector<char>s. (I'll omit the possibility of char*s and malloc()s since I'm referring specifically to C++).

Usually I just use a string; however, I'm not sure whether there are overheads I'm missing, or conversions that the STL does internally that could corrupt binary data. Does anyone have any pointers (har) on this? Suggestions or preferences one way or another?

+12  A: 

A vector of char is nice because the memory is contiguous. Therefore you can use it with a lot of C APIs, such as Berkeley sockets or file APIs. You can do the following, for example:

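  std::vector<char> vect;
  ...
  // send() takes a flags argument as its fourth parameter; 0 means no flags.
  // vect must be non-empty here, otherwise &vect[0] is not valid.
  send(sock, &vect[0], vect.size(), 0);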

and it will work fine.

You can essentially treat it just like any other dynamically allocated char buffer. You can scan up and down looking for magic numbers or patterns. You can parse it partially in place. When receiving from a socket, you can very easily resize it to append more data.
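
As a rough sketch of that receive pattern: grow the vector, let the C API write into the new tail, then shrink to what actually arrived. Here `fake_recv` is a made-up stand-in for a call such as recv(), and `append_chunk` is a hypothetical helper name; neither comes from a real library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a C API such as recv(): writes up to `cap`
// bytes into `dst` and returns how many bytes it actually wrote.
std::size_t fake_recv(char* dst, std::size_t cap) {
    // Note the embedded zero byte: this is binary data, not a C string.
    static const char payload[] = {'\x7f', 'E', '\0', 'F', '\x01'};
    const std::size_t n = cap < sizeof(payload) ? cap : sizeof(payload);
    std::memcpy(dst, payload, n);
    return n;
}

// Append one chunk to `buf`: grow it, let the C API write into the new
// tail, then shrink back to the number of bytes actually received.
std::size_t append_chunk(std::vector<char>& buf, std::size_t max_chunk) {
    assert(max_chunk > 0);  // keeps &buf[old_size] valid after the resize
    const std::size_t old_size = buf.size();
    buf.resize(old_size + max_chunk);
    const std::size_t got = fake_recv(&buf[old_size], max_chunk);
    buf.resize(old_size + got);
    return got;
}
```

Calling `append_chunk(buf, 16)` repeatedly accumulates the data in one contiguous buffer. With a real socket you would also have to handle error returns, which this sketch ignores.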

The downside is that resizing is not terribly efficient (resize or preallocate prudently), and deletion from the front of the array is also very inefficient, since every remaining element has to be shifted down. If you need to, say, pop just one or two chars at a time off the front of the data structure very frequently, copying to a deque before this processing may be an option. This costs you a copy, and deque memory isn't contiguous, so you can't just pass a pointer to a C API.
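
A minimal sketch of the deque option, assuming the data has already been received into a vector. The helper names `to_queue` and `take_front` are invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Copy the received bytes into a deque so that removing bytes from the
// front is cheap. This costs one copy, and the deque's storage is not
// contiguous, so it can no longer be handed to a C API as one pointer.
std::deque<char> to_queue(const std::vector<char>& buf) {
    return std::deque<char>(buf.begin(), buf.end());
}

// Pop exactly `n` bytes off the front and return them in order.
std::vector<char> take_front(std::deque<char>& q, std::size_t n) {
    assert(n <= q.size());  // caller must not ask for more than is queued
    std::vector<char> out;
    out.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        out.push_back(q.front());
        q.pop_front();
    }
    return out;
}
```

If the bytes are only ever consumed in order, keeping a read offset into the original vector is another way to avoid front deletions; which is better depends on the access pattern.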

Bottom line: learn about the data structures and their tradeoffs before diving in. That said, a vector of char is typically what I see used in general practice.

Doug T.
good answer. for the learning part: i found a nice picture showing the use of containers some time ago, and embedded it into this answer: http://stackoverflow.com/questions/366432/extending-stdlist#366710
Johannes Schaub - litb
+1  A: 

I use std::string for this too, and have never had a problem with it.

One "pointer," which I just received a sharp reminder of in a piece of code yesterday: when creating a string from a block of binary data, use the std::string(startIter, endIter) constructor form, not the std::string(ptr, offset, length) form -- the latter makes the assumption that the pointer points to a C-style string, and ignores anything after the first zero character (it copies "up to" the specified length, not length characters).
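
To make the pitfall concrete, here is a small sketch that builds a string from the same five bytes in three ways. The names `kData`, `kLen` and the three helper functions are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Five bytes of "binary" data with a zero byte in the middle.
static const char kData[] = {'a', 'b', '\0', 'c', 'd'};
static const std::size_t kLen = sizeof(kData);

// Pointer + length: copies exactly kLen bytes, zero byte included.
std::string from_ptr_len() {
    std::string s(kData, kLen);
    assert(s.size() == kLen);
    return s;
}

// Iterator range: also copies every byte in [begin, end).
std::string from_range() {
    return std::string(kData, kData + kLen);
}

// (source, offset, length) form: the char pointer is first converted to a
// temporary string (or a string view in newer standards) by scanning up
// to the first zero byte, so only "ab" survives.
std::string from_offset_len() {
    return std::string(kData, 0, kLen);
}
```

The first two helpers return all five bytes, while the third returns only the two bytes before the embedded zero.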

Head Geek
j_random_hacker
This prompted me to look it up again, and it seems that there *is* no std::string(char *ptr, offset, length) constructor. The constructor that takes offset and length requires an std::string as the first parameter, so it was auto-constructing a string from the bytes, which is what truncated it.
Head Geek
You're right. I'm sorry, I meant the std::string(char *ptr, size_t length) ctor should copy all bytes.
j_random_hacker
Wow, just checked and sure enough, std::string's 1-arg ctor from char * is not marked "explicit"! I didn't realise. Having that implicit char * -> std::string conversion certainly opens up a can of worms, as you discovered. Good debugging there Head Geek. :)
j_random_hacker
+1  A: 

You should certainly be using some container of char, but the container you want to use depends on your application.

Chars have several properties that make them useful for holding binary data: the standard disallows any "padding" for the char datatype, which is important since it means that you won't get garbage in your binary layout. Each char is also guaranteed to be exactly one byte (sizeof(char) is 1 by definition), making char, along with signed char and unsigned char, the only plain old datatype (POD) with a set width (all others are specified in terms of upper and/or lower bounds).
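
A short sketch of why that matters, assuming a C++11 compiler for static_assert. The `Record` struct and the `pack` function are hypothetical, and the int is stored in host byte order, so this is not a portable wire format.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstring>
#include <vector>

// sizeof(char) is 1 by definition; a "byte" here is CHAR_BIT bits wide,
// which is at least 8 but not required to be exactly 8.
static_assert(sizeof(char) == 1, "char is one byte by definition");
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");

// Hypothetical record type, used only to contrast with a char buffer:
// the compiler may insert padding between `tag` and `value`.
struct Record {
    char tag;
    int value;
};

// Serialize field by field into a char buffer so that no padding bytes
// end up in the binary layout.
std::vector<char> pack(const Record& r) {
    std::vector<char> out(sizeof r.tag + sizeof r.value);
    out[0] = r.tag;
    std::memcpy(&out[1], &r.value, sizeof r.value);
    assert(out.size() <= sizeof(Record));  // never larger than the padded struct
    return out;
}
```

On many platforms `sizeof(Record)` is larger than the packed buffer. The difference is padding that would otherwise leak into the output if the struct were written out as a raw block.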

The discussion of the appropriate STL container in which to store the chars is handled well by Doug above. Which one you need depends entirely on your use case. If you are just holding a block of data you iterate through, without any special lookup, append/remove, or splice needs, I would prefer vector, which makes your intentions clearer than std::string, which many libraries and functions will assume holds a null-terminated C-style string.

Todd Gardner
+2  A: 

The biggest problem with std::string is that the current standard doesn't guarantee that its underlying storage is contiguous. However, there are no known STL implementations where string is not contiguous, so in practice it probably won't fail. In fact, the new C++0x standard is going to fix this problem by mandating that std::string use a contiguous buffer, as std::vector already does.
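
For illustration only, a sketch of what the contiguity guarantee allows, assuming a compiler that implements the new standard. `copy_into_string` is a made-up helper, with memcpy standing in for any C API that writes into a caller-supplied buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Assumes std::string storage is contiguous (guaranteed from C++11 on),
// so &s[0] can be treated like a plain char buffer of s.size() bytes.
std::string copy_into_string(const char* src, std::size_t len) {
    std::string s(len, '\0');          // allocate len bytes up front
    if (len > 0) {
        std::memcpy(&s[0], src, len);  // C API writes straight into the string
    }
    assert(s.size() == len);           // embedded zero bytes do not shorten it
    return s;
}
```

Under the older standard the same code relies on implementation behaviour rather than a guarantee, which is the concern raised above.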

Another argument against string is that its name suggests that it contains a character string, not a binary buffer, which may cause confusion to those who read the code.

That said, I recommend vector as well.