I've been writing a binary version of iostreams. It essentially allows you to write binary files, but gives you a lot of control over the format of the file. Example usage:
my_file << binary::u32le << my_int << binary::u16le << my_string;
This would write my_int as an unsigned 32-bit little-endian integer, and my_string as a length-prefixed string (where the prefix is a u16le). To read the file back, you would flip the arrows. Works great. However, I hit a bump in the design, and I'm still on the fence about it. So, time to ask SO. (We make a couple of assumptions at the moment, such as 8-bit bytes, two's-complement ints, and IEEE floats.)
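To make the intended byte layout concrete, here is a minimal sketch of what the example line is assumed to put in the file. The helper names (put_le, put_string_u16le, encode_example) are hypothetical and not part of the library, and the string is assumed to be stored as raw bytes with no terminator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: append the low `width` bytes of `value`,
// least significant byte first (little-endian).
inline void put_le(std::vector<unsigned char>& out, std::uint64_t value, std::size_t width)
{
    assert(width >= 1 && width <= 8);
    for (std::size_t i = 0; i < width; ++i)
        out.push_back(static_cast<unsigned char>((value >> (8 * i)) & 0xFF));
}

// Hypothetical helper: a u16le length prefix followed by the raw string bytes.
// Assumption: no terminator is written after the string.
inline void put_string_u16le(std::vector<unsigned char>& out, const std::string& s)
{
    assert(s.size() <= 0xFFFF);  // the length has to fit in the 16-bit prefix
    put_le(out, s.size(), 2);
    for (char c : s)
        out.push_back(static_cast<unsigned char>(c));
}

// Bytes assumed to result from:
//   my_file << binary::u32le << my_int << binary::u16le << my_string;
inline std::vector<unsigned char> encode_example(std::uint32_t my_int, const std::string& my_string)
{
    std::vector<unsigned char> out;
    put_le(out, my_int, 4);
    put_string_u16le(out, my_string);
    return out;
}
```

Under these assumptions, encoding the integer 0x01020304 and the string "hi" gives eight bytes: four for the integer (lowest byte first), two for the length, and two for the characters.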
iostreams, under the hood, use streambufs. It's a fantastic design really: iostreams code the serialization of an 'int' into text, and let the underlying streambuf handle the rest. Thus, you get cout, fstreams, stringstreams, etc. All of these, both the iostreams and the streambufs, are templated, usually on char, but sometimes also on wchar_t. My data, however, is a byte stream, which is best represented by 'unsigned char'.
My first attempts were to template the classes based on unsigned char. std::basic_string templates well enough, but streambuf does not. I ran into several problems with a class named codecvt, which I could never get to follow the unsigned char theme. This raises two questions:
1) Why is a streambuf responsible for such things? It seems like code-conversions lie way out of a streambuf's responsibility -- streambufs should take a stream, and buffer data to/from it. Nothing more. Something as high level as code conversions feels like it should belong in iostreams.
2) Since I couldn't get the templated streambufs to work with unsigned char, I went back to char, and simply cast data between char and unsigned char. I tried to minimize the number of casts, for obvious reasons. Most of the data basically winds up in a read() or write() function, which then invokes the underlying streambuf (using a cast in the process). The read function is basically:
size_t read(unsigned char *buffer, size_t size)
{
    size_t ret;
    ret = stream()->sgetn(reinterpret_cast<char *>(buffer), size);
    // deal with ret for return size, eof, errors, etc.
    ...
}
Good solution, bad solution?
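For anyone who wants to try the cast-based approach in isolation, here is a self-contained sketch. It is not the library's actual code: read_bytes and first_byte are hypothetical names, and the streambuf is passed in explicitly instead of coming from a stream() member.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical free-function version of read().
inline std::size_t read_bytes(std::streambuf& sb, unsigned char* buffer, std::size_t size)
{
    assert(buffer != nullptr || size == 0);
    // char and unsigned char have the same size, and either may be used to
    // access the bytes of any object, so the cast only relabels the storage.
    const std::streamsize got =
        sb.sgetn(reinterpret_cast<char*>(buffer), static_cast<std::streamsize>(size));
    // sgetn returns how many characters were actually copied; a short count
    // means the source ran out of data.
    return got > 0 ? static_cast<std::size_t>(got) : 0;
}

// Usage sketch: read one byte from an in-memory stringbuf.
// Returns the byte as 0..255, or -1 if no data is available.
inline int first_byte(const std::string& data)
{
    std::stringbuf sb(data, std::ios_base::in);
    unsigned char b = 0;
    return read_bytes(sb, &b, 1) == 1 ? b : -1;
}
```

Because the caller's buffer is unsigned char, a stored 0xFF byte comes back as 255 rather than as a negative char, which is the main thing the cast is buying here.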
The first two questions indicate that more info is needed. First, I looked at projects such as boost::serialization, but they operate at a higher level, in that they define their own binary format. This library is for reading and writing at a lower level, where you want to define the format yourself, or the format is already defined, or the bulk metadata is not required or desired.
Second, some have asked about the binary::u32le modifier. It is an instance of a class that holds the desired endianness and width at the moment, and perhaps signedness in the future. The stream holds a copy of the last-passed instance of that class and uses it during serialization. This was a bit of a workaround; I originally tried overloading the << operator like so:
bostream &operator << (uint8_t n);
bostream &operator << (uint16_t n);
bostream &operator << (uint32_t n);
bostream &operator << (uint64_t n);
However, at the time, this didn't seem to work. I had several problems with ambiguous function calls. This was especially true of constants, although you could, as one poster suggested, cast or merely declare it as a const <type>. I seem to remember that there was some other, larger problem, however.
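The modifier approach described above can be sketched roughly as follows. This is a simplified stand-in, not the real class: the int_format struct, its fields, the u32be constant, the default format, and the bytes() accessor are all assumptions made for illustration, and the bytes are collected in a vector instead of going to a streambuf.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

namespace binary {
// Hypothetical tag type: byte width and endianness for the integers that follow.
struct int_format {
    unsigned width;      // in bytes, 1 to 8
    bool little_endian;
};
const int_format u16le{2, true};
const int_format u32le{4, true};
const int_format u32be{4, false};
}

// Minimal stand-in for the stream.
class bostream {
public:
    // A format tag is remembered, not written.
    bostream& operator<<(const binary::int_format& f) { fmt_ = f; return *this; }

    // A single integer overload; the remembered format decides how many
    // bytes are emitted and in which order.
    bostream& operator<<(std::uint64_t n)
    {
        assert(fmt_.width >= 1 && fmt_.width <= 8);
        for (unsigned i = 0; i < fmt_.width; ++i) {
            const unsigned shift = 8 * (fmt_.little_endian ? i : fmt_.width - 1 - i);
            bytes_.push_back(static_cast<unsigned char>((n >> shift) & 0xFF));
        }
        return *this;
    }

    const std::vector<unsigned char>& bytes() const { return bytes_; }

private:
    binary::int_format fmt_ = binary::u32le;  // assumed default format
    std::vector<unsigned char> bytes_;
};
```

With only one integer overload, a literal such as 5 has a single viable conversion target, so the call is not ambiguous. The trade-off in this sketch is that the value's own type no longer influences the output width; only the last modifier does.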