I have a function void write<typename T>(const T&) which is implemented in terms of writing the T object to an ostream, and a matching function T read<typename T>() that reads a T from an istream. I am basically using iostreams as a plain text serialisation format, which obviously works fine for most built-in types, although I'm not sure how to effectively handle std::strings just yet.

I'd like to be able to write out a sequence of objects too, e.g. void write<typename T>(const std::vector<T>&) or an iterator-based equivalent (although in practice, it would always be used with a vector). However, while an overload that iterates over the elements and writes them out is easy enough to write, it doesn't add enough information for the matching read operation to know how each element is delimited, which is essentially the same problem that I have with a single std::string.

Is there a single approach that can work for all basic types and std::string? Or perhaps I can get away with 2 overloads, one for numerical types, and one for strings? (Either using different delimiters or the string using a delimiter escaping mechanism, perhaps.)

EDIT: I appreciate the often sensible tendency when confronted with questions like this is to say, "you don't want to do that" and to suggest a better approach, but I would really like suggestions that relate directly to what I asked, rather than what you believe I should have asked instead. :)

+1  A: 

A general-purpose serialisation framework is hard, and the built-in features of the iostream library are really not up to it - even dealing with strings satisfactorily is quite difficult. I suggest you either sit down and design the framework from scratch, ignoring iostreams (which then become an implementation detail), or (more realistically) use an existing library, or at least an existing format, such as XML.

anon
I don't need a full serialisation framework in this case, just the ability to read and write these individual types and sequences of homogeneous types.
Kylotan
@Kylotan If you want to press ahead, I'd sit down and think about how you are going to deal with strings, which in fact are a kind of sequence of homogeneous types. You can worry about arrays etc. later.
anon
Fortunately (or unfortunately, depending on how you look at it) I have a special case for single strings that will work - I can simply read the whole stream and use that.
Kylotan
So a string has to be the last thing in the file, because it will consume the rest of the file?
Mike DeSimone
That would be a significant problem if I was working with files. :)
Kylotan
A: 

Basically, you will have to create a file format. When you're restricted to built-ins, strings, and sequences of those, you could use whitespace as delimiters, write strings wrapped in " (escaping any " occurring within the strings themselves, and then any \ as well), and pick anything that isn't used for streaming built-in types as the sequence delimiter. It might be helpful to store the size of a sequence, too.

For example,

5 1.4 "a string containing \" and \\" { 3 "blah" "blubb" "frgl" } { 2 42 21 }

might be the serialization of an int (5), a float (1.4), a string ("a string containing " and \"), a sequence of 3 strings ("blah", "blubb", and "frgl"), and a sequence of 2 ints (42 and 21).
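A minimal sketch of that format in C++14, not a definitive implementation. The names write_item, read_item, write_sequence and read_sequence are made up for illustration (the question only mentions write<T> and read<T>), and I'm assuming std::quoted from <iomanip> is acceptable, since its defaults (" as the delimiter, \ as the escape character) happen to match the escaping described above.

```cpp
#include <cassert>
#include <cstddef>
#include <iomanip>   // std::quoted (C++14)
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper names, chosen for this sketch only.

// Built-in types: rely on operator<< / operator>> with whitespace as delimiter.
template <typename T>
void write_item(std::ostream& os, const T& value) { os << value << ' '; }

template <typename T>
void read_item(std::istream& is, T& value) { is >> value; }

// Strings: wrapped in ", with embedded " and \ escaped by a backslash.
inline void write_item(std::ostream& os, const std::string& s) {
    os << std::quoted(s) << ' ';
}
inline void read_item(std::istream& is, std::string& s) {
    is >> std::quoted(s);
}

// Sequences: "{ <count> <item> ... }".
template <typename T>
void write_sequence(std::ostream& os, const std::vector<T>& v) {
    os << "{ " << v.size() << ' ';
    for (const T& elem : v) write_item(os, elem);
    os << "} ";
}

template <typename T>
bool read_sequence(std::istream& is, std::vector<T>& out) {
    char brace = 0;
    std::size_t count = 0;
    if (!(is >> brace) || brace != '{' || !(is >> count)) return false;
    out.clear();
    for (std::size_t i = 0; i < count; ++i) {
        T elem{};
        read_item(is, elem);
        if (!is) return false;          // malformed or truncated element
        out.push_back(std::move(elem));
    }
    return static_cast<bool>(is >> brace) && brace == '}';
}

// Usage: round-trip a string containing quotes and a backslash, then a vector.
inline void demo_round_trip() {
    std::stringstream ss;
    write_item(ss, std::string("say \"hi\" \\ bye"));
    write_sequence(ss, std::vector<int>{42, 21});

    std::string s;
    std::vector<int> v;
    read_item(ss, s);
    const bool ok = read_sequence(ss, v);
    assert(s == "say \"hi\" \\ bye");
    assert(ok && v.size() == 2 && v[0] == 42 && v[1] == 21);
}
```

Note that the string overloads only kick in for actual std::string objects; a bare string literal would go through the template and be written unquoted. The stored count makes the closing } redundant for the reader, but it is a cheap sanity check.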

Alternatively you could do as Neil suggests in his comment and treat strings as sequences of characters:

{ 27 'a' ' ' 's' 't' 'r' 'i' 'n' 'g' ' ' 'c' 'o' 'n' 't' 'a' 'i' 'n' 'i' 'n' 'g' ' ' '"' ' ' 'a' 'n' 'd' ' ' '\' }

sbi
I don't need to mix types within one read or write operation so that part isn't a problem. Each operation knows exactly what type it will be dealing with so I just need to handle the variable length issue. Unfortunately that approach to strings won't work well for me since it breaks the human readability and pushes it towards an arbitrary binary format.
Kylotan
@Kylotan: Well, if I understood your comments to Mike and me correctly, then it's even easier, since you won't need to wrap sequences in delimiters. You'd only have to write `42 21` (or `2 42 21`, if you prefer to store the number of objects) for storing 2 integers.
sbi
That's still delimited, just by whitespace. That would work well for numeric types but doesn't extend to strings which often contain whitespace. That's what the 3rd paragraph of my question is referring to.
Kylotan
@Kylotan: Which is why I introduced wrapping of strings in `"`.
sbi
A: 

If you want to avoid escaping strings, you can look at how ASN.1 does things. It's overkill for your stated requirements (strings, fundamental types, and arrays of these), but the principle is that the stream contains unambiguous length information, so nothing needs to be escaped.

For a very simple equivalent, you could output a uint32_t as "ui4" followed by 4 bytes of data, an int8_t as "si1" followed by 1 byte of data, an IEEE float as "f4", an IEEE double as "f8", and so on. Use some additional modifier for arrays: "a134ui4" followed by 536 bytes of data. Note that arbitrary lengths need to be terminated, whereas bounded lengths like the number of bytes in the following integer can be fixed size (one of the reasons ASN.1 is more than you need is that it uses arbitrary lengths for everything). A string could then either be a<len>ui1 or some abbreviation like s<len>:. The reader is very simple indeed.
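A rough sketch of the "ui4" and "a<len>ui4" cases, just to show how little the reader has to do. The function names are invented for this example, and the little-endian byte order is my own assumption (the scheme above doesn't fix one; you only need writer and reader to agree). For real files the stream should be opened in binary mode.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helpers. Byte order is an assumption: least significant first.
inline void put_u32_bytes(std::ostream& os, std::uint32_t v) {
    for (int i = 0; i < 4; ++i)
        os.put(static_cast<char>((v >> (8 * i)) & 0xFF));
}

inline bool get_u32_bytes(std::istream& is, std::uint32_t& v) {
    unsigned char bytes[4];
    if (!is.read(reinterpret_cast<char*>(bytes), 4)) return false;
    v = 0;
    for (int i = 0; i < 4; ++i)
        v |= static_cast<std::uint32_t>(bytes[i]) << (8 * i);
    return true;
}

// Consume and check the fixed-size type tag "ui4".
inline bool expect_ui4_tag(std::istream& is) {
    char tag[3];
    return is.read(tag, 3) && std::string(tag, 3) == "ui4";
}

// Single value: "ui4" followed by 4 bytes.
inline void write_u32(std::ostream& os, std::uint32_t v) {
    os.write("ui4", 3);
    put_u32_bytes(os, v);
}

inline bool read_u32(std::istream& is, std::uint32_t& v) {
    return expect_ui4_tag(is) && get_u32_bytes(is, v);
}

// Array: 'a', decimal element count, "ui4", then 4 bytes per element.
// The count has arbitrary length; the non-digit 'u' of the tag terminates it.
inline void write_u32_array(std::ostream& os, const std::vector<std::uint32_t>& v) {
    os << 'a' << v.size();
    os.write("ui4", 3);
    for (std::uint32_t x : v) put_u32_bytes(os, x);
}

inline bool read_u32_array(std::istream& is, std::vector<std::uint32_t>& out) {
    std::size_t n = 0;
    if (is.get() != 'a' || !(is >> n) || !expect_ui4_tag(is)) return false;
    out.clear();
    for (std::size_t i = 0; i < n; ++i) {
        std::uint32_t x = 0;
        if (!get_u32_bytes(is, x)) return false;
        out.push_back(x);
    }
    return true;
}

// Usage: a single value round-trips through an in-memory stream.
inline void demo_u32() {
    std::stringstream ss;
    write_u32(ss, 123456789u);
    std::uint32_t back = 0;
    const bool ok = read_u32(ss, back);
    assert(ok && back == 123456789u);
}
```

Writing the bytes out one at a time with shifts, rather than dumping the object's memory, is what keeps the representation independent of the platform's own byte order.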

This has obvious drawbacks: the size and representation of types must be independent of platform, and the output is neither human readable nor particularly compressed.

You can make it mostly human-readable, though, with ASCII instead of binary representation of arithmetic types (careful with arrays: you may want to calculate the length of the whole array before outputting any of it, or you may use a separator and a terminator since there's no need for character escapes), and by optionally adding a big fat human-visible separator that the deserializer ignores. For example, s16:hello, worlds12:||s12:hello, world is considerably easier to read than s16:hello, worlds12:s12:hello, world. Just beware when reading that what looks like a separator sequence might not actually be one, and you have to avoid falling into traps like assuming s5:hello|| in the middle of the data means there's a string 5 chars long: it might be part of s15:hello||s5:hello||.
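Here is a small sketch of the s<len>: form with the optional || separator, under my own assumptions: the names write_string and read_string are invented, and the separator is only ever skipped between records, never searched for. Because the reader trusts the length and nothing else, it doesn't fall into the trap described above.

```cpp
#include <cassert>
#include <cstddef>
#include <ios>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helpers for the "s<len>:<bytes>" record, optionally
// followed by a human-visible "||" that carries no information.
inline void write_string(std::ostream& os, const std::string& s,
                         bool separator = true) {
    os << 's' << s.size() << ':' << s;
    if (separator) os << "||";
}

inline bool read_string(std::istream& is, std::string& out) {
    // Separators can only appear at a record boundary, so skip them here.
    while (is.peek() == '|') is.get();

    char tag = 0, colon = 0;
    std::size_t len = 0;
    if (!is.get(tag) || tag != 's') return false;
    if (!(is >> len)) return false;                 // decimal length
    if (!is.get(colon) || colon != ':') return false;

    // Take exactly len characters, whatever they look like.
    // A production reader should bound len before allocating.
    std::string buf(len, '\0');
    if (len != 0 && !is.read(&buf[0], static_cast<std::streamsize>(len)))
        return false;
    out = buf;
    return true;
}

// Usage: the payload itself looks like a record plus separator,
// yet it is read back as a single 15-character string.
inline void demo_tricky_payload() {
    std::istringstream in("s15:hello||s5:hello||");
    std::string s;
    const bool ok = read_string(in, s);
    assert(ok && s == "hello||s5:hello");
}
```

No escaping is needed anywhere: strings containing |, s, digits, colons or newlines all survive, because the reader never inspects the payload.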

Unless you have very tight constraints on code size, it's probably easier to use a general-purpose serializer off the shelf than it is to write a specialized one. Reading simple XML with SAX isn't difficult. That said, everyone and his dog has written "finally, the serializer/parser/whatever that will save us ever hand-coding a serializer/parser/whatever ever again", with greater or lesser success.

Steve Jessop
A: 

You may consider using boost::spirit, which simplifies parsing of basic types from arbitrary input streams.

Artem