I need to modify my program to accept Unicode input, which may arrive in UTF-8 or in any of the UTF-16 and UTF-32 encodings. I don't really know much about Unicode (though I've read Joel Spolsky's article and the Wikipedia page).
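Here is a rough sketch of how the encodings can be told apart. It is an illustration, not an established API. When the text starts with a byte order mark (BOM), the first few bytes identify which of these encodings is in use. The hypothetical `detect_encoding` function below checks for the five standard BOMs. As an assumption, it falls back to UTF-8 when no BOM is present. BOM-less UTF-16/32 input would need some other heuristic or out-of-band information.

```cpp
#include <cstddef>
#include <string>

// Encodings that can be distinguished by their byte order mark (BOM).
enum class Encoding { UTF8, UTF16LE, UTF16BE, UTF32LE, UTF32BE };

// Guess the encoding of a raw byte buffer from its BOM and report in
// bom_len how many leading bytes to skip. Assumption: no BOM means UTF-8.
Encoding detect_encoding(const std::string& bytes, std::size_t& bom_len) {
    auto b = [&](std::size_t i) { return static_cast<unsigned char>(bytes[i]); };
    const std::size_t n = bytes.size();
    // Check UTF-32 first: the UTF-32LE BOM (FF FE 00 00) begins with the
    // UTF-16LE BOM (FF FE), so UTF-16LE text starting with U+0000 is ambiguous.
    if (n >= 4 && b(0) == 0xFF && b(1) == 0xFE && b(2) == 0x00 && b(3) == 0x00) {
        bom_len = 4;
        return Encoding::UTF32LE;
    }
    if (n >= 4 && b(0) == 0x00 && b(1) == 0x00 && b(2) == 0xFE && b(3) == 0xFF) {
        bom_len = 4;
        return Encoding::UTF32BE;
    }
    if (n >= 3 && b(0) == 0xEF && b(1) == 0xBB && b(2) == 0xBF) {
        bom_len = 3;
        return Encoding::UTF8;
    }
    if (n >= 2 && b(0) == 0xFF && b(1) == 0xFE) {
        bom_len = 2;
        return Encoding::UTF16LE;
    }
    if (n >= 2 && b(0) == 0xFE && b(1) == 0xFF) {
        bom_len = 2;
        return Encoding::UTF16BE;
    }
    bom_len = 0;  // no BOM: assume UTF-8 (plain ASCII is also valid UTF-8)
    return Encoding::UTF8;
}
```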
Right now I'm using an `std::istream`, reading my input `char` by `char`, and then storing it (when necessary) in an `std::string`. I'd like to:
- modify this (with as little effort as possible) to support the above encodings,
- figure out how to test with the above encodings (I'm kinda white-bread American, and don't even know how to make a sample text file in another encoding), and ideally
- do this in a cross-platform way.
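For the testing item, one possible approach is to let the compiler do the encoding. This is a sketch that assumes a C++11 compiler. `u"..."` literals are stored as UTF-16 code units and `U"..."` literals as UTF-32 code units. A test file then only needs those units written out byte by byte in the chosen byte order, after a BOM. The helper name `to_bytes_with_bom` is hypothetical. It also assumes `char16_t` and `char32_t` are 2 and 4 bytes wide, as they are on mainstream platforms.

```cpp
#include <string>

// Serialize UTF-16 (char16_t) or UTF-32 (char32_t) code units to raw bytes,
// prefixed with a BOM (U+FEFF) so a reader can detect the encoding.
template <typename CharT>
std::string to_bytes_with_bom(const std::basic_string<CharT>& units, bool big_endian) {
    const int width = static_cast<int>(sizeof(CharT));  // 2 for char16_t, 4 for char32_t
    const std::basic_string<CharT> all = std::basic_string<CharT>(1, CharT(0xFEFF)) + units;
    std::string out;
    for (CharT u : all) {
        // Emit the code unit most-significant byte first (BE) or last (LE).
        for (int i = 0; i < width; ++i) {
            int shift = big_endian ? 8 * (width - 1 - i) : 8 * i;
            out.push_back(static_cast<char>((static_cast<char32_t>(u) >> shift) & 0xFF));
        }
    }
    return out;
}
```

For example, `std::ofstream f("sample-utf16be.txt", std::ios::binary); f << to_bytes_with_bom(std::u16string(u"caf\u00E9 \U0001F600"), true);` writes a UTF-16BE file that begins with the bytes `FE FF`. The file name is made up. Binary mode matters on Windows, where text mode would insert a `0x0D` byte before every `0x0A` byte, even in the middle of a UTF-16 or UTF-32 code unit.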
Also, I'd like to use as little space as possible (so if a character doesn't need more than one byte, it shouldn't use more). From what I understand, this means storing text as UTF-8, which is fine, but I don't know of a standard string type that does this (from what I understand, `wchar_t` has implementation-defined size and encoding).
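One common option is to keep `std::string` and treat it as a container of UTF-8 bytes. The catch is that `size()` then counts bytes, not characters. Below is a minimal sketch of converting UTF-16 or UTF-32 input into such a string. It assumes the code-unit width and byte order are already known and the BOM has been stripped (for instance, after inspecting it). The function names are hypothetical, and malformed input is simply rejected with an exception.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Append one Unicode code point to `out` as 1 to 4 UTF-8 bytes.
void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}

// Assemble one `width`-byte code unit starting at bytes[pos].
char32_t read_unit(const std::string& bytes, std::size_t pos, int width, bool big_endian) {
    char32_t unit = 0;
    for (int i = 0; i < width; ++i) {
        char32_t b = static_cast<unsigned char>(bytes[pos + i]);
        int shift = big_endian ? 8 * (width - 1 - i) : 8 * i;
        unit |= b << shift;
    }
    return unit;
}

// Convert BOM-less UTF-16 (width 2) or UTF-32 (width 4) bytes to UTF-8.
// Throws on truncated input, unpaired surrogates, or out-of-range values.
std::string to_utf8(const std::string& bytes, int width, bool big_endian) {
    if (width != 2 && width != 4) throw std::invalid_argument("width must be 2 or 4");
    if (bytes.size() % width != 0) throw std::runtime_error("truncated code unit");
    std::string out;
    for (std::size_t pos = 0; pos < bytes.size(); pos += width) {
        char32_t cp = read_unit(bytes, pos, width, big_endian);
        if (width == 2 && cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: combine it with the following low surrogate.
            pos += 2;
            if (pos >= bytes.size()) throw std::runtime_error("unpaired surrogate");
            char32_t low = read_unit(bytes, pos, 2, big_endian);
            if (low < 0xDC00 || low > 0xDFFF) throw std::runtime_error("unpaired surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
        } else if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) {
            throw std::runtime_error("invalid code point");
        }
        append_utf8(out, cp);
    }
    return out;
}
```

With the stream opened in `std::ios::binary` mode, the raw bytes can be read in one go with `std::string bytes((std::istreambuf_iterator<char>(in)), std::istreambuf_iterator<char>());`. UTF-8 input can then be kept as-is. In this scheme ASCII characters still cost one byte each, while characters such as `é`, `€` and `😀` take two, three and four bytes respectively.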