What are the best practices for handling strings in C++? I'm especially wondering how to handle the following cases:
File input/output of text and XML files, which may be written in different encodings. What is the recommended way of handling this, and how do I retrieve the values? I guess an XML node may contain UTF-16 text, and then I have to work with it somehow.
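For the file I/O case, one common approach is to decode every input to UTF-8 once, at the boundary, and work only with UTF-8 internally. XML declares its encoding for the file as a whole (through a byte order mark or the encoding declaration), so a parser usually hands back already-decoded text rather than UTF-16 inside individual nodes. The sketch below assumes the raw bytes are either UTF-8 (with or without a BOM) or UTF-16 with a BOM; `bytes_to_utf8` and `append_utf8` are hypothetical helper names, and a real program would more likely rely on its XML parser or a library such as ICU.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Append the UTF-8 encoding of one code point (assumed <= 0x10FFFF) to `out`.
static void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: turn raw file bytes into a UTF-8 std::string.
// Assumption: input is UTF-8 (optional BOM) or UTF-16 LE/BE with a BOM.
std::string bytes_to_utf8(const std::string& raw) {
    auto byte = [&](std::size_t i) { return static_cast<unsigned char>(raw[i]); };
    if (raw.size() >= 3 && byte(0) == 0xEF && byte(1) == 0xBB && byte(2) == 0xBF)
        return raw.substr(3);                    // UTF-8 with BOM: strip the BOM
    bool le = raw.size() >= 2 && byte(0) == 0xFF && byte(1) == 0xFE;
    bool be = raw.size() >= 2 && byte(0) == 0xFE && byte(1) == 0xFF;
    if (!le && !be)
        return raw;                              // no BOM: assume UTF-8 already
    if (raw.size() % 2 != 0)
        throw std::runtime_error("odd byte count in UTF-16 data");
    auto unit = [&](std::size_t i) -> char32_t { // read one 16-bit code unit
        return le ? (byte(i) | (byte(i + 1) << 8)) : ((byte(i) << 8) | byte(i + 1));
    };
    std::string out;
    for (std::size_t i = 2; i < raw.size(); i += 2) {
        char32_t u = unit(i);
        if (u >= 0xD800 && u <= 0xDBFF) {        // high surrogate: needs a low one
            if (i + 3 >= raw.size())
                throw std::runtime_error("truncated surrogate pair");
            char32_t lo = unit(i + 2);
            if (lo < 0xDC00 || lo > 0xDFFF)
                throw std::runtime_error("invalid low surrogate");
            u = 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00);
            i += 2;
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            throw std::runtime_error("unpaired low surrogate");
        }
        append_utf8(out, u);
    }
    return out;
}
```

Everything after this point can then treat the text as UTF-8 in a plain std::string.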
How to handle char* strings. After all, plain char can be signed or unsigned, and I wonder how I determine what encoding they use (ANSI?) and how to convert them to UTF-8. Is there any recommended reading on this that documents the basic guarantees C/C++ make about strings?
String algorithms for UTF-8 and similar strings: computing the length, parsing, etc. How is this done best?
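Two details from the char* question can be made concrete. Whether plain char is signed is implementation-defined, so bytes should be cast to unsigned char before being compared with values such as 0x80. Counting code points in UTF-8 only requires skipping continuation bytes, which always have the bit pattern 10xxxxxx. The sketch assumes "ANSI" means ISO-8859-1 (Latin-1); Windows-1252 differs in the range 0x80 to 0x9F, and other code pages need a lookup table or an OS/library conversion. `latin1_to_utf8` and `utf8_length` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: convert a NUL-terminated ISO-8859-1 (Latin-1) string to UTF-8.
// Assumption: the input really is Latin-1, where each byte equals its code point.
std::string latin1_to_utf8(const char* s) {
    std::string out;
    for (; *s != '\0'; ++s) {
        // Cast first: if char is signed, the byte 0xE9 would otherwise read as -23.
        unsigned char c = static_cast<unsigned char>(*s);
        if (c < 0x80) {
            out += static_cast<char>(c);                 // ASCII is unchanged
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));   // two-byte sequence
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}

// Hypothetical helper: count code points by skipping continuation bytes (10xxxxxx).
// Assumes valid UTF-8; it performs no validation.
std::size_t utf8_length(const std::string& s) {
    std::size_t n = 0;
    for (char ch : s)
        if ((static_cast<unsigned char>(ch) & 0xC0) != 0x80)
            ++n;
    return n;
}
```

For example, utf8_length("caf\xC3\xA9") gives 4 while size() gives 5. Code points are still not the same as user-perceived characters (a base letter plus a combining accent counts as two), which is where a library such as ICU becomes useful.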
What character type is really portable? I've learned that wchar_t can be anything from 8 to 32 bits wide, making it a poor choice if I want to be consistent across platforms (especially when moving data between platforms; this seems to be a problem, as described for example in EASTL, item #13).
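On the portability question: C++11 added char16_t and char32_t (with std::u16string and std::u32string), whose sizes match uint_least16_t and uint_least32_t, whereas wchar_t is commonly 16 bits on Windows and 32 bits on Linux. In-memory width is still separate from the interchange format, though. The sketch below, using a hypothetical `to_utf16le_bytes` helper, writes UTF-16 with an explicit byte order and BOM so a file reads the same on every platform; storing UTF-8 instead avoids the byte-order question entirely.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: serialize UTF-16 text as little-endian bytes with a BOM,
// so the on-disk format depends neither on host endianness nor on sizeof(wchar_t).
std::string to_utf16le_bytes(const std::u16string& text) {
    std::string out = "\xFF\xFE";                  // UTF-16LE byte order mark
    for (char16_t u : text) {
        out += static_cast<char>(u & 0xFF);        // low byte first
        out += static_cast<char>((u >> 8) & 0xFF); // then the high byte
    }
    return out;
}
```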
At the moment, I'm using std::string everywhere, with a small helper utility to convert to UTF-16 when calling Unicode APIs, but I'm fairly sure this is not the best way. Using something like Qt's QString or ICU's UnicodeString class seems right, but I wonder whether there is a more lightweight approach. For example, if my char strings are ANSI encoded and only use the subset of ANSI that is identical in UTF-8 (the 7-bit ASCII range), then I can simply treat the data as UTF-8, provide converters from/to UTF-8, and store everything in std::string, unless there are problems with this approach.
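The UTF-8-in-std::string approach mainly needs a converter at the API boundary. std::wstring_convert with std::codecvt_utf8_utf16 used to cover this, but both have been deprecated since C++17, so the sketch below hand-rolls a UTF-8 to UTF-16 conversion. `utf8_to_utf16` is a hypothetical name, and the sketch does not reject overlong encodings or encoded surrogates, which a production decoder should.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert UTF-8 held in a std::string to UTF-16.
// Checks lead/continuation bytes and the code point range only.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t extra;                          // number of continuation bytes
        if (b < 0x80)              { cp = b;        extra = 0; }
        else if ((b >> 5) == 0x6)  { cp = b & 0x1F; extra = 1; }
        else if ((b >> 4) == 0xE)  { cp = b & 0x0F; extra = 2; }
        else if ((b >> 3) == 0x1E) { cp = b & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");
        if (i + extra >= in.size())
            throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);            // append 6 payload bits
        }
        i += extra + 1;
        if (cp > 0x10FFFF)
            throw std::runtime_error("code point out of range");
        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp);       // fits in one code unit
        } else {                                    // encode as a surrogate pair
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
    }
    return out;
}
```

On Windows specifically, MultiByteToWideChar with CP_UTF8 performs the same conversion using the OS.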