Up until now I have been using std::string in my C++ applications for embedded systems (routers, switches, telco gear, etc.).

For the next project, I am considering switching from std::string to std::wstring for Unicode support. This would, for example, allow end users to use Chinese characters in the command-line interface (CLI).

What complications / headaches / surprises should I expect? What, for example, if I use a third-party library which still uses std::string?

Since support for international strings isn't that strong of a requirement for the type of embedded systems that I work on, I would only do it if it isn't going to cause major headaches.

A:

Note that many communications protocols require 8-bit characters (or 7-bit characters, or other varieties), so you will often need to translate between your internal wchar_t/wstring data and external encodings.

UTF-8 encoding is useful when you need to have an 8-bit representation of Unicode characters. (See http://stackoverflow.com/questions/134371/how-do-you-write-code-that-is-safe-for-utf-8 for some more info.) But note that you may need to support other encodings.
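As a rough sketch of that translation step, using only the standard library: the hypothetical helper `to_utf8` below encodes a `std::wstring` as UTF-8 bytes in a `std::string`. Because the width of `wchar_t` differs between platforms, it assumes the contents are UTF-16 when `wchar_t` is 16 bits and UTF-32 otherwise, and it substitutes U+FFFD for anything it cannot encode instead of reporting an error. (The standard `std::wstring_convert` used to cover this, but it has been deprecated since C++17.)

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: encode a std::wstring as UTF-8 bytes in a std::string.
// Assumes UTF-16 contents where wchar_t is 16 bits (Windows) and UTF-32
// where it is 32 bits (typical Linux). Unencodable units become U+FFFD.
std::string to_utf8(const std::wstring& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = static_cast<char32_t>(in[i]);
        // 16-bit wchar_t: combine a high/low surrogate pair into one code point.
        if (sizeof(wchar_t) == 2 && cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
            const char32_t lo = static_cast<char32_t>(in[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }
        // Lone surrogates and values beyond U+10FFFF cannot be encoded.
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) cp = 0xFFFD;

        if (cp < 0x80) {                // 1 byte:  0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {        // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {      // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                        // 4 bytes: 11110xxx + three continuation bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

ASCII input comes out byte-for-byte unchanged, which is what makes UTF-8 convenient for protocols that expect 8-bit text.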

More and more third-party libraries are supporting Unicode, but there are still plenty that don't.
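When a third-party library hands you text in a `std::string`, the conversion has to run the other way at the library boundary. A minimal sketch of a hypothetical `from_utf8` decoder, assuming the library's strings really are UTF-8 (some older libraries use a locale-dependent or Latin-1 encoding instead): malformed bytes, overlong forms and surrogate code points become U+FFFD, and characters above U+FFFF become surrogate pairs where `wchar_t` is 16 bits.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decode UTF-8 (e.g. a std::string returned by a
// third-party library) into a std::wstring. Invalid input becomes U+FFFD.
std::wstring from_utf8(const std::string& in) {
    // Smallest code point allowed for each sequence length (rejects overlongs).
    static const char32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};
    std::wstring out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t len = 1;
        if (b < 0x80)                { cp = b;        len = 1; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; len = 2; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; len = 3; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; len = 4; }
        else {                       // stray continuation byte or invalid lead byte
            out += static_cast<wchar_t>(0xFFFD);
            ++i;
            continue;
        }

        // Every following byte must look like 10xxxxxx.
        bool ok = i + len <= in.size();
        for (std::size_t k = 1; ok && k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) ok = false;
            else cp = (cp << 6) | (c & 0x3F);
        }
        if (!ok || cp < min_for_len[len] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF)) {
            out += static_cast<wchar_t>(0xFFFD);
            ++i;                     // resynchronise on the next byte
            continue;
        }

        if (sizeof(wchar_t) == 2 && cp >= 0x10000) {
            cp -= 0x10000;           // split into a UTF-16 surrogate pair
            out += static_cast<wchar_t>(0xD800 + (cp >> 10));
            out += static_cast<wchar_t>(0xDC00 + (cp & 0x3FF));
        } else {
            out += static_cast<wchar_t>(cp);
        }
        i += len;
    }
    return out;
}
```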

I can't really tell you whether it is worth the headaches. It depends on what your requirements are. If you are starting from scratch, it will be easier to start with std::wstring than to convert from std::string to std::wstring later.

Kristopher Johnson
Right. You can use string for UTF-8, and English will be represented exactly in the same way as in ASCII.
Lev
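To illustrate the comment above: ASCII bytes are identical in UTF-8, so existing `std::string` code that only handles English keeps working, but `size()` now counts bytes rather than characters. A small sketch, assuming the input is already valid UTF-8, that counts code points by skipping continuation bytes (those of the form `10xxxxxx`):

```cpp
#include <cstddef>
#include <string>

// Count Unicode code points in a UTF-8 std::string.
// Assumes valid UTF-8: every code point has exactly one non-continuation byte.
std::size_t utf8_length(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) ++n;  // count ASCII and lead bytes only
    }
    return n;
}
```

For example, "router" is 6 bytes and 6 code points, while the UTF-8 encoding of 中文 ("Chinese") is 6 bytes but only 2 code points. Code points are still not the same as user-perceived characters once combining marks are involved.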
A:

std::wstring is a good choice for holding Unicode strings on Windows, but not on most other platforms, and certainly not for portable code. It is better to stick with std::string and UTF-8.

Nemanja Trifunovic
Really? Could you elaborate? I thought the STL was very portable.
Cayle Spandon
The STL is portable, but C++ in general is Unicode-agnostic at this point, and only on certain platforms (Windows) can you assume that a wstring contains UTF-16 encoded strings. On other platforms it may be UTF-32 (Linux) or even depend on environment settings (Solaris).
Nemanja Trifunovic
Thanks for the clarification!
Cayle Spandon
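A quick probe of the point about platform differences, assuming a C++11 compiler (for the `\U` universal-character name in a wide literal). The helper name is made up for illustration; it reports how many `wchar_t` units one character outside the Basic Multilingual Plane occupies, which is one reason code that indexes a `std::wstring` "by character" is not portable:

```cpp
#include <cstddef>
#include <string>

// Hypothetical probe: wchar_t units used by one non-BMP character.
// The C++ standard does not fix this; it follows from the width and
// encoding the implementation chose for wchar_t.
std::size_t wide_units_per_astral_char() {
    const std::wstring probe = L"\U0001F600";  // U+1F600, outside the BMP
    return probe.size();
}
```

With MSVC on Windows this is typically 2 (a UTF-16 surrogate pair); with GCC or Clang on Linux it is typically 1.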
A:

You might get some headaches from the fact that the C++ standard requires wide streams to convert wide characters to narrow (multibyte) characters when writing to a file, and how this conversion is done is implementation-dependent.

Pukku
This is with the default locale. The behavior can be changed by imbuing the stream with a locale that has an appropriate codecvt facet.
Martin York
+1 Quoted answer in related question: http://stackoverflow.com/questions/390977/how-to-readstore-unicode-with-stl-strings-and-streams#391052
stukelly
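Regarding the locale comment above, a minimal sketch of imbuing a `std::wofstream` so that it writes UTF-8, assuming a standard library that still ships `std::codecvt_utf8` (deprecated since C++17, so expect deprecation warnings). With a 16-bit `wchar_t` this facet only handles UCS-2; `std::codecvt_utf8_utf16<wchar_t>` is the variant for UTF-16 contents. The function name `write_utf8_file` is hypothetical.

```cpp
#include <codecvt>  // std::codecvt_utf8 (deprecated since C++17)
#include <fstream>
#include <locale>
#include <string>

// Hypothetical helper: write a wide string to `path` encoded as UTF-8.
// The stream's locale carries a codecvt facet that converts wchar_t
// to UTF-8 bytes as the file buffer writes them out.
bool write_utf8_file(const std::string& path, const std::wstring& text) {
    std::wofstream out;
    // Imbue before opening so the whole file goes through the UTF-8 facet.
    // The locale takes ownership of the facet allocated with new.
    out.imbue(std::locale(std::locale::classic(), new std::codecvt_utf8<wchar_t>));
    out.open(path);
    out << text;
    out.close();          // flushes; sets failbit if the flush or close fails
    return !out.fail();
}
```

Without the `imbue` call, the stream would use whatever conversion the default locale provides, which is exactly the implementation-dependent behavior described in the answer.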