I'm still trying to decide whether my (home) project should use UTF-8 strings (implemented in terms of std::string with additional UTF-8-specific functions when necessary) or some 16-bit string (implemented as std::wstring). The project is a programming language and environment (like VB, it's a combination of both).
There are a few wishes/constraints:
- It would be cool if it could run on limited hardware, such as small Micro-ATX boards, which basically means limited memory.
- I want the code to run on Windows, Mac and (if resources allow) Linux.
- I'll be using wxWidgets as my GUI layer, but I want the code that interacts with that toolkit confined to a corner of the codebase (I will have non-GUI executables).
- I would like to avoid working with two different kinds of strings when working with user-visible text and with the application's data.
Currently, I'm working with std::string, with the intent of using UTF-8 manipulation functions only when necessary. It requires less memory, and seems to be the direction many applications are going anyway.
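To illustrate what I mean by a UTF-8 manipulation function used only when necessary, here is a minimal sketch of one such helper. The name `utf8_length` is hypothetical (it is not from any library), and the function assumes the string already holds valid UTF-8; it does no validation.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in a UTF-8 encoded std::string.
// Assumption: the input is valid UTF-8; malformed sequences are not detected.
std::size_t utf8_length(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char c : s) {
        // Continuation bytes look like 10xxxxxx; any other byte starts a new code point.
        if ((c & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

Most code would keep calling `s.size()` for the byte length, and only code that cares about characters rather than bytes would call something like `utf8_length(s)`. Note that this counts code points, not user-perceived characters (combining sequences still count as several).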
If you recommend a 16-bit encoding, which one: UTF-16? UCS-2? Another one?