Looking at the Unicode standard, it recommends using plain chars for storing UTF-8 encoded strings. Does this work as expected with C++ and the basic std::string, or do cases exist in which the UTF-8 encoding can create problems?
For example, the length of a string in characters may not be identical to its number of bytes. How is this supposed to be handled? Reading the standard, I'm probably fine using a char array for storage, but I'll still need to write functions like strlen etc. on my own that work on encoded text, because as far as I understand the problem, the standard routines are either ASCII-only or expect wide literals (16 bits or more), which are not recommended by the Unicode standard. So far, the best source I've found about encoding is a post on Joel on Software, but it does not explain what we poor C++ developers should use :)
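
To make the length problem concrete, here is a minimal sketch of the kind of function I think I would have to write myself. The name utf8_length is made up for illustration (it is not a standard library function), and the sketch assumes the std::string already holds valid UTF-8. It counts code points, which is still not necessarily the same as the number of characters a user would see on screen.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of the standard library: counts Unicode
// code points in a std::string that is ASSUMED to contain valid UTF-8.
// No validation is performed on the input.
std::size_t utf8_length(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char c : s) {
        // Continuation bytes have the bit pattern 10xxxxxx; every other
        // byte starts a new code point, so only those are counted.
        if ((c & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

With this, a string holding the single euro sign (bytes E2 82 AC) has size() equal to 3 while utf8_length reports 1, which is exactly the mismatch I am asking about.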