This is more an MBCS question than a Unicode question. I need to create an API that returns a list of structs, each of which holds a Unicode character as one of its members. This is in .NET, so you'd think I'd want UTF-16, but then for some Asian characters (those outside the Basic Multilingual Plane) two chars would be required. What's the best practice when returning individual Unicode characters?
- Use an array of two UTF-16 chars, and either test the first char to see whether it's a high surrogate or store a count?
- Ignore the surrogate issue and leave it to the caller to figure out when a character's encoding spans two structs?
- Use a string instead, so I don't care whether it's one or two chars long?
- Use UTF-32
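For comparison, here is a minimal sketch of the UTF-32 option. It's written in Java because Java's String, like .NET's System.String, is a sequence of UTF-16 code units, so the surrogate issue is identical. `CharEntry` and `entries` are hypothetical names for illustration. Each entry stores the full code point as an int, so a surrogate pair collapses into one entry and can be turned back into a 1- or 2-char string on demand. In C# the corresponding calls would be `char.ConvertToUtf32` and `char.ConvertFromUtf32`, and .NET Core 3.0 and later also provide a `System.Text.Rune` struct (e.g. via `string.EnumerateRunes()`) aimed at this kind of per-character work.

```java
import java.util.ArrayList;
import java.util.List;

public class CodePointEntries {
    // Hypothetical struct-like type: one entry per Unicode character,
    // stored as a UTF-32 code point instead of one or two UTF-16 chars.
    static final class CharEntry {
        final int codePoint;   // full Unicode code point, e.g. 0x20000
        final int utf16Index;  // where the character starts in the source string

        CharEntry(int codePoint, int utf16Index) {
            this.codePoint = codePoint;
            this.utf16Index = utf16Index;
        }

        // Convert back to a UTF-16 string (1 or 2 chars) when the caller needs one.
        String asString() {
            return new String(Character.toChars(codePoint));
        }
    }

    static List<CharEntry> entries(String s) {
        List<CharEntry> result = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);      // combines a surrogate pair into one code point
            result.add(new CharEntry(cp, i));
            i += Character.charCount(cp);   // advance by 1 or 2 chars
        }
        return result;
    }

    public static void main(String[] args) {
        // "a", then U+20000 (CJK Extension B, supplementary plane), then U+4E2D (BMP)
        String text = "a" + new String(Character.toChars(0x20000)) + "\u4E2D";
        for (CharEntry e : entries(text)) {
            System.out.printf("U+%04X at index %d -> %d char(s)%n",
                    e.codePoint, e.utf16Index, e.asString().length());
        }
    }
}
```

Running it shows U+20000 occupying two chars while U+4E2D, a common CJK ideograph, occupies one. Note that a code point is still not always a single glyph: "e" followed by U+0301 (combining acute accent) is two code points that render as one "é".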
What do people normally do for UTF-8? I'm guessing they never deal with individual characters and everything is held in a string (for example, searching for a character in a string is really done by looking for a substring). Maybe it's the C++ programmer in me, but a string seems so heavy-handed.
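On the UTF-8 side, treating "find a character" as "find a substring" is safe because UTF-8 is self-synchronizing: lead bytes and continuation bytes have distinct bit patterns, so the encoding of a whole character cannot match starting partway through another character in valid UTF-8. A minimal sketch, assuming valid UTF-8 input; `indexOfCodePoint` is a hypothetical helper, not a library API:

```java
import java.nio.charset.StandardCharsets;

public class Utf8Search {
    // Return the byte offset of the first occurrence of codePoint in utf8, or -1.
    // A plain byte-substring search is enough: the needle starts with a lead byte,
    // so any match in valid UTF-8 begins on a character boundary.
    static int indexOfCodePoint(byte[] utf8, int codePoint) {
        byte[] needle = new String(Character.toChars(codePoint))
                .getBytes(StandardCharsets.UTF_8);
        outer:
        for (int i = 0; i + needle.length <= utf8.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (utf8[i + j] != needle[j]) {
                    continue outer;   // mismatch: try the next starting offset
                }
            }
            return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] text = ("ab" + "\u4E2D" + "c").getBytes(StandardCharsets.UTF_8);
        System.out.println(indexOfCodePoint(text, 0x4E2D)); // 3-byte sequence at offset 2
        System.out.println(indexOfCodePoint(text, 'c'));    // offset 5
    }
}
```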
I think I'm going to do #3. What have others done?