I'm porting originally Windows-only code to be cross-platform. One particular stumbling block is converting calls to the Windows Unicode function "GetMultiByteString" (and any related functions) to the more portable wchar_t-based functions. I'm having little success with it: using wchar_t results in premature loop termination when iterating through Unicode strings.

What is the correct way to use wchar_t to replace GetMultiByteString and any other related Unicode functions?

+1  A: 

You're trying to convert apples into oranges here. MultiByteToWideChar and WideCharToMultiByte convert between specific encodings: UTF-16 on one side and a code page you choose on the other (ANSI code pages, UTF-8, and a variety of others).

Three problems:

  1. The encodings that the char <-> wchar_t functions in the C standard library operate on are implementation-defined and depend on the current locale. They could translate between UCS-2 and ASCII, or EBCDIC, or any number of other code pages. You can't replace the Windows functions with these because you can't assume wcstombs and mbstowcs are actually talking about UTF-16 on the wide side, or about ASCII on the narrow side. On Unix boxes the wide encoding is usually UTF-32.
  2. Unix boxes rarely use UTF-16; they're generally UTF-8 based, if they support Unicode at all.
  3. wchar_t is typically 4 bytes on Unix boxes, not 2 bytes, so you'd have to check all of your code to ensure that it never assumes the size is 2 bytes.
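
To make points 1 and 3 concrete, here is a minimal sketch of what the standard library route looks like. The helper names `wchar_bits` and `widen_with_current_locale` are made up for illustration. The conversion goes through `std::mbstowcs`, so the result depends on whatever locale the process has selected, which is exactly why it is not a drop-in replacement for MultiByteToWideChar.

```cpp
#include <cassert>
#include <climits>
#include <cstdlib>
#include <string>

// Width of one wchar_t in bits. Typically 16 on Windows and 32 on Unix-like
// systems, so code must not hard-code either value.
constexpr std::size_t wchar_bits() { return sizeof(wchar_t) * CHAR_BIT; }

// Hypothetical helper: widens a narrow string using the multibyte encoding of
// the *current C locale*. Returns an empty string if the input is not valid
// in that encoding. Conversion stops at the first embedded '\0'.
std::wstring widen_with_current_locale(const std::string& narrow) {
    // A multibyte string never produces more wide characters than it has bytes.
    std::wstring wide(narrow.size(), L'\0');
    std::size_t n = std::mbstowcs(&wide[0], narrow.c_str(), wide.size());
    if (n == static_cast<std::size_t>(-1)) {
        return std::wstring();  // invalid sequence for this locale
    }
    assert(n <= wide.size());
    wide.resize(n);
    return wide;
}
```

Plain ASCII survives this in the default "C" locale, but anything beyond that only converts correctly if the program has called `setlocale` with a locale whose narrow encoding matches the input.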

Simply put, there is no completely portable way of dealing with this kind of thing unless you write the code to do the encoding yourself.
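
As a rough illustration of what "doing the encoding yourself" involves, here is a sketch of a UTF-8 to UTF-16 converter that does not depend on the locale or on the size of wchar_t. The function name `utf8_to_utf16` is my own, and it assumes a C++11 compiler for `char16_t` and `std::u16string`. Treat it as a starting point under those assumptions, not a vetted implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: decodes UTF-8 bytes and re-encodes them as UTF-16
// code units. Throws std::invalid_argument on malformed input.
std::u16string utf8_to_utf16(const std::string& in) {
    // Smallest code point that legitimately needs 1, 2, 3, or 4 bytes.
    static const std::uint32_t min_cp[] = {0x0, 0x80, 0x800, 0x10000};
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (extra > in.size() - i - 1)
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;

        assert(extra <= 3);
        // Reject overlong forms, surrogate code points, and out-of-range values.
        if (cp < min_cp[extra] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid code point in UTF-8 input");

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000;  // split into a surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

Note that a code point above U+FFFF becomes two UTF-16 code units, so loops that assume one unit per character will misbehave on such input.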

If you want to be portable, you need to define a typedef or something similar so that your application uses wchar_t on Windows and char on everything else. You then must assume that UTF-16 is being used on Windows boxes and UTF-8 is being used on Unix boxes.
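
A minimal sketch of that typedef approach might look like the following. The names `app_char`, `app_string`, `APP_TEXT`, and `app_strlen` are invented for this example (the idea is the same as the `TCHAR` / `_T` convention on Windows), and the UTF-16 / UTF-8 assumptions are just that: assumptions the rest of the code has to honour.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

#ifdef _WIN32
typedef wchar_t app_char;        // assumed to hold UTF-16 code units
#define APP_TEXT(s) L##s         // wide string literal
#else
typedef char app_char;           // assumed to hold UTF-8 bytes
#define APP_TEXT(s) s            // narrow string literal
#endif

// String type the application uses everywhere instead of std::string
// or std::wstring directly.
typedef std::basic_string<app_char> app_string;

// Length in code units (not characters) of a null-terminated string.
std::size_t app_strlen(const app_char* s) {
    assert(s != nullptr);
    return std::char_traits<app_char>::length(s);
}
```

With this in place, application code writes `app_string title = APP_TEXT("hello");` and only the thin layer that talks to the operating system needs to know which encoding is underneath.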

OR: You have to use a library, such as ICU.

Billy ONeal
Now where exactly did `MultiByteToWideChar()` and `WideCharToMultiByte()` come in? (The question is about `GetMultiByteString()`.)
sbi
@sbi: GetMultiByteString is not a Win32 function. I therefore assumed he was talking about the two multibyte functions in the API.
Billy ONeal
@billy - i thought GetMultiByteString WAS a Win32 only function. the question will likely apply to any multibyte Win32 functions i come across in this project, tho.
Alex Rosario
Billy ONeal