I'm trying to display some text in my program using (say) Windows GDI, and some of the Unicode characters are displayed as question marks. What is going on?

See also: What does it mean when my text is displayed as boxes?

A: 

Basically, you have corrupted the text. You took Unicode text in one encoding and converted it to another encoding without checking that the target encoding includes all of the characters in the source text. The result is a bunch of gibberish.

Ways to do this include:

  1. Treating UTF-8 text as ANSI (without first converting it to a valid code page)
  2. Converting Unicode text to a code page without checking whether that code page contains all of the characters in the text.
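
Both routes can be reproduced outside of Windows. The following is a minimal sketch in Python, where the built-in `cp1252` codec stands in for the ANSI code page (an assumption; the real code page depends on the system locale) and `fits_code_page` is a hypothetical helper, not a Windows API:

```python
# Route 1: UTF-8 bytes are read as if they were already in the ANSI code page.
# "cp1252" is an assumed stand-in for the system's ANSI code page.
original = "café"
utf8_bytes = original.encode("utf-8")      # b'caf\xc3\xa9'
misread = utf8_bytes.decode("cp1252")      # each UTF-8 byte becomes one character
# misread is now "cafÃ©": the two bytes of "é" show up as two wrong characters.

# Route 2: Unicode text is converted to a code page that lacks its characters.
# errors="replace" substitutes "?" for anything the code page cannot hold.
russian = "Привет"
lossy = russian.encode("cp1252", errors="replace")
# lossy is now b"??????": the original characters cannot be recovered.


def fits_code_page(text, code_page):
    """Hypothetical helper: True if every character survives the conversion."""
    try:
        text.encode(code_page)  # strict mode raises instead of writing "?"
    except UnicodeEncodeError:
        return False
    return True
```

Checking with something like `fits_code_page` before converting is the step that both routes skip.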
1800 INFORMATION
+1  A: 

It means your Unicode text is getting converted to ANSI text somewhere. Characters that have no equivalent in the system's ANSI code page (Windows-1252, which is roughly Latin-1, on Western European systems) can't be converted, so they are replaced with question marks. Make sure that your program is compiled with Unicode support on (i.e. the preprocessor symbols UNICODE and _UNICODE are #defined by your project), so that you're always calling the wide-character (W) versions of the Windows functions, such as TextOutW rather than TextOutA.
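
Which characters are lost depends on the code page in use. As a rough illustration, here is a small Python sketch that lists the characters a given code page cannot represent. The helper name `lost_characters` is made up for this example, and Python's `cp1252` and `cp1251` codecs are assumed to approximate the corresponding Windows ANSI code pages:

```python
def lost_characters(text, code_page):
    """Return the characters in text that code_page cannot represent."""
    lost = []
    for ch in text:
        try:
            ch.encode(code_page)  # strict encoding raises on failure
        except UnicodeEncodeError:
            lost.append(ch)
    return lost


sample = "é Ж"
western = lost_characters(sample, "cp1252")   # Western European code page
cyrillic = lost_characters(sample, "cp1251")  # Cyrillic code page
# western is ["Ж"] and cyrillic is ["é"]: each code page drops a different character.
```

The same string therefore shows different question marks on differently configured machines, which is one reason to stay in Unicode all the way to the drawing call.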

Adam Rosenfield
+1  A: 

In Windows there are two common problems that occur when trying to display Unicode characters:

  1. text sometimes appears as question marks

    • This occurs when Unicode data is converted to an 8-bit character set encoding (technically, a multi-byte encoding), usually via the system code page (though other code pages can be specified in the conversion calls). If the target character set doesn't include the characters needed, any characters it can't represent are converted to question marks.
  2. text sometimes appears as boxes

    • This is a problem with the font not having the glyph for a particular character. Boxes show up when there is a mismatch between the Unicode characters in the document and those supported by the font: each box represents a character that the selected font doesn't support.
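
One practical way to tell the two cases apart is to compare the text you started with against the string that actually reaches the drawing call (for example, as seen in a debugger). The sketch below is only an illustration of that reasoning; `diagnose` and its return values are hypothetical names, not part of any Windows API:

```python
def diagnose(original, at_draw_call):
    """Classify what happened to the text before it was drawn."""
    if at_draw_call == original:
        # The data is fine, so any boxes on screen point at the font.
        return "intact"
    same_length = len(at_draw_call) == len(original)
    if same_length and all(
        got == want or got == "?" for want, got in zip(original, at_draw_call)
    ):
        # Characters were swapped for literal "?" by a lossy conversion.
        return "encoding-loss"
    # Anything else, such as UTF-8 bytes misread as ANSI.
    return "other"


case_font = diagnose("Ж", "Ж")        # "intact"
case_lossy = diagnose("aЖ", "a?")     # "encoding-loss"
case_misread = diagnose("é", "Ã©")    # "other"
```

If the string already contains literal question marks, no font change will help; if it is intact, switching to a font that covers the characters is the fix.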
Michael Burr