VerQueryValue and multi codepage Unicode characters

In our application we use the VerQueryValue() API call to fetch version information such as ProductName. For some applications running on a machine set to Traditional Chinese (code page 950), the ProductName contains Unicode characters that span multiple code pages, and some of those characters are not translated properly. For instance, consider the sequence below (UTF-16LE bytes):

51 00 51 00 6F 8F F6 4E A1 7B 06 74

Some of its characters are returned as 0x003F (a question mark) instead of the original character.

In the above sequence, the character U+8F6F (bytes '6F 8F') is not picked up and converted properly by the WinAPI call; it is replaced by U+003F ('?'), because U+8F6F exists only in code page 936 (Simplified Chinese), not in code page 950.

The .exe has just one translation block, '\StringFileInfo\080404B0', which refers to language ID 0x0804 (Chinese, PRC, i.e. Simplified Chinese) and code page 0x04B0 (1200, Unicode).
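The eight hex digits of a `StringFileInfo` block name consist of a LANGID followed by a code page. For example, `080404B0` means language 0x0804 and code page 0x04B0 (decimal 1200, UTF-16). The following is a small portable sketch that splits such a name. `parse_block_name` is a hypothetical helper.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Split an 8-hex-digit StringFileInfo block name such as "080404B0" into
   its language ID (first 4 digits) and code page (last 4 digits).
   Returns 1 on success, 0 if the name is not exactly 8 hex digits. */
int parse_block_name(const char *name, unsigned *langid, unsigned *codepage)
{
    char part[5];
    size_t i;

    if (strlen(name) != 8)
        return 0;
    for (i = 0; i < 8; i++)
        if (!isxdigit((unsigned char)name[i]))
            return 0;

    memcpy(part, name, 4);          /* language ID, e.g. "0804" */
    part[4] = '\0';
    *langid = (unsigned)strtoul(part, NULL, 16);

    memcpy(part, name + 4, 4);      /* code page, e.g. "04B0" */
    part[4] = '\0';
    *codepage = (unsigned)strtoul(part, NULL, 16);
    return 1;
}
```

Calling `parse_block_name("080404B0", &lang, &cp)` gives `lang == 0x0804` and `cp == 1200`. For comparison, Chinese (Taiwan, Traditional) is LANGID 0x0404.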

How should one handle cases like this, where the ProductName contains characters from both 936 and 950 even though the translation table has only one entry? Is there another API call to use?

Also, if I right-click the .exe and open the 'Details' tab, it shows the ProductName correctly! So it appears Microsoft either uses a different API call or handles this correctly in some other way. I need to know how it is done.

Thanks in advance,

Venkat

It looks somewhat weird to have contents compatible only with codepage1 in a block marked as codepage2. This is the source of your problem.

The best way to handle multi-codepage issues is obviously to turn your app into a Unicode-aware application. There will no longer be any conversion to a codepage, which will make everyone happy.
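In a Unicode build, the W versions of the version APIs return the string as stored in the resource (UTF-16), without any code page conversion. The following is a minimal sketch. It assumes the caller already knows the language/code page pair; a real program would first read `\VarFileInfo\Translation`. The helper names are hypothetical. The Win32 part links against `version.lib` and compiles only on Windows.

```c
#include <stdlib.h>
#include <wchar.h>

/* Build a VerQueryValueW query path such as
   \StringFileInfo\080404b0\ProductName. Returns 0 on success, -1 on error. */
int make_string_query(wchar_t *buf, size_t len,
                      unsigned langid, unsigned codepage, const wchar_t *name)
{
    int n = swprintf(buf, len, L"\\StringFileInfo\\%04x%04x\\%ls",
                     langid, codepage, name);
    return n < 0 ? -1 : 0;
}

#ifdef _WIN32
#include <windows.h>
#pragma comment(lib, "version.lib") /* MSVC: link the Version API */

/* Read a version string (e.g. L"ProductName") from a file as UTF-16.
   Returns a malloc'd copy, or NULL on failure. The caller frees it. */
wchar_t *get_version_string(const wchar_t *path, unsigned langid,
                            unsigned codepage, const wchar_t *name)
{
    DWORD handle = 0;
    DWORD size = GetFileVersionInfoSizeW(path, &handle);
    void *data;
    wchar_t query[128];
    LPVOID value = NULL;
    UINT value_len = 0;
    wchar_t *result = NULL;

    if (size == 0)
        return NULL;
    data = malloc(size);
    if (data == NULL)
        return NULL;
    if (GetFileVersionInfoW(path, 0, size, data) &&
        make_string_query(query, sizeof query / sizeof query[0],
                          langid, codepage, name) == 0 &&
        VerQueryValueW(data, query, &value, &value_len) && value_len > 0) {
        /* value points into data, so copy it before freeing the buffer. */
        result = _wcsdup((const wchar_t *)value);
    }
    free(data);
    return result;
}
#endif
```

For example, `get_version_string(L"app.exe", 0x0804, 0x04B0, L"ProductName")` (with a hypothetical file name) should return the full UTF-16 string, including U+8F6F, regardless of the machine's ANSI code page. Whether it *displays* correctly is a separate question, covered in the console note below.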

The LANGID (0804) is only an indication of the language of the contents in the block. If a version info has several blocks, you can program your app to look up the block in your user's language.
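Choosing among several blocks can be done by comparing each entry of `\VarFileInfo\Translation` with the user's LANGID. The following sketch works on an array of (language, code page) pairs, mirroring the `LANGANDCODEPAGE` struct used in Microsoft's `VerQueryValue` documentation example. The struct and function names here are hypothetical. The fallback order (exact match, then same primary language, then the first block) is one reasonable policy, not something Windows prescribes. On Windows, the user's LANGID could come from `GetUserDefaultUILanguage()`.

```c
#include <stddef.h>

/* One entry of \VarFileInfo\Translation: a language ID and a code page. */
struct lang_cp {
    unsigned short lang;
    unsigned short codepage;
};

/* Return the index of the block that best matches user_lang:
   an exact LANGID match first, then a block with the same primary
   language (low 10 bits, as PRIMARYLANGID extracts), otherwise 0.
   Returns -1 if there are no blocks at all. */
int pick_block(const struct lang_cp *t, size_t count, unsigned short user_lang)
{
    size_t i;
    if (count == 0)
        return -1;
    for (i = 0; i < count; i++)
        if (t[i].lang == user_lang)
            return (int)i;
    for (i = 0; i < count; i++)
        if ((t[i].lang & 0x3FF) == (user_lang & 0x3FF))
            return (int)i;
    return 0;
}
```

With blocks for 0x0409, 0x0804 and 0x0404, a Taiwanese user (0x0404) gets the exact match. A Hong Kong user (0x0C04) falls back to the first Chinese block, and a German user falls back to the first block.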

When you call VerQueryValue() in an ANSI application, this LANGID is not taken into account when the Unicode contents are converted to ANSI: you're ANSI, so Windows assumes you only understand the machine's default ANSI codepage.
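You can reproduce this kind of lossy step directly. Converting a UTF-16 string to the system ANSI code page (`CP_ACP`) with `WideCharToMultiByte` replaces characters the code page cannot represent with the default character, usually `?`. The following Windows-only sketch reports whether that happens. `fits_in_acp` is a hypothetical helper. It passes `WC_NO_BEST_FIT_CHARS` so that any substitution is reported, which may be stricter than the conversion the A APIs perform internally. On a code page 950 machine, U+8F6F is expected to be reported as not representable.

```c
#include <stdlib.h>

#ifdef _WIN32
#include <windows.h>

/* Returns 1 if every character of s survives conversion to the system
   ANSI code page (CP_ACP), 0 if any was replaced by the default
   character, or -1 on error. */
int fits_in_acp(const wchar_t *s)
{
    BOOL used_default = FALSE;
    int needed, ok;
    char *buf;

    /* A UTF-8 ANSI code page can represent any valid string, and the API
       rejects lpUsedDefaultChar for CP_UTF8, so handle it up front. */
    if (GetACP() == CP_UTF8)
        return 1;

    needed = WideCharToMultiByte(CP_ACP, WC_NO_BEST_FIT_CHARS, s, -1,
                                 NULL, 0, NULL, NULL);
    if (needed == 0)
        return -1;
    buf = malloc((size_t)needed);
    if (buf == NULL)
        return -1;
    ok = WideCharToMultiByte(CP_ACP, WC_NO_BEST_FIT_CHARS, s, -1,
                             buf, needed, NULL, &used_default);
    free(buf);
    if (ok == 0)
        return -1;
    return used_default ? 0 : 1;
}
#endif
```

`GetACP()` returns the ANSI code page the conversion uses, which is useful to log alongside the result when diagnosing reports like the one above.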

Note about display in console

Beware of the console! It's an old creature that is not fully Unicode-aware. It is based on codepages, so you should expect display problems that can't be fully fixed. Even worse: it uses its own codepage (called the OEM codepage), which may differ from the usual ANSI codepage (although for East Asian languages, the OEM codepage is the same as the ANSI codepage).
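One common workaround is to skip the codepage conversion by writing UTF-16 with `WriteConsoleW`. This works only when standard output is an actual console window. When output is redirected to a file or pipe, `WriteConsoleW` fails and the program has to choose an encoding itself. Even with `WriteConsoleW`, the console font must contain the glyphs, so CJK characters may still appear as boxes. The following Windows-only sketch shows this approach. `print_wide` is a hypothetical helper, and the UTF-8 fallback for redirected output is just one possible choice.

```c
#include <stdlib.h>
#include <wchar.h>

#ifdef _WIN32
#include <windows.h>

/* Write a UTF-16 string to stdout. If stdout is a console, write UTF-16
   directly with WriteConsoleW (no codepage conversion). Otherwise
   (redirected to a file or pipe), fall back to writing UTF-8 bytes.
   Returns 1 on success, 0 on failure. */
int print_wide(const wchar_t *s)
{
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD mode, written;
    DWORD len = (DWORD)wcslen(s);
    int nbytes, ok;
    char *utf8;

    if (out == INVALID_HANDLE_VALUE || out == NULL)
        return 0;
    if (GetConsoleMode(out, &mode)) /* succeeds only for a real console */
        return WriteConsoleW(out, s, len, &written, NULL) ? 1 : 0;

    nbytes = WideCharToMultiByte(CP_UTF8, 0, s, (int)len, NULL, 0, NULL, NULL);
    if (nbytes <= 0)
        return len == 0; /* an empty string is trivially written */
    utf8 = malloc((size_t)nbytes);
    if (utf8 == NULL)
        return 0;
    WideCharToMultiByte(CP_UTF8, 0, s, (int)len, utf8, nbytes, NULL, NULL);
    ok = WriteFile(out, utf8, (DWORD)nbytes, &written, NULL) ? 1 : 0;
    free(utf8);
    return ok;
}
#endif
```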

HTH.

Thanks! I tried compiling the code in VC++ with "Use Unicode Character Set", so that the UNICODE and _UNICODE macros are #define'd. It still does not work properly. Also, there is just a single language block in the version info, and it contains a mixture of multi-codepage sequences, which, as you said, is the root cause. But I wonder how MS displays the characters properly in the 'Properties' dialog, and I would like to do the same.

Venkat 2009-10-19 11:35:37

If by MS, you mean Windows Explorer, then using Unicode is the solution they use! What "does not work properly" when compiling your program as Unicode?

Serge - appTranslator 2009-10-19 13:15:29

Yes, I mean Windows Explorer. By "it does not work properly" I mean that some characters do not show up in the console, while the same characters do show up in Explorer's Properties dialog.

Venkat 2009-10-20 09:38:00

I replied in the answer.

Serge - appTranslator 2009-10-20 16:15:08
