Is it possible to use libxml with unicode?

For example, the xmlParseDoc function takes a const xmlChar * argument.

xmlChar has the following definition:

typedef unsigned char xmlChar;

I would like libxml to interpret every character as a 2-byte char.

I have a feeling that the following would not work properly with the lib:

typedef unsigned short xmlChar;

Note: I'm not asking about the encoding libxml uses when it actually reads or writes XML; I know that supports Unicode. What I want is for the interface into the lib to take Unicode strings (wstring) instead of normal strings.

+2  A: 

I found the answer in a link provided by @Mitch Wheat

You cannot redefine xmlChar to be an unsigned short. libxml treats each xmlChar as one byte of a UTF-8 encoded string, so if you encode your strings as UTF-8, the xmlChar interface will handle Unicode properly.

On Windows, you can convert a wide string to UTF-8 by calling WideCharToMultiByte with CP_UTF8 as the code page parameter.
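As a rough sketch of that conversion (not code from the original answer): the helper name wide_to_utf8 is hypothetical and not part of libxml. The Windows branch uses WideCharToMultiByte with CP_UTF8 as described above. The non-Windows branch is my own addition so the example builds elsewhere, and it assumes wchar_t holds one UTF-32 code point, which is typical on Linux and macOS but not guaranteed.

```cpp
#include <cassert>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper (not a libxml function): converts a wide string to UTF-8.
std::string wide_to_utf8(const std::wstring& wide) {
    if (wide.empty()) return std::string();
#ifdef _WIN32
    // First call: ask how many bytes the UTF-8 output needs.
    const int len = static_cast<int>(wide.size());
    const int size = WideCharToMultiByte(CP_UTF8, 0, wide.data(), len,
                                         nullptr, 0, nullptr, nullptr);
    if (size <= 0) return std::string();  // conversion failed
    // Second call: perform the conversion into the sized buffer.
    std::string utf8(static_cast<std::size_t>(size), '\0');
    WideCharToMultiByte(CP_UTF8, 0, wide.data(), len,
                        &utf8[0], size, nullptr, nullptr);
    return utf8;
#else
    // Assumption: each wchar_t is a single UTF-32 code point.
    std::string utf8;
    for (wchar_t wc : wide) {
        const unsigned long cp = static_cast<unsigned long>(wc);
        assert(cp <= 0x10FFFF);  // sketch only: no real error handling
        if (cp < 0x80) {                       // 1 byte (ASCII)
            utf8 += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes
            utf8 += static_cast<char>(0xC0 | (cp >> 6));
            utf8 += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes
            utf8 += static_cast<char>(0xE0 | (cp >> 12));
            utf8 += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            utf8 += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                               // 4 bytes
            utf8 += static_cast<char>(0xF0 | (cp >> 18));
            utf8 += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            utf8 += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            utf8 += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return utf8;
#endif
}
```

The resulting std::string can then be handed to libxml by casting its buffer, for example reinterpret_cast<const xmlChar *>(utf8.c_str()) as the argument to xmlParseDoc.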

Brian R. Bondy