libxml2 questions about xmlChar*

I'm using libxml2. All of its functions work with xmlChar*. I found that xmlChar is an unsigned char.

So I have some questions about how to work with it.

1) For example, if I am working with a UTF-16 or UTF-32 file, how does libxml2 process it and return xmlChar from its functions? Will I lose some characters?

2) If I want to do something with this string, should I cast it to char* or wchar_t*, and how? Will I lose some characters?

xmlChar is for handling UTF-8 encoding only.

So, to answer your questions:

1) No, you won't lose any characters if you are using UTF-16 or UTF-32. Just use iconv or any other library to convert your UTF-16 or UTF-32 data to UTF-8 before passing it to the API.

2) Do not just "cast" the string. If you need it in some other encoding, convert it.
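
To illustrate why a plain cast to wchar_t* cannot work, here is a minimal sketch that decodes UTF-8 bytes into code points by hand. The names `xml_byte`, `utf8_decode_one` and `utf8_to_codepoints` are hypothetical and not part of libxml2; `xml_byte` only stands in for xmlChar so the sketch compiles without the library. It checks continuation bytes but does not reject overlong forms or surrogates, so treat it as an illustration rather than a validator.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for libxml2's xmlChar (an unsigned char); hypothetical name. */
typedef unsigned char xml_byte;

/* Decode one UTF-8 sequence starting at s into *cp.
   Returns the number of bytes consumed, or 0 on malformed input. */
static size_t utf8_decode_one(const xml_byte *s, uint32_t *cp)
{
    size_t len, i;
    uint32_t v;

    if (s[0] < 0x80) { *cp = s[0]; return 1; }           /* ASCII */
    if ((s[0] & 0xE0) == 0xC0)      { len = 2; v = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; v = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; v = s[0] & 0x07; }
    else return 0;                                       /* stray byte */

    for (i = 1; i < len; i++) {
        /* A continuation byte must look like 10xxxxxx; this test also
           catches a NUL terminator in the middle of a sequence. */
        if ((s[i] & 0xC0) != 0x80) return 0;
        v = (v << 6) | (s[i] & 0x3F);
    }
    *cp = v;
    return len;
}

/* Decode a NUL-terminated UTF-8 string into out (room for cap code points).
   Returns the number of code points, or -1 on malformed input or overflow. */
long utf8_to_codepoints(const xml_byte *s, uint32_t *out, size_t cap)
{
    size_t n = 0;
    while (*s) {
        uint32_t cp;
        size_t used = utf8_decode_one(s, &cp);
        if (used == 0 || n == cap) return -1;
        out[n++] = cp;
        s += used;
    }
    return (long)n;
}
```

One character can occupy one to four bytes here, which is why reinterpreting the same memory as wchar_t* gives garbage. Casting to char* is a different matter: it keeps the bytes as they are, so the result is still UTF-8.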

Pablo Santa Cruz 2010-09-24 12:22:40

Thank you, but now I have some more questions. How does it work now? Even if I feed it a UTF-16 file, libxml2 still returns unsigned char*. Why, and how does it work? The second question is how I can convert UTF-32 or UTF-16 to UTF-8. I don't want to use third-party libraries, and I need to do it under Unix. I know that Windows has the function WideCharToMultiByte; does Unix have something like that? And the last question is how I can convert xmlChar to another encoding, and to which one?

Nikita 2010-09-24 12:35:46

Yes. The thing is that the API does some internal conversions. All calls are `xmlChar` based, even when the files or network feeds you parse the XML from are encoded in a different charset. On Unix, use libiconv. It's a pretty common library and, if I recall correctly, it already comes bundled with libxml2. To convert xmlChar to another encoding, again, use libiconv. Regards...
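
A rough sketch of such a conversion with the POSIX iconv interface (`iconv_open`, `iconv`, `iconv_close`) might look like the following. The helper name `convert_encoding` is made up for this example, and the encoding names "UTF-16LE", "UTF-32LE" and "UTF-8" are assumed to be accepted by the installed iconv, which is the case for glibc and GNU libiconv. On systems where iconv is not part of libc you may need to link with `-liconv`.

```c
#include <iconv.h>
#include <stddef.h>

/* Convert inlen bytes of `in` from encoding `from` to encoding `to`.
   Writes at most outcap - 1 bytes into out, followed by a single NUL byte.
   Returns the number of bytes written (without the NUL), or -1 on failure.
   Hypothetical helper, not a libxml2 or iconv function. */
long convert_encoding(const char *from, const char *to,
                      const char *in, size_t inlen,
                      char *out, size_t outcap)
{
    iconv_t cd;
    char *inp = (char *)in;   /* iconv wants char **, it does not modify the data */
    char *outp = out;
    size_t inleft = inlen;
    size_t outleft;
    size_t rc;

    if (outcap == 0) return -1;
    outleft = outcap - 1;     /* keep one byte for the terminator */

    cd = iconv_open(to, from);            /* note the order: target first */
    if (cd == (iconv_t)-1) return -1;     /* unsupported encoding pair */

    rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &outp, &outleft);  /* flush any shift state */
    iconv_close(cd);

    if (rc == (size_t)-1) return -1;      /* invalid input or buffer too small */
    *outp = '\0';
    return (long)(outp - out);
}
```

To go from UTF-16 to something you can hand to the API, call it as `convert_encoding("UTF-16LE", "UTF-8", ...)`; to take an xmlChar* result to another encoding, swap the two names and cast the xmlChar* to `const char *` for the input argument. The single NUL terminator is only meaningful for byte-oriented targets such as UTF-8.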

Pablo Santa Cruz 2010-09-24 12:44:00

And one more question. Why did you say that I should first encode UTF-16 before feeding it to libxml2? I just tried it without converting, then applied the xmlCheckUTF8 function to every element returned by libxml2, and it was OK. I guess that unsigned char* is just a sequence of bytes ...

Nikita 2010-09-24 13:40:51

No. I said (at least I tried to, anyway :-) ) that you should encode your UTF-16 data into UTF-8 **before** feeding it to the API if you are getting the UTF-16 encoded data from somewhere else...

Pablo Santa Cruz 2010-09-24 14:07:29
