views: 599
answers: 1

I am currently building a C++ application that communicates via a socket with a C# application. My C++ app sends wchar_t* data over the socket.

Here is an overview of what is sent:

<!-- Normal xml file--

Here is what I receive on the other side (I do a Stream.Read into a byte array and use UTF8Encoding.GetString() to convert the byte array to a readable string):

<\0!\0-\0-\0 \0N\0o\0r\0m\0a\0l\0 \0x\0m\0l\0 \0f\0i\0l\0e\0-\0-

Is it a marshalling problem? Why is every character zero-extended, and why don't the Unicode characters appear correctly on the C# side?
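For reference, the receiving side looks roughly like this (a minimal sketch; the buffer size and names are illustrative, not my exact code):

    using System.Net.Sockets;
    using System.Text;

    class Receiver
    {
        // Read whatever is available from the socket and decode it as UTF-8.
        public static string ReadMessage(NetworkStream stream)
        {
            var buffer = new byte[4096];
            int read = stream.Read(buffer, 0, buffer.Length);
            return new UTF8Encoding().GetString(buffer, 0, read);
        }
    }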

+5  A: 

Looks like it's sending UTF-16, not UTF-8, which makes sense - wchar_t is basically a 16-bit type (in Windows), and you're sending it down "raw" as far as I can tell. I suggest that if you're going to convert the data into an XDocument or XmlDocument, you do it with the binary data - the framework knows how to autodetect UTF-16 for XML files (IIRC).
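For example, something like this (a sketch, assuming the bytes for one complete XML document have already been read from the socket into a byte array):

    using System.IO;
    using System.Xml.Linq;

    class XmlReceiver
    {
        // Hand the raw bytes straight to the XML parser; XDocument.Load
        // inspects the BOM/declaration and detects UTF-16 by itself,
        // instead of the bytes being force-decoded as UTF-8 first.
        public static XDocument Parse(byte[] receivedBytes)
        {
            using (var stream = new MemoryStream(receivedBytes))
            {
                return XDocument.Load(stream);
            }
        }
    }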

You'll potentially have problems if the XML declaration declares it to be UTF-8 when it's really UTF-16 though.

Alternatively, use suitable encoding classes on the C++ side to genuinely send UTF-8. This would take extra processing time, but usually save bandwidth if that's a consideration.

Jon Skeet