I'm receiving XML over a network socket. I need to take that XML and load it into the DOM to perform further operations. When loading from a string, MSXML requires the input to be UCS-2 or UTF-16 and completely ignores the encoding named in the XML declaration. Since it also allows XML fragments to be loaded this way, that makes some sense.
I see two possible ways to handle this problem:
1) Write the file out to disk and load it into MSXML via file paths. The extra disk I/O makes this approach far from preferred.
2) Peek into the XML declaration to detect the encoding manually, then call MultiByteToWideChar to convert to UTF-16, specifying the code page that matches the detected encoding. This approach works OK, but I'd like to push the encoding detection onto MSXML.
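For illustration, the detection step in option 2 might look roughly like the sketch below. It is a minimal, portable approximation rather than a complete implementation: `sniff_xml_encoding` and `code_page_for` are hypothetical helper names, the code page table covers only a few common encodings, and UTF-16 input without a byte order mark is not handled. It checks for a byte order mark first, then reads the `encoding` attribute from the XML declaration, and falls back to UTF-8 when neither is present.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: guess the encoding name of a raw XML byte buffer.
// Returns an upper-case name; falls back to "UTF-8" when nothing is declared.
std::string sniff_xml_encoding(const std::string& bytes) {
    // A byte order mark, if present, takes precedence over the declaration.
    if (bytes.compare(0, 3, "\xEF\xBB\xBF") == 0) return "UTF-8";
    if (bytes.compare(0, 2, "\xFF\xFE") == 0) return "UTF-16LE";
    if (bytes.compare(0, 2, "\xFE\xFF") == 0) return "UTF-16BE";

    // The XML declaration must be the very first thing in the document.
    if (bytes.compare(0, 5, "<?xml") != 0) return "UTF-8";
    const std::size_t end = bytes.find("?>");
    if (end == std::string::npos) return "UTF-8";

    const std::string decl = bytes.substr(0, end);
    const std::size_t key = decl.find("encoding");
    if (key == std::string::npos) return "UTF-8";

    // The value may be wrapped in either single or double quotes.
    const std::size_t open = decl.find_first_of("\"'", key);
    if (open == std::string::npos) return "UTF-8";
    const std::size_t close = decl.find(decl[open], open + 1);
    if (close == std::string::npos) return "UTF-8";

    std::string name = decl.substr(open + 1, close - open - 1);
    std::transform(name.begin(), name.end(), name.begin(), [](unsigned char c) {
        return static_cast<char>(std::toupper(c));
    });
    return name;
}

// Hypothetical helper: map an encoding name to a Windows code page identifier.
// Returns -1 for names this small table does not know about.
int code_page_for(const std::string& name) {
    if (name == "UTF-8") return 65001;        // CP_UTF8
    if (name == "US-ASCII") return 20127;
    if (name == "ISO-8859-1") return 28591;
    if (name == "WINDOWS-1252") return 1252;
    return -1;
}
```

On Windows, the value returned by `code_page_for` would be passed as the CodePage argument to MultiByteToWideChar, and the resulting wide string handed to MSXML.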
Does anybody have any other ideas on how to accomplish this?
I haven't looked at other XML parsers, but would be interested in how non-MSXML DOM parsers accomplish this.
Thanks, Paul