Hello everyone,
I want to know whether there is a quick way to check that an XML document is correctly encoded in UTF-8 and does not contain any characters that are not allowed in a UTF-8 encoded XML document. The file starts with this declaration:
<?xml version="1.0" encoding="utf-8"?>
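To make the question concrete, this is roughly the kind of check I have in mind. It is only a sketch in Python: the function name `check_xml_utf8` is made up for illustration, and I am assuming that "allowed" means the Char production of XML 1.0 (tab, LF, CR, U+0020 to U+D7FF, U+E000 to U+FFFD, U+10000 to U+10FFFF). It does not check well-formedness, only the encoding and the character range.

```python
import re

# Anything outside the XML 1.0 "Char" production (assumed definition of "allowed").
_INVALID_XML_CHAR = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def check_xml_utf8(data: bytes):
    """Return None if the bytes look fine, otherwise a short problem description.

    Hypothetical helper: step 1 is a strict UTF-8 decode, step 2 is a scan
    for code points that XML 1.0 does not allow.
    """
    try:
        # Strict decoding rejects overlong forms, stray continuation bytes
        # and UTF-8 encoded surrogates (U+D800 to U+DFFF).
        text = data.decode("utf-8")
    except UnicodeDecodeError as exc:
        bad = data[exc.start:exc.end].hex()
        return f"invalid UTF-8 at byte offset {exc.start}: {bad}"

    match = _INVALID_XML_CHAR.search(text)
    if match:
        code_point = ord(match.group())
        return (
            f"character U+{code_point:04X} at character index "
            f"{match.start()} is not allowed in XML 1.0"
        )
    return None
```

It would be called on the raw file contents, for example `check_xml_utf8(open("a.xml", "rb").read())`, so that no decoding happens before the check.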
Thanks in advance, George
EDIT1: Here is the content of my XML file, in both text and binary form.
http://tinypic.com/view.php?pic=2r2akvr&s=5
I tried checking the file with tools like xmlstarlet. The verdict is correct (the file is invalid because a character is outside the allowed range), but the error message does not look right to me: in the dump linked above, I cannot find any character whose value is 0xDFDD. Any ideas?
BTW: I can send the XML file to anyone who wants it, but I did not find a way to upload a file as an attachment here. If you need the file for analysis, please let me know.
D:\xmlstarlet-1.0.1-win32\xmlstarlet-1.0.1>xml val a.xml
a.xml:2: parser error : Char 0xDFDD out of allowed range
<URL>student=1砜濏磦</URL>
^
a.xml:2: parser error : Char 0xDFDD out of allowed range
<URL>student=1砜濏磦</URL>
^
a.xml:2: parser error : internal error
<URL>student=1砜濏磦</URL>
^
a.xml:2: parser error : Extra content at the end of the document
<URL>student=1砜濏磦</URL>
^
a.xml - invalid
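One thing I have not ruled out (this is an assumption, not something I have confirmed for this file): 0xDFDD may not be stored as two bytes DF DD at all. A parser that decodes a three-byte UTF-8 sequence such as ED BF 9D without rejecting surrogates would compute the code point 0xDFDD from it. The sketch below, in Python with a made-up function name `find_encoded_surrogates`, scans raw bytes for such sequences and reports their byte offsets and the values they decode to.

```python
def find_encoded_surrogates(data: bytes):
    """Return (byte_offset, code_point) for each UTF-8 encoded surrogate.

    Hypothetical helper. A lead byte 0xED followed by a byte in 0xA0..0xBF
    and one more continuation byte decodes to U+D800..U+DFFF, which is
    neither valid UTF-8 nor an allowed XML character.
    """
    hits = []
    for i in range(len(data) - 2):
        b0, b1, b2 = data[i], data[i + 1], data[i + 2]
        if b0 == 0xED and 0xA0 <= b1 <= 0xBF and 0x80 <= b2 <= 0xBF:
            # Standard three-byte decoding: 4 + 6 + 6 payload bits.
            code_point = ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)
            hits.append((i, code_point))
    return hits
```

If this returns an empty list for a.xml, then this explanation does not apply and the 0xDFDD in the message must come from somewhere else.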
EDIT2: I also tried the libxml tool to validate the XML file, but it fails with an error on startup. Here is a screenshot. Any ideas?
http://tinypic.com/view.php?pic=2ildjpe&s=5
OS is Windows Server 2003 x64.