Hi.

I'm writing out some xml from C# using the .net framework's XmlTextWriter. This works ok. Some of the strings I write out contain the character value 5 (note I don't mean the character '5', but I mean the ascii value 5).

Now, I understand from the xml specification that this character is illegal in xml. However, I don't care if it's illegal, I want it in my xml (non-conforming) document. This is so that I can write a string that can potentially contain some binary data to the document.
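For reference, the rule being broken here is the `Char` production of the XML 1.0 specification. A minimal C++ sketch of that check follows; the function name `is_xml10_char` is made up for illustration and is not part of MSXML or any other library.

```cpp
#include <cstdint>

// Returns true if the code point matches the XML 1.0 "Char" production:
//   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
// (is_xml10_char is an illustrative name, not a library function.)
constexpr bool is_xml10_char(std::uint32_t cp) {
    return cp == 0x9 || cp == 0xA || cp == 0xD ||
           (cp >= 0x20 && cp <= 0xD7FF) ||
           (cp >= 0xE000 && cp <= 0xFFFD) ||
           (cp >= 0x10000 && cp <= 0x10FFFF);
}
```

With this predicate, `is_xml10_char(0x5)` is false while `is_xml10_char('5')` is true, which is the distinction made above between the value 5 and the digit '5'.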

Ok, so System.Xml.XmlTextWriter will write these illegal xml characters ok, and encodes them in the xml as "&#x5;". But then I want to read them in a C++ app using MSXML2.SAXXMLReader.6.0. This parser raises a fatalError when it encounters one of these characters.

I've tried modifying some of the properties of the parser to get it to work. It was my understanding that IE uses this parser internally, and I can load the illegal xml with IE ok. So how does IE manage to parse it when I can't?

Am I missing something? Does IE use a different parser? Is there a way I can get the MSXML2.SAXXMLReader.6.0 parser to work? Will I need to use a different parser (if so, can you recommend one that has the source code available so I can fix it up if it doesn't do what I want)?

In .NET there is a CheckCharacters property I can set to allow these illegal characters through. I guess I'm looking for an equivalent I can use from C++ with the SAX parser: http://msdn.microsoft.com/en-us/library/system.xml.xmlwritersettings.checkcharacters.aspx

Thanks a lot,
-Scott

NOTE: I don't believe a CDATA section would allow this character to be encoded. See here: http://msdn.microsoft.com/en-us/library/ms256076(VS.85).aspx
And even if it did, I don't want to use CDATA sections; I want to use the character in an attribute value. I also realize I could base64 encode it, but I don't want to do that either... I want to break the law, I want to be able to parse illegal xml.

+1  A: 

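No, a conforming XML 1.0 parser cannot accept control characters such as character 5. Below 0x20, only tab (0x9), line feed (0xA), and carriage return (0xD) are allowed, and that applies to character references like "&#x5;" too.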

To be precise, this would make your documents something-other-than-XML documents.

This is a hard-wired part of the spec. If you want to parse illegal characters, you will have to write your own NON-COMPLIANT parser.

As per:

http://lists.xml.org/archives/xml-dev/199804/msg00502.html
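Short of writing a whole parser, one possible workaround is a non-compliant pre-pass that runs before the text reaches the SAX reader. The sketch below is an assumption-laden illustration, not something the linked post proposes: the hypothetical helper `remap_control_char_refs` rewrites numeric references to illegal control characters (for example `&#x5;`) into references to private-use code points (`&#xE005;`), which a conforming parser accepts. It assumes the documents never use U+E000..U+E01F for anything else, and it does not skip CDATA sections or comments, where such text is literal.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Returns the value of c as a digit in the given base (10 or 16), or -1.
static int digit_value(char c, int base) {
    if (c >= '0' && c <= '9') return c - '0';
    if (base == 16 && c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (base == 16 && c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper (not an MSXML API): rewrites "&#N;" / "&#xN;" where
// N < 0x20 and N is not tab, LF, or CR into "&#xE0NN;" (U+E000 + N).
// Everything else is copied through unchanged.
std::string remap_control_char_refs(const std::string& xml) {
    std::string out;
    std::size_t i = 0;
    while (i < xml.size()) {
        const bool is_ref = xml.compare(i, 2, "&#") == 0;
        std::size_t pos = i + 2;
        int base = 10;
        if (is_ref && pos < xml.size() && xml[pos] == 'x') { base = 16; ++pos; }

        // Accumulate the digits, clamping so the value cannot overflow.
        const std::size_t start = pos;
        unsigned long value = 0;
        while (is_ref && pos < xml.size()) {
            const int d = digit_value(xml[pos], base);
            if (d < 0) break;
            value = value * base + d;
            if (value > 0x10FFFF) value = 0x110000;
            ++pos;
        }

        const bool complete =
            is_ref && pos > start && pos < xml.size() && xml[pos] == ';';
        const bool illegal_c0 =
            value < 0x20 && value != 0x9 && value != 0xA && value != 0xD;

        if (complete && illegal_c0) {
            char buf[16];
            std::snprintf(buf, sizeof buf, "&#x%lX;", 0xE000UL + value);
            out += buf;
            i = pos + 1;          // skip past the ';'
        } else {
            out += xml[i++];      // not a reference we rewrite; copy one char
        }
    }
    return out;
}
```

The application would then map U+E000..U+E01F back to 0x00..0x1F in its content handler after parsing. This is still an encoding scheme in disguise, so the documents on disk remain non-XML; only the text handed to the parser is made well-formed.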

John Gietzen