Hi,

I've got an XML doc to deal with that contains attributes like:

<action name="foo -> bar">

If I do a simple load and save:

XmlDocument doc = new XmlDocument();
doc.Load(stInPath);
doc.Save(stOutPath);

The attribute string is escaped:

<action name="foo -&gt; bar">

Which is the very thing I'd want to prevent.

Do you know any way to prevent this (other than running a find & replace over the whole XML file afterwards)?

Edit: It seems this is legitimate behaviour and I don't have to worry about it (see Jon Skeet's answer).

+3  A: 

Why do you need it not to apply that escaping?

Any normal parser should then apply the appropriate "unescaping" when it parses it. It sounds like you're trying to test the resulting XML document as a plain-text document, which is rarely a good idea. XML documents should almost always be fed to XML parsers in the next step, at which point this isn't an issue.
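To make the "unescaping" point concrete, here is a minimal sketch (the class and method names are made up for illustration) that parses both spellings of the attribute and compares what the parser hands back:

```csharp
using System;
using System.Xml;

static class EscapeDemo
{
    // Hypothetical helper: parse a one-element document and return its "name" attribute.
    public static string ReadName(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.DocumentElement.GetAttribute("name");
    }

    static void Main()
    {
        string raw = ReadName("<action name=\"foo -> bar\" />");
        string escaped = ReadName("<action name=\"foo -&gt; bar\" />");

        Console.WriteLine(raw == escaped); // True
        Console.WriteLine(escaped);        // foo -> bar
    }
}
```

Both documents give the same attribute value once parsed, so any consumer that reads the file through an XML parser never sees the `&gt;`.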

I don't know of any way of preventing the .NET XML libraries from doing this, and I'd be somewhat surprised if they had such a facility.

Jon Skeet
I am indeed reading the XML file in a text editor (it's supposed to be human-readable, isn't it?). Well, it's possible I'm seeing a problem where there isn't one at all. Thanks for your answer.
Vinzz
@Vinzz: Yes, XML is supposed to be human-comprehensible. But it's still *not* supposed to be treated as plain text. Don't let the fact that you can open it in a text editor distract you.
Tomalak
+3  A: 

Which is the very thing I'd want to prevent.

Really? It isn't generally important at all whether that escaping is applied; the XML infoset for either is the same.

I am frankly a bit surprised that the document loads at all.

> is a perfectly valid character to include in an attribute value. The only place > may need to be &-escaped in XML is in a ]]> sequence in text content, due to an obscure and silly rule in the spec.

To avoid having to think about the problem, many XML serialisers habitually escape > anywhere in text content or attribute values.

The Canonical XML specification defines one particular way of serialising an XML document so that the output can be compared as a simple string; for example, it states exactly how attributes should be ordered. Canonical XML requires >-escaping in text content, but forbids it in attribute values. So if you used a Canonical XML serialiser to output your document you'd get the result you expected for that particular value. (I can't guarantee it'd look how you want for other examples though.)

You can get a canonicaliser in .NET using XmlDsigC14NTransform (or maybe XmlDsigC14NWithCommentsTransform), something like:

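// needs: using System.IO; using System.Security.Cryptography.Xml;
XmlDsigC14NTransform transform = new XmlDsigC14NTransform(false); // false: leave comments out
transform.LoadInput(doc);
Stream stream = (Stream) transform.GetOutput(typeof(Stream));
// write stream to file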
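
For the "write stream to file" step, something along these lines should work. It is only a sketch: the CanonicalSave class and its Save method are names invented here, and it assumes System.Security.Cryptography.Xml is available (part of System.Security.dll on .NET Framework, a separate NuGet package on newer runtimes). Setting PreserveWhitespace is also an assumption about what you want, since canonicalisation keeps whatever whitespace the loaded document still contains.

```csharp
using System.IO;
using System.Security.Cryptography.Xml;
using System.Xml;

static class CanonicalSave
{
    // Hypothetical helper: load inPath and write its Canonical XML form to outPath.
    public static void Save(string inPath, string outPath)
    {
        var doc = new XmlDocument();
        doc.PreserveWhitespace = true; // assumption: keep the original line breaks and indentation
        doc.Load(inPath);

        var transform = new XmlDsigC14NTransform(false); // false: leave comments out
        transform.LoadInput(doc);

        // GetOutput returns a stream of UTF-8 bytes; copy it straight to the file.
        using (var canonical = (Stream)transform.GetOutput(typeof(Stream)))
        using (var file = File.Create(outPath))
        {
            canonical.CopyTo(file);
        }
    }
}
```

Be aware that the canonical form differs from a plain doc.Save in other ways too: there is no XML declaration, and empty elements are written as start/end tag pairs, so check the whole output rather than just the attribute in question.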
bobince