Let's say that I have the following XML:

<someRootElement>
  <someTagWithUrl>http://www.google.com/s.php&test=testing</someTagWithUrl>
</someRootElement>

The ampersand inside someTagWithUrl is invalid and needs to be escaped as &amp;, but all I have is a single string containing the entire document above.

How can I safely escape the ampersand so the string becomes valid XML? Can .NET's XML library be told to ignore this? (Currently XElement.Parse throws an exception.)

I've thought about using a regular expression to find ampersands between tags, but I can't quite get the syntax right (something like >(\&)\< as a regex replace with &amp;, but I can't figure it out).

+2  A: 

What you've pasted is invalid XML, and any attempt to parse it with the XML libraries will fail. The best way to ensure the data is properly escaped is to create it with an XML/HTML writer, for example XmlWriter, which escapes all strings for you.
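For reference, a minimal sketch of the writer approach, assuming .NET 5 or later (C# 9 top-level statements). BuildXml is a hypothetical helper; the element names come from the question. XmlWriter.WriteElementString escapes the & in the URL itself, so the caller never assembles markup by hand.

```csharp
using System;
using System.Text;
using System.Xml;

// Hypothetical helper: builds the question's document with XmlWriter,
// which escapes special characters such as '&' and '<' in text content.
static string BuildXml(string url)
{
    var settings = new XmlWriterSettings
    {
        OmitXmlDeclaration = true // emit only the elements, no <?xml ...?> line
    };
    var sb = new StringBuilder();
    using (var writer = XmlWriter.Create(sb, settings))
    {
        writer.WriteStartElement("someRootElement");
        // The raw URL goes in unescaped; the writer turns '&' into "&amp;".
        writer.WriteElementString("someTagWithUrl", url);
        writer.WriteEndElement();
    } // disposing the writer flushes everything into sb
    return sb.ToString();
}

string xml = BuildXml("http://www.google.com/s.php&test=testing");
Console.WriteLine(xml); // the '&' in the URL appears as &amp;
```

This only helps when you control the code that produces the XML; it cannot repair a string that was already written incorrectly.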

JaredPar
Unfortunately I'm parsing XML that has already been written, so if I want to use it I need to solve this problem. I know it's invalid XML, but I still need to use it...
SofaKng
Luckily I don't have to deal with CData so I guess it shouldn't be that bad?
SofaKng
+1  A: 
Aaron D
-1: and how will that help?
John Saunders
+1  A: 

Try this for your regular expression:

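&(?!quot;|apos;|amp;|lt;|gt;|#[0-9]+;|#x[0-9a-fA-F]+;)

This will find only the invalid standalone & characters in your text (those that don't start one of the predefined entities or a numeric character reference such as &#38; or &#x26;). Here's a sample of how you'd do text replacements before processing the source data as XML:

using System.Text.RegularExpressions;
// Escape every '&' that doesn't already start a predefined entity or a numeric character reference.
var regex = new Regex("&(?!quot;|apos;|amp;|lt;|gt;|#[0-9]+;|#x[0-9a-fA-F]+;)");
string fixedXml = regex.Replace(input, "&amp;");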
Jacob
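A fuller, self-contained sketch of the regex approach, assuming .NET 5 or later for the top-level statements. EscapeBareAmpersands is a hypothetical helper name, and the pattern only treats well-formed decimal (&#38;) and hexadecimal (&#x26;) character references as already escaped. It first shows XElement.Parse rejecting the raw string with an XmlException, then parses the repaired string.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml;
using System.Xml.Linq;

// Matches a '&' that is NOT followed by one of the five predefined entity
// names or by a decimal/hexadecimal character reference.
var bareAmpersand = new Regex("&(?!quot;|apos;|amp;|lt;|gt;|#[0-9]+;|#x[0-9a-fA-F]+;)");

// Hypothetical helper: escape only the bare ampersands.
string EscapeBareAmpersands(string text) => bareAmpersand.Replace(text, "&amp;");

string input =
    "<someRootElement>\n" +
    "  <someTagWithUrl>http://www.google.com/s.php&test=testing</someTagWithUrl>\n" +
    "</someRootElement>";

try
{
    XElement.Parse(input); // "&test=..." is not a valid entity reference
    Console.WriteLine("Original parsed (unexpected)");
}
catch (XmlException ex)
{
    Console.WriteLine("Original fails: " + ex.Message);
}

XElement root = XElement.Parse(EscapeBareAmpersands(input));
// Value returns unescaped text, so this prints the URL with a plain '&'.
Console.WriteLine(root.Element("someTagWithUrl").Value);
```

Because Value returns the unescaped text, the URL comes back as http://www.google.com/s.php&test=testing; the &amp; only exists in the serialized form.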
Thanks!! Will that only find ampersands between tags? (i.e., is it a relatively safe replacement?)
SofaKng
It shouldn't matter. A bare ampersand that doesn't start an entity or character reference is never valid in element content or attribute values, and ampersands aren't allowed in element or attribute names at all. The only problem you could encounter is if your XML contains `CDATA` sections (or comments and processing instructions, where a bare & is also legal); if so, the solution will be much more complex.
Jacob
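If CDATA sections do turn up, one way to extend the regex approach is to match CDATA sections, comments and processing instructions as whole units and copy them through unchanged, escaping only the bare ampersands outside them. This is a sketch, assuming .NET 5 or later and that every such section is properly terminated (an unterminated <![CDATA[ falls back to ordinary escaping); EscapeOutsideCData is a hypothetical name. Since Regex.Replace scans from left to right and a CDATA match starting at <![CDATA[ consumes the whole section, an & inside it is never examined on its own.

```csharp
using System;
using System.Text.RegularExpressions;

// One alternation scanned left to right: a CDATA section, comment or processing
// instruction is matched as a whole and copied through unchanged; any other bare
// '&' (one that doesn't start an entity or character reference) is escaped.
var pattern = new Regex(
    @"<!\[CDATA\[.*?\]\]>" +                                    // CDATA section
    @"|<!--.*?-->" +                                            // comment
    @"|<\?.*?\?>" +                                             // processing instruction
    @"|&(?!quot;|apos;|amp;|lt;|gt;|#[0-9]+;|#x[0-9a-fA-F]+;)", // bare '&'
    RegexOptions.Singleline); // let '.' span newlines inside multi-line sections

// Hypothetical helper: only a lone "&" match is rewritten; whole sections pass through.
string EscapeOutsideCData(string xml) =>
    pattern.Replace(xml, m => m.Value == "&" ? "&amp;" : m.Value);

string input = "<r><a>x&y</a><![CDATA[x&y]]><!-- x&y --></r>";
Console.WriteLine(EscapeOutsideCData(input));
// prints <r><a>x&amp;y</a><![CDATA[x&y]]><!-- x&y --></r>
```

Regex still isn't a real XML parser, so this remains a best-effort repair for input you can't fix at the source.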