I need to parse an invalid XML file that contains both unencoded, illegal characters (ampersands in my case):

<url>http://example.com?param1=bad&param2=ampersand</url>

and correctly encoded ones:

<description> The good, the bad &amp; the ugly </description>

Please post an example with a sed/awk script that can encode the illegal characters.

+1  A: 

Completely untested, but you could cheat by converting all the valid ones back to their original form then doing the conversion back again.

For example, if you only had to worry about ampersands, you could do something similar to:

# \& is a literal ampersand; an unescaped & in a sed replacement means "the matched text"
sed 's/&amp;/\&/g' | sed 's/&/\&amp;/g'

Of course, you can do it a lot more cleanly and there will be better solutions, but some rest is calling me and I'm sure you can work it out from here.
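One caveat with the round trip above: it only knows about &amp;, so an already-encoded &lt; or &#38; would come out as &amp;lt; or &amp;#38;. A rough sketch of a variant that avoids this is to encode every ampersand first and then undo the double encoding for anything that looks like an entity. The function name encode_bare_ampersands is made up for illustration, the entity list is assumed to be the five predefined XML entities plus numeric references, and -E assumes a sed with extended-regex support (GNU and BSD sed both have it):

```shell
# Hypothetical helper: reads text on stdin, writes the repaired text to stdout.
encode_bare_ampersands() {
  # Pass 1: encode every ampersand, including those already part of an entity.
  # Pass 2: collapse "&amp;" + entity name + ";" back to the original entity.
  sed -E \
    -e 's/&/\&amp;/g' \
    -e 's/&amp;(amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);/\&\1;/g'
}

printf '%s\n' '<url>http://example.com?param1=bad&param2=ampersand</url>' | encode_bare_ampersands
# prints <url>http://example.com?param1=bad&amp;param2=ampersand</url>
```

It cannot tell a bare ampersand that happens to be followed by something like "lt;" from a real entity, so treat it as a best-effort cleanup rather than a proper XML repair.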

Dan McGrath
glenn jackman
+1  A: 
tidy -m -xml <your-xml-file>
Tomalak