I need to parse an invalid XML file that contains both unencoded, illegal characters (ampersands in my case):
<url>http://example.com?param1=bad&param2=ampersand</url>
and already-encoded ones:
<description> The good, the bad &amp; the ugly </description>
Please post an example sed/awk script that encodes the illegal characters without double-encoding the ones that are already encoded.
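For reference, here is a minimal sketch of one possible approach, not a definitive solution. sed has no lookahead, so it runs two passes: first escape every ampersand, then undo the escaping wherever the ampersand was already the start of a predefined XML entity or a numeric character reference. The function name `encode_amp` is a hypothetical helper, and the sketch assumes a sed that supports `-E` (GNU and BSD sed both do).

```shell
# encode_amp: hypothetical helper name. Reads text on stdin, writes to stdout.
# Pass 1 escapes every "&" as "&amp;".
# Pass 2 restores references that were already valid before pass 1:
#   the five predefined XML entities, plus decimal and hex character references.
encode_amp() {
  sed -E \
    -e 's/&/\&amp;/g' \
    -e 's/&amp;(amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);/\&\1;/g'
}

# The bare ampersand in the URL gets encoded:
printf '%s\n' '<url>http://example.com?param1=bad&param2=ampersand</url>' | encode_amp
# prints: <url>http://example.com?param1=bad&amp;param2=ampersand</url>

# The already-encoded ampersand is left alone:
printf '%s\n' '<description> The good, the bad &amp; the ugly </description>' | encode_amp
# prints: <description> The good, the bad &amp; the ugly </description>
```

This only handles ampersands. It does not touch a stray `<`, it does not skip CDATA sections or comments, and entities declared in a DTD would need to be added to the alternation by hand.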