We have a Java application that pulls data from SAP, parses it, and renders it to users. The data is pulled using the JCo connector.

Recently we got the following exception:

org.xml.sax.SAXParseException: Character reference "&#00" is an invalid XML character.

So we are planning to add a layer of indirection in which ALL special/illegal characters are replaced BEFORE the XML is parsed.

My questions are:

  1. Is there an existing (open source) utility that replaces illegal characters in XML?
  2. If I had to write such a utility myself, how should I handle these characters?
  3. Why is the above exception thrown?

Thank You.

A: 

I've had a related, but opposite problem, where I was trying to insert character 1 into the output of an XSLT transformation. I considered post-processing to replace a marker with the zero, but instead chose to use an xsl:param.

If I were in your situation, I'd either come up with a bespoke encoding that replaces the characters which are invalid in XML and handle them as special cases in your parsing, or, if possible, replace them with whitespace.

I don't have experience with JCo, so I can't advise on how or where to replace the invalid characters.
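
To illustrate the "replace them with whitespace" option, here is a minimal sketch of a pre-parse filter. It is not an existing library: the class name `XmlSanitizer` and its methods are hypothetical. It assumes the document is meant to be XML 1.0, whose character range excludes code point 0 and most other control characters, which is why a reference such as `&#0;` is rejected by the parser. It also assumes the whole payload is available as a `String` before parsing.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of JCo or any library.
final class XmlSanitizer {
    // Matches numeric character references: decimal (&#0;) or hex (&#x1F;).
    private static final Pattern CHAR_REF =
        Pattern.compile("&#(?:x([0-9a-fA-F]+)|([0-9]+));");

    // True if the code point is in the XML 1.0 "Char" range.
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Parses the referenced code point; returns -1 if the number is too large.
    private static int referencedCodePoint(Matcher m) {
        try {
            return m.group(1) != null
                ? Integer.parseInt(m.group(1), 16)
                : Integer.parseInt(m.group(2), 10);
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    static String sanitize(String xml) {
        // Pass 1: replace references to invalid code points with a space.
        Matcher m = CHAR_REF.matcher(xml);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String repl = isValidXmlChar(referencedCodePoint(m)) ? m.group() : " ";
            m.appendReplacement(sb, Matcher.quoteReplacement(repl));
        }
        m.appendTail(sb);

        // Pass 2: replace raw invalid characters (including lone surrogates).
        String s = sb.toString();
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            out.appendCodePoint(isValidXmlChar(cp) ? cp : ' ');
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

You would call `XmlSanitizer.sanitize(rawXml)` on the string before handing it to the parser. Note that this silently discards information, so it only makes sense if the offending characters carry no meaning for your application.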

Stephen Denne
A: 

Hi,

From my point of view, the source (SAP) should do the replacement. Otherwise, what it transmits to your program may look like XML, but it is not.

While replacing '&' with '&amp;' can be done with a simple String.replaceAll(...) on the string returned by the toXML() call, other characters can be harder to replace ('<' and '>', for example).
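
As a rough sketch of the ampersand case: a plain `replaceAll("&", "&amp;")` would also rewrite references that are already valid, such as `&lt;`, so the version below only escapes an '&' that does not start something shaped like an entity or character reference. The class and method names are made up for illustration, and the pattern is an approximation of the XML name syntax, not a full implementation of it.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the name and the regex are illustrative assumptions.
final class AmpersandEscaper {
    // An '&' NOT followed by "name;", "#digits;" or "#xhex;".
    private static final Pattern BARE_AMP = Pattern.compile(
        "&(?!(?:[A-Za-z_][A-Za-z0-9._-]*|#[0-9]+|#x[0-9a-fA-F]+);)");

    static String escapeBareAmpersands(String xml) {
        // The replacement contains no '$' or '\', so it is taken literally.
        return BARE_AMP.matcher(xml).replaceAll("&amp;");
    }
}
```

This does nothing for '<' and '>', which cannot be told apart from real markup by a simple pattern like this one.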

regards Guillaume

PATRY
+1  A: 
Tom