I am trying to use the DOM parser in Java to parse a small XML file that I pull off the net by its URI, but I receive an error complaining about a missing semicolon.

Here's line 108:

Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse("url_to_the_xml_file.xml");

Here's the error:

[Fatal Error] A01.xml:6:53: The character reference must end with the ';' delimiter.
Exception in thread "main" org.xml.sax.SAXParseException: The character reference must end with the ';' delimiter.
  at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
  at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
  at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:180)
  at Parser.Parse(Parser.java:108)
  at Parser.main(Parser.java:185)

It occurs while parsing this line of XML:

<title>Reduction Algorithm using the &#192 TROUS Wavelet Transform.</title>

Obviously there's a semicolon missing after &#192. Does anyone know a nice and tidy workaround for this problem?

A: 

I would retrieve the XML separately into a byte array/string and perform a regex replace on the malformed entity before I send it to the parser.

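I'm not a regex expert, but searching for &#\d{1,4}[^;] might do half of the trick.

If this is the only malformed entity, you could just call replaceAll("&#192", "&#192;") on the string (assuming the file contains no correctly terminated &#192; as well, since that would end up with a doubled semicolon).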
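As a rough sketch of the approach above, the following reads the document into a string, appends the missing ';' to any decimal character reference that lacks one, and only then hands the text to the DOM parser. The class and method names (CharRefRepair, fetch, repair, parse) are made up for illustration, and the code assumes the file is UTF-8 encoded and that Java 9 or later is available (for readAllBytes). It only handles decimal references such as &#192, not hexadecimal ones.

```java
import java.io.InputStream;
import java.io.StringReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CharRefRepair {
    // A decimal character reference (&#192) that is NOT followed by ';'.
    // The possessive quantifier (++) keeps the regex from backing up to a
    // shorter digit run, so a correct "&#192;" is left untouched.
    private static final Pattern UNTERMINATED = Pattern.compile("&#([0-9]++)(?!;)");

    // Adds the missing ';' to every unterminated decimal character reference.
    static String repair(String xml) {
        return UNTERMINATED.matcher(xml).replaceAll("&#$1;");
    }

    // Downloads the raw document; UTF-8 is an assumption about the file.
    static String fetch(String url) throws Exception {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    // Repairs the text first, then parses it from memory.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(repair(xml))));
    }

    public static void main(String[] args) throws Exception {
        String broken =
            "<title>Reduction Algorithm using the &#192 TROUS Wavelet Transform.</title>";
        Document doc = parse(broken);
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

For the real file you would call parse(fetch("url_to_the_xml_file.xml")) instead of passing a literal string. Note that java.util.regex also accepts \d for a digit; inside a Java string literal it has to be written as "\\d".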

kd304
In Java you can use [0-9] instead of \d (inside a string literal, \d has to be written as \\d).
Duncan
A: 

I am getting the same exception.

I am using a forward tag in the configuration file and passing the query string ?default=sample&page=homePage

Can anyone help me out?

A: 

If you have more problems with the XML syntax than that, a more comprehensive solution is to use HTMLTidy or its Java port, JTidy, to clean up the markup before you feed it to a parser. It was originally designed for HTML/XHTML, but I'm pretty sure it's capable of tidying arbitrary XML if given the right settings.

Walter Mundt
A: 

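Use "?default=sample&amp;page=homePage"

In an XML configuration file a literal & has to be escaped as &amp;, otherwise the parser treats "&page" as the start of an entity reference and complains about the missing ';'.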
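To see the difference, here is a small check that parses a one-line configuration fragment with and without the escaped ampersand. The forward element, its attributes, and the class name AmpersandCheck are hypothetical and only stand in for whatever the real configuration file contains.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class AmpersandCheck {
    // Returns true if the text parses as well-formed XML, false if the
    // parser rejects it with a SAXException (e.g. SAXParseException).
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical config line with a raw '&' in the attribute value.
        String raw = "<forward name=\"home\" path=\"/index?default=sample&page=homePage\"/>";
        // Same line with the ampersand escaped as &amp;.
        String escaped = "<forward name=\"home\" path=\"/index?default=sample&amp;page=homePage\"/>";

        System.out.println("raw &     well-formed: " + isWellFormed(raw));
        System.out.println("escaped & well-formed: " + isWellFormed(escaped));
    }
}
```

When the escaped version is parsed, the attribute value the application sees contains a plain & again, so the query string itself is unchanged.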

Prajakta