I am getting this error when parsing an incorrectly generated XML document:
org.xml.sax.SAXParseException: The value of attribute "bar" associated with an element type "foo" must not contain the '<' character.
I know what is causing the problem. It is this line:
<foo bar="x<y">42</foo>
It should have been:
<foo bar="x&lt;y">42</foo>
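For anyone who wants to reproduce this, here is a minimal sketch. It assumes the SAX parser bundled with the JDK, which is derived from Xerces. The class name MalformedAttributeDemo and the helper parseBar are made up for illustration. The sketch parses both the escaped and the unescaped form of the element. The exact wording of the exception message may vary between JDK versions.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class MalformedAttributeDemo {

    /** Parses the document with the JDK's SAX parser and returns the "bar" attribute of the first element. */
    static String parseBar(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        final String[] bar = new String[1];
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (bar[0] == null) {
                    bar[0] = atts.getValue("bar"); // entity references are already decoded here
                }
            }
        });
        return bar[0];
    }

    public static void main(String[] args) throws Exception {
        // Correctly escaped: the parser decodes &lt; back to '<', so this prints x<y
        System.out.println(parseBar("<foo bar=\"x&lt;y\">42</foo>"));

        // Unescaped '<' in an attribute value is a well-formedness (fatal) error
        try {
            parseBar("<foo bar=\"x<y\">42</foo>");
        } catch (SAXParseException e) {
            System.out.println("Fatal: " + e.getMessage());
        }
    }
}
```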
I am aware that this is not well-formed XML. However, my code has to download and parse similar files unattended, and for political reasons it might not be possible to persuade the supplier to fix the faulty program, especially since other programs read the file and tolerate this error.
Is there any way to configure Xerces to tolerate it? At present Xerces treats it as a fatal error. Implementing an ErrorHandler to ignore it is not satisfactory, because the rest of the document is then not parsed.
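The ErrorHandler approach amounts to something like the following sketch. As before, it assumes the JDK's bundled Xerces-based SAX parser, and the names IgnoringErrorHandlerDemo, IgnoringHandler and parseStillAborts are hypothetical. fatalError() records the problem and returns instead of rethrowing. With this parser, the call to parse() still ends in an exception, so the second foo element is never reached.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class IgnoringErrorHandlerDemo {

    /** Handler whose fatalError() swallows the error instead of throwing it. */
    static class IgnoringHandler extends DefaultHandler {
        final List<String> fatalErrors = new ArrayList<>();
        int elementsSeen = 0;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elementsSeen++;
        }

        @Override
        public void fatalError(SAXParseException e) {
            fatalErrors.add(e.getMessage()); // record it, but do not rethrow
        }
    }

    /** Returns true if parse() still threw even though the handler ignored the fatal error. */
    static boolean parseStillAborts(String xml, IgnoringHandler handler) throws Exception {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return false;
        } catch (SAXException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        // The first foo is malformed; the second one is fine but never gets parsed
        String xml = "<root><foo bar=\"x<y\">42</foo><foo bar=\"ok\">43</foo></root>";
        IgnoringHandler handler = new IgnoringHandler();
        boolean aborted = parseStillAborts(xml, handler);
        System.out.println("fatal errors reported: " + handler.fatalErrors.size());
        System.out.println("parse aborted: " + aborted);
        System.out.println("elements seen: " + handler.elementsSeen);
    }
}
```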
Alternatively, can you suggest another stream-based parser that can be configured to tolerate this error? Using a DOM parser is not feasible, as these documents run to hundreds of megabytes.