I'm using Java's DocumentBuilder.parse(InputStream) to parse an XML document. Occasionally I receive malformed documents that have extra junk after the final >, which causes a SAXException: Content is not allowed in trailing section. (In the cases I've seen, the junk is simply one or more null bytes.)
I don't care what comes after the final >. Is there an easy way to parse an entire XML document in Java and have it ignore any trailing junk?
Note that by "ignore" I don't simply mean catching and swallowing the exception: I mean ignoring the trailing junk, throwing no exception, and returning the Document object, since the XML up to and including the final > is valid.
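To make the goal concrete, here is a minimal sketch that reproduces the situation and shows the behavior I'm after. The class name TrailingJunkDemo and the helper parseIgnoringTrailingJunk are made up for illustration and are not part of any library. The helper buffers the whole stream and drops every byte after the last >, so it assumes the document fits in memory, uses an ASCII-compatible encoding such as UTF-8, and that the junk itself contains no > byte. It also relies on InputStream.readAllBytes(), which needs Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

class TrailingJunkDemo {

    // Hypothetical helper (not a JDK API): reads the whole stream, cuts off
    // everything after the last '>' byte, and parses what remains.
    // Assumes an ASCII-compatible encoding and junk that contains no '>'.
    static Document parseIgnoringTrailingJunk(InputStream in)
            throws IOException, SAXException, ParserConfigurationException {
        byte[] bytes = in.readAllBytes(); // Java 9+
        int end = bytes.length;
        while (end > 0 && bytes[end - 1] != '>') {
            end--; // step back over trailing bytes until the last '>'
        }
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(bytes, 0, end));
    }

    public static void main(String[] args) throws Exception {
        // A valid document followed by three null bytes, as in the cases I've seen.
        byte[] xml = "<root><item>1</item></root>\0\0\0".getBytes(StandardCharsets.UTF_8);

        // Plain DocumentBuilder.parse(InputStream): this is the call that fails for me.
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try {
            builder.parse(new ByteArrayInputStream(xml));
            System.out.println("strict parse succeeded");
        } catch (SAXException e) {
            System.out.println("strict parse failed: " + e.getMessage());
        }

        // Desired behavior: no exception, and a usable Document comes back.
        Document doc = parseIgnoringTrailingJunk(new ByteArrayInputStream(xml));
        System.out.println(doc.getDocumentElement().getNodeName()); // prints "root"
    }
}
```

This works for the null-byte case, but it buffers the entire input and is only a rough illustration; I'd prefer something built into the parser, if such an option exists.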