Ignoring "Content is not allowed in trailing section" SAXException

Since your sender is presenting you with invalid XML, it needs to be corrected before it hits the parser if you want to avoid this exception. If you can't correct the sender, you'll need a preprocessing step of some sort.

If the situation is simply that you've got extra null bytes after the closing tag, as indicated by one of your responses to another answer, this might be something you can accomplish easily by wrapping your input stream in a FilterInputStream that you implement to skip null bytes.
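A minimal sketch of such a filter might look like the following. The class name NullSkippingInputStream is made up for illustration, and the sketch assumes the document is in a single-byte-compatible encoding such as UTF-8 or ISO-8859-1. With UTF-16 or UTF-32, zero bytes are part of ordinary characters and must not be stripped.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical filter that drops every 0x00 byte from the wrapped stream. */
class NullSkippingInputStream extends FilterInputStream {

    NullSkippingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b;
        // Keep reading until we get a non-null byte or end of stream (-1).
        do {
            b = super.read();
        } while (b == 0);
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        while (true) {
            int n = super.read(buf, off, len);
            if (n <= 0) {
                return n; // -1 signals end of stream
            }
            // Compact the chunk in place, keeping only non-null bytes.
            int kept = 0;
            for (int i = 0; i < n; i++) {
                byte b = buf[off + i];
                if (b != 0) {
                    buf[off + kept++] = b;
                }
            }
            if (kept > 0) {
                return kept;
            }
            // The whole chunk was null bytes: read again instead of returning 0.
        }
    }
}
```

You would then hand the parser new NullSkippingInputStream(rawStream) instead of rawStream. Note that this drops null bytes anywhere in the stream, not just after the closing tag. That should be harmless, since U+0000 is not a legal XML character anyway. skip() and available() are left unchanged here, so they still count the dropped bytes.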

If the problem is more complex than just null characters, you'll of course need a more complex filter, which might be difficult.

If you're using a ContentHandler, you can add a callback to it so that it informs the calling code when the root element's end tag has been handled. The calling code's exception handler can then simply ignore the exception if the end has already been signalled. At that point anything that had to be done by the parser has likely been done anyway! But this solution doesn't seem to apply to your situation.
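For anyone who can live with catching the exception, here is a rough sketch of that idea using the standard JAXP SAX API. RootTrackingHandler and LenientParse are hypothetical names, and a plain flag on the handler stands in for the callback. The sketch assumes the parser delivers the root element's endElement event before it complains about the trailing content.

```java
import java.io.InputStream;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler that records when the root element has been closed. */
class RootTrackingHandler extends DefaultHandler {
    private int depth = 0;
    private boolean rootClosed = false;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        depth++;
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Depth returns to zero only when the root element's end tag is seen.
        if (--depth == 0) {
            rootClosed = true;
        }
    }

    boolean isRootClosed() {
        return rootClosed;
    }
}

class LenientParse {
    /** Parses the stream; swallows parse errors raised after the root closed. */
    static boolean parseIgnoringTrailingJunk(InputStream in) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        RootTrackingHandler handler = new RootTrackingHandler();
        try {
            parser.parse(in, handler);
        } catch (SAXParseException e) {
            // Rethrow genuine errors: anything that happened before the root closed.
            if (!handler.isRootClosed()) {
                throw e;
            }
        }
        return handler.isRootClosed();
    }
}
```

A real handler would do its actual work in the same callbacks. The only addition is the depth counter and the flag that the catch block consults.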

I have no control over the sender. And your "answer" is not in the spirit of "Be liberal in what you accept and strict in what you emit."

Paul J. Lucas 2010-05-11 23:31:15

You asked "is there an easy way to parse an entire XML document in Java and have it ignore any trailing junk?" The answer is "no, there is not", and I gave the reason. Maybe you're looking for http://home.ccil.org/~cowan/XML/tagsoup/ ? Maybe you know that your XML doesn't have CDATA and you can implement a primitive InputStream wrapper? I'm not sure what answer you're looking for.

bkail 2010-05-12 00:01:07

Every XML parser keeps track of every element and knows when said element has been "closed" by parsing the > of its closing tag. That means that every XML parser also knows the final > when it sees it because the first element has been balanced by its closing tag. At that point, I want the parser simply to stop. You're making this more complicated than it is.

Paul J. Lucas 2010-05-12 00:24:06

I'm not trying to make this complicated. I understand that what you want is conceptually simple, but it doesn't exist. Your only options are to use a non-compliant (or non-XML) parser, modify an existing XML parser to do what you want, or preprocess the input.

bkail 2010-05-12 01:05:53

Hopefully the downvote can be removed now that someone else has given the same answer.

bkail 2010-05-12 14:36:48

They may have given the same "base answer," but at least they offered ways to actually solve the problem whereas your original answer did not other than the terse and unhelpful "fix the sender."

Paul J. Lucas 2010-05-12 15:28:09

The other answer suggests you either: (1) preprocess the input, or (2) catch exceptions. You explicitly stated that #2 was not an option. You dismissed #1 when I suggested it in a comment, so I didn't bother to update my answer. Oh well.

bkail 2010-05-12 19:44:06
