ansaurus

Question

Answer 1

+1 A:

One approach is to try dom4j and its Node.asXML() method. It may return a deep structure (the node together with its children), so you might need to clone the node to get just the node or text you want without any of its children.

John 2010-04-12 21:23:30

Answer 2

+2 A:

You can turn them back into XML-encoded form with:

 StringEscapeUtils.escapeXml(str);

(javadoc, commons-lang)
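To make the idea concrete, here is a minimal sketch that uses only the JDK. It parses with a DocumentBuilder, reads the decoded text, and then re-encodes it. The class name ReEscapeDemo and the hand-written escapeXml helper are assumptions for illustration; the helper only approximates what StringEscapeUtils.escapeXml does by handling the five predefined XML entities.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ReEscapeDemo {

    // Hypothetical stand-in for StringEscapeUtils.escapeXml:
    // re-encodes only the five predefined XML entities.
    public static String escapeXml(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    // Parses the XML string, reads the root element's decoded text,
    // and returns that text in re-escaped form.
    public static String escapedText(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return escapeXml(doc.getDocumentElement().getTextContent());
    }

    public static void main(String[] args) throws Exception {
        // prints: &quot;a &lt; b&quot;
        System.out.println(escapedText("<msg>&quot;a &lt; b&quot;</msg>"));
    }
}
```

Note that re-escaping does not recover the original source text: a literal " in the file, a numeric character reference, or a CDATA section all come back in the same entity form.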

Bozho 2010-04-12 21:23:41

Answer 3

A:

Both good answers, but both a little too heavy-weight for this very small-scale application. I ended up going with the total hack of just stripping out all &s (I do this to &s that aren't part of escapes later anyway). It's ugly, but it's working.

Edit: I understand there are all kinds of things wrong with this, and that the requirement is stupid. It's for a school project; all that matters is that it works in one case, and the requirement is not my fault :)

Personman 2010-04-12 21:43:54

It will stop working at some point and you will wonder where the problem came from ;)

Bozho 2010-04-13 04:42:15

Answer 4

+2 A:

I'm using a DocumentBuilder to parse XML files. However, the specification for the project requires that within text nodes, strings like &quot; and &lt; be returned literally, and not decoded as characters (" and <).

Bad requirement. Don't do that.

Or at least consider carefully why you think you want or need it.

CDATA sections and escapes are a tactic for allowing you to pass text like quotes and '<' characters through XML and not have XML confuse them with markup. They have no meaning in themselves and when you pull them out of the XML, you should accept them as the quotes and '<' characters they were intended to represent.
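A small sketch of this point, using only the JDK parser: the same text written once with entity escapes and once in a CDATA section parses to identical content. The class name CdataVsEscapeDemo and the sample documents are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CdataVsEscapeDemo {

    // Parses the XML string and returns the root element's decoded text.
    public static String textOf(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Two different encodings of the same character data.
        String escaped = "<msg>&quot;a &lt; b&quot;</msg>";
        String cdata   = "<msg><![CDATA[\"a < b\"]]></msg>";

        System.out.println(textOf(escaped));                        // prints: "a < b"
        System.out.println(textOf(escaped).equals(textOf(cdata)));  // prints: true
    }
}
```

Because both forms carry the same information, code that depends on which one the author typed is relying on something the XML data model does not preserve.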

Don Roby 2010-04-12 22:07:58

Java: Ignoring escapes when parsing XML
