ansaurus

Question

How do I replace HTML escapes in an input stream before parsing it to XML?

Answer 1

A:

An XML parser should cope with character entities such as "&amp;" ... assuming that's what you are talking about.

One possibility is that your input contains particular named entities that the XML parser doesn't know about.
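If that is the cause, one option is to rewrite the unknown named entities as numeric character references before handing the text to the parser. The following is only a minimal sketch under several assumptions: the class name `HtmlEntitySketch`, its method names, and the three-entry entity table are hypothetical, the input is assumed to be UTF-8, and Java 9 or later is assumed for `Map.of` and `readAllBytes`. The five entities XML already defines (such as "&amp;") are left untouched.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class HtmlEntitySketch {

    // Illustrative subset only (an assumption): a real table would list
    // every HTML named entity that can occur in the input.
    private static final Map<String, Integer> HTML_ENTITIES =
            Map.of("nbsp", 160, "copy", 169, "eacute", 233);

    // Matches named references such as &nbsp; or &amp;
    private static final Pattern NAMED_ENTITY =
            Pattern.compile("&([A-Za-z][A-Za-z0-9]*);");

    // Replaces known HTML entities with numeric references (&#160; etc.).
    // Names not in the table, including &amp; and &lt;, are left as they are.
    static String replaceHtmlEntities(String text) {
        Matcher m = NAMED_ENTITY.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Integer codePoint = HTML_ENTITIES.get(m.group(1));
            String replacement =
                    (codePoint == null) ? m.group() : "&#" + codePoint + ";";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Reads the whole stream (assumed UTF-8), cleans it, then parses it.
    static Document parse(InputStream in) throws Exception {
        String raw = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        String cleaned = replaceHtmlEntities(raw);
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(cleaned)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<t>Caf&eacute;&nbsp;&amp; Bar</t>";
        Document doc = parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

Reading the whole stream into memory keeps the sketch short; for very large inputs a filtering Reader would be a better fit.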

Stephen C 2010-09-17 07:24:43

Answer 2

A:

I can't tell where the problem occurs. My guess is that you need the normalize() method, as below.

Try this:

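 doc.getDocumentElement().normalize(); // normalize() is a Node method that returns void, so call it before reading the value (doc is assumed to be the parsed Document)
 strVarietalTitle = ((Node) varietalTitleTextNodes.item(0)).getNodeValue();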

From the documentation for Node.normalize():

Puts all Text nodes in the full depth of the sub-tree underneath this Node, including attribute nodes, into a "normal" form where only structure (e.g., elements, comments, processing instructions, CDATA sections, and entity references) separates Text nodes, i.e., there are neither adjacent Text nodes nor empty Text nodes. This can be used to ensure that the DOM view of a document is the same as if it were saved and re-loaded, and is useful when operations (such as XPointer [XPointer] lookups) that depend on a particular document tree structure are to be used. If the parameter "normalize-characters" of the DOMConfiguration object attached to the Node.ownerDocument is true, this method will also fully normalize the characters of the Text nodes. Note: In cases where the document contains CDATASections, the normalize operation alone may not be sufficient, since XPointers do not differentiate between Text nodes and CDATASection nodes.

Praveen Chandrasekaran 2010-09-17 07:27:37

Poolczar 2010-09-17 15:08:00

Thanks, normalize() fixed the problem:

 try {
     db = dbf.newDocumentBuilder();
     doc = db.parse(in);
     doc.getDocumentElement().normalize();

Poolczar 2010-09-17 15:30:52
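For reference, the fix from the comment above can be sketched as a complete program. A likely reason it helps is that some parsers deliver the text around an entity such as "&amp;" as several adjacent Text nodes, so item(0).getNodeValue() returns only the first piece; normalize() merges them back into one. The class name `NormalizeSketch`, the helper names, and the sample XML are hypothetical and only serve the illustration. On parsers that already return a single Text node, the normalize() call is harmless.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class NormalizeSketch {

    // Parses the stream, then merges adjacent Text nodes in the whole tree.
    static Document parseNormalized(InputStream in) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document doc = db.parse(in);
        doc.getDocumentElement().normalize();
        return doc;
    }

    // Returns the value of the first child node of the first <tag> element.
    // After normalize() that child holds the element's full leading text.
    static String firstText(Document doc, String tag) {
        Element element = (Element) doc.getElementsByTagName(tag).item(0);
        return element.getFirstChild().getNodeValue();
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample input containing a predefined XML entity.
        String xml = "<wine><title>Cabernet &amp; Merlot</title></wine>";
        Document doc = parseNormalized(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        System.out.println(firstText(doc, "title"));
    }
}
```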
