
Hi there! I am parsing an XML file in Java using the W3C DOM. I am stuck on a specific problem: I can't figure out how to get the whole inner XML of a node.

The node looks like this:

<td><b>this</b> is a <b>test</b></td>

What function do I have to use to get this?

"<b>this</b> is a <b>test</b>"

+2  A: 

You have to use the transform/XSLT API, using your <b> node (or, more generally, each child of the <td>) as the node to be transformed, and put the result into a new StreamResult(new StringWriter()). See how-to-pretty-print-xml-from-java.
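A minimal sketch of this approach, assuming the JDK's built-in javax.xml.transform implementation. Transforming the <td> node itself would also print the <td> tags, so the sketch runs an identity transform on each child node in turn and concatenates the results. The class name InnerXml and the parseElement helper are hypothetical, for illustration only.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class InnerXml {

    // Serializes every child of `node` with an identity transform and
    // concatenates the results, i.e. the node's "inner XML".
    public static String innerXml(Node node) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        // Without this, each serialized child would start with <?xml ...?>
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter writer = new StringWriter();
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            transformer.transform(new DOMSource(children.item(i)), new StreamResult(writer));
        }
        return writer.toString();
    }

    // Hypothetical helper for the demo: parses an XML string and returns its root element.
    public static Element parseElement(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element td = parseElement("<td><b>this</b> is a <b>test</b></td>");
        System.out.println(innerXml(td));
    }
}
```

The same Transformer is reused for every child, which is fine as long as the transforms run one after another on a single thread.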

Pierre
+1  A: 

er... you could also call toString() and just chop off the beginning and end tags, either manually or using regexps.

Edit: toString() doesn't do what I expected. Pulling out the O'Reilly Java & XML book, I see it talks about the Load and Save module of the Java DOM.

See in particular LSSerializer, which looks very promising. You could either call writeToString(node) and chop off the beginning and end tags, as I suggested, or try using an LSSerializerFilter to skip the top node's tags (I'm not sure that would work; I admit I've never used LSSerializer before).

The O'Reilly book seems to suggest doing something like this:

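 import org.w3c.dom.bootstrap.DOMImplementationRegistry;
 import org.w3c.dom.ls.DOMImplementationLS;
 import org.w3c.dom.ls.LSSerializer;
 // newInstance() throws checked exceptions (ClassNotFoundException etc.)
 DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
 DOMImplementationLS lsImpl =
   (DOMImplementationLS) registry.getDOMImplementation("LS");
 LSSerializer serializer = lsImpl.createLSSerializer();
 String nodeString = serializer.writeToString(node); // includes node's own tags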
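Building on that, a minimal sketch that avoids chopping tags altogether by serializing each child of the node separately and joining the pieces. It assumes the document's DOM implementation supports the Load and Save feature ("LS", version "3.0"), which the JDK's default parser does, and it turns off the xml-declaration parameter so the fragments don't each start with <?xml ...?>. InnerXmlLS and parseElement are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

public class InnerXmlLS {

    // Serializes each child of `node` with an LSSerializer and joins the results.
    public static String innerXml(Node node) {
        // Assumes the DOM implementation supports the "LS" 3.0 feature
        DOMImplementationLS lsImpl = (DOMImplementationLS)
                node.getOwnerDocument().getImplementation().getFeature("LS", "3.0");
        LSSerializer serializer = lsImpl.createLSSerializer();
        // Don't prepend <?xml version="1.0" ...?> to every fragment
        serializer.getDomConfig().setParameter("xml-declaration", false);

        StringBuilder sb = new StringBuilder();
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            sb.append(serializer.writeToString(children.item(i)));
        }
        return sb.toString();
    }

    // Hypothetical helper for the demo: parses an XML string and returns its root element.
    public static Element parseElement(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element td = parseElement("<td><b>this</b> is a <b>test</b></td>");
        System.out.println(innerXml(td));
    }
}
```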
Jason S
No, .toString() of my td node just results in "[b: null]".
nodh
Hmm, I guess I got that confused with JavaScript + E4X. I meant: call the function that just produces the serialized output, then delete the beginning and end tags.
Jason S
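The serialize-then-chop idea from this answer and its comments can be sketched as follows, again with LSSerializer. It is a rough sketch under stated assumptions: the XML declaration is switched off, the first '>' is assumed to close the start tag (an unescaped '>' inside an attribute value would break this), and the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

public class ChopTags {

    // Serializes the whole element, then cuts away its own start and end tags.
    public static String innerXml(Element element) {
        DOMImplementationLS lsImpl = (DOMImplementationLS)
                element.getOwnerDocument().getImplementation().getFeature("LS", "3.0");
        LSSerializer serializer = lsImpl.createLSSerializer();
        // Otherwise the first '>' found would belong to <?xml ...?>
        serializer.getDomConfig().setParameter("xml-declaration", false);

        String full = serializer.writeToString(element);
        if (full.endsWith("/>")) {
            return ""; // empty element such as <td/>: nothing inside
        }
        // Assumes the first '>' closes the start tag and the last '<' opens the end tag
        int start = full.indexOf('>') + 1;
        int end = full.lastIndexOf('<');
        return full.substring(start, end);
    }

    // Hypothetical helper for the demo: parses an XML string and returns its root element.
    public static Element parseElement(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(innerXml(parseElement("<td><b>this</b> is a <b>test</b></td>")));
    }
}
```

String slicing is simpler than a regex here, but it relies on the serializer's output format; serializing the children individually avoids that dependency.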
A: 

node.getTextContent();
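For the sample node, getTextContent() (DOM Level 3) concatenates the text of all descendant text nodes, so it returns the words without the <b> markup. A small sketch to check this behaviour; the class name TextContentDemo is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class TextContentDemo {

    // Parses `xml` and returns the concatenated text of the root element's descendants.
    public static String textOf(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        return root.getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Prints: this is a test
        System.out.println(textOf("<td><b>this</b> is a <b>test</b></td>"));
    }
}
```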

You ought to be using JDOM or dom4j to handle nodes, if for no other reason than to handle whitespace correctly.

A: 

To remove the unnecessary tags, code like this can probably be used:

DOMConfiguration config = serializer.getDomConfig();
config.setParameter("canonical-form", true);

But this will not always work, because support for "canonical-form" = true is optional in the DOM Level 3 spec.
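Because the parameter is optional, one hedged way to use it is to ask the DOMConfiguration first with canSetParameter and only set it when the implementation says it can. A minimal sketch; the class and method names are hypothetical, and whether canonical form is available at all depends on the DOM implementation in use.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.Element;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

public class CanonicalFormCheck {

    // Serializes `element`, enabling canonical-form only if the serializer supports it.
    public static String serialize(Element element) {
        DOMImplementationLS lsImpl = (DOMImplementationLS)
                element.getOwnerDocument().getImplementation().getFeature("LS", "3.0");
        LSSerializer serializer = lsImpl.createLSSerializer();
        DOMConfiguration config = serializer.getDomConfig();
        // canSetParameter reports support without throwing; if unsupported,
        // the serializer simply keeps its default settings
        if (config.canSetParameter("canonical-form", Boolean.TRUE)) {
            config.setParameter("canonical-form", Boolean.TRUE);
        }
        return serializer.writeToString(element);
    }

    // Hypothetical helper for the demo: parses an XML string and returns its root element.
    public static Element parseElement(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize(parseElement("<td><b>this</b> is a <b>test</b></td>")));
    }
}
```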

javapowered