Hi, sorry, I'm a Java/XML newbie and can't figure this one out. It seems to be possible to convert a Document object to a string, but I want to convert a Node object into a string. I am using the org.ccil.cowan.tagsoup Parser to parse the input.

I'm retrieving the Node by something like...

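 // TagSoup's SAX parser
 org.ccil.cowan.tagsoup.Parser parser = new org.ccil.cowan.tagsoup.Parser();

 // namespaceaware is a placeholder for the feature name, kept from the original post
 parser.setFeature(namespaceaware, false);

 // An identity transform copies the parser's SAX events into a DOM tree
 Transformer transformer = TransformerFactory.newInstance().newTransformer();
 DOMResult domResult = new DOMResult();

 transformer.transform(new SAXSource(parser, new InputSource(in)), domResult);
 Node n = domResult.getNode();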

 // I'm interested in the first child, so...
 Node myNode = n.getChildNodes().item(0);

 // convert myNode to string..
 // what to do here?

The answer may be obvious, but I can't seem to figure out from the core Java libraries how to achieve this. Any help is much appreciated!

+3  A: 

You can use a Transformer (error handling and optional factory configuration omitted for clarity):

Node node = ...;
StringWriter writer = new StringWriter();
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.transform(new DOMSource(node), new StreamResult(writer));
String xml = writer.toString();
// Use xml ...
Kevin
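For reference, the Transformer approach above can be fleshed out into a self-contained program. This is a minimal sketch, not Kevin's exact code: the class name NodeToString, the method name nodeToString, and the sample markup are made up for illustration, and the input is parsed with the JDK's own DocumentBuilder instead of TagSoup so that it runs without extra jars. Setting OutputKeys.OMIT_XML_DECLARATION is an addition that keeps the <?xml ...?> header out of the result.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, named for this example only
class NodeToString {

    /** Serializes any DOM node (not only a Document) to a string of markup. */
    public static String nodeToString(Node node) throws TransformerException {
        // A Transformer created without a stylesheet performs an identity transform
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        // Leave out the <?xml version="1.0" ...?> header when serializing a fragment
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter writer = new StringWriter();
        transformer.transform(new DOMSource(node), new StreamResult(writer));
        return writer.toString();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in input parsed with the JDK parser; the question uses TagSoup instead
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(
                        "<div><p>Hello</p><p>World</p></div>")));
        // First child of <div> is the first <p> element
        Node firstParagraph = doc.getDocumentElement().getFirstChild();
        System.out.println(nodeToString(firstParagraph));
    }
}
```

The same nodeToString call should accept the myNode from the question, since DOMSource takes any org.w3c.dom.Node.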
Thanks Kevin. I tried this, but I end up with "xmlns" attributes on each HTML tag (my Node is actually an HTML fragment), so I get output like "<p xmlns='....'>.... </p>". Any idea how to avoid this?
Raj
You can transform them out. See here: http://stackoverflow.com/questions/2095673/how-to-remove-the-namespaces-from-the-element
Kevin
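One common way to "transform them out" is an XSLT stylesheet that rebuilds every element and attribute under its local name, so the copies end up in no namespace. The sketch below shows that general idiom; it is not necessarily what the linked answer contains, and it has not been checked against TagSoup output. The class name StripNamespaces, the method name toStringWithoutNamespaces, and the sample markup are hypothetical, and the input is parsed with a namespace-aware JDK DocumentBuilder as a stand-in.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, named for this example only
class StripNamespaces {

    // Recreates each element and attribute from local-name() alone, so the
    // result carries no namespace and no xmlns declarations are written.
    private static final String STYLESHEET =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='*'>"
        +   "<xsl:element name='{local-name()}'>"
        +     "<xsl:apply-templates select='@*|node()'/>"
        +   "</xsl:element>"
        + "</xsl:template>"
        + "<xsl:template match='@*'>"
        +   "<xsl:attribute name='{local-name()}'><xsl:value-of select='.'/></xsl:attribute>"
        + "</xsl:template>"
        + "<xsl:template match='text()|comment()|processing-instruction()'>"
        +   "<xsl:copy/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Serializes the node with all namespaces removed. */
    public static String toStringWithoutNamespaces(Node node) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter writer = new StringWriter();
        transformer.transform(new DOMSource(node), new StreamResult(writer));
        return writer.toString();
    }

    public static void main(String[] args) throws Exception {
        // Namespace-aware parse, so the sample element really is in the XHTML namespace
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Node root = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(
                        "<p xmlns='http://www.w3.org/1999/xhtml'>Hi <b>there</b></p>")))
                .getDocumentElement();
        System.out.println(toStringWithoutNamespaces(root));
    }
}
```

This trades the identity transform for a stylesheet, so it costs a little more per call; caching the compiled stylesheet in a javax.xml.transform.Templates object would be the usual refinement if it is called often.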
This looks promising! I'll try this and update here.
Raj
Well, I tried this and am stuck again. I'm not sure where to call 'setNamespaceAware(false)' in the above code snippet. From this link: http://www.mail-archive.com/[email protected]/msg05987.html - it seems that this is not a straightforward thing either. Clues?
Raj
It is on the DocumentBuilderFactory. See http://java.sun.com/javase/6/docs/api/javax/xml/parsers/DocumentBuilderFactory.html#setNamespaceAware%28boolean%29
Kevin
Kevin, I might not have been clear, but I'm using the TagSoup parser, so I'm not using DocumentBuilderFactory at all (again, I'm a Java/XML newbie). This is what I'm doing: parser = new org.ccil.cowan.tagsoup.Parser(); Transformer transformer = TransformerFactory.newInstance().newTransformer(); DOMResult domResult = new DOMResult(); transformer.transform(new SAXSource(parser, new InputSource(in)), domResult); Node node = domResult.getNode(); // continues in original post. Where does DBF come into this?
Raj
I should also mention - I did try Parser.setFeature(Parser.isNamespaceAware, false). Once I get the org.w3c.dom.Node, I pass it to another function like the one in the first reply, yet I still get the xmlns attributes.
Raj
Can you edit your original question to add the code you're using to parse the XML?
Kevin
Try using: parser.setFeature(Parser.namespacesFeature, false);
Kevin
Edited the question..
Raj
A: 

Try

String text = myNode.getNodeValue();

or possibly

String text = myNode.getTextContent();
Matthew Flynn
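Note that these two calls return text, not markup, so they answer a slightly different question than serializing the node. The sketch below illustrates the difference on a small sample; the class name NodeTextDemo, the method name sampleElement, and the sample markup are hypothetical and used only for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class, named for this example only
class NodeTextDemo {

    /** Parses a small sample and returns its root element. */
    public static Element sampleElement() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader("<p>Hello <b>world</b></p>")))
                .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element p = sampleElement();

        // For element nodes the DOM defines nodeValue as null
        System.out.println(p.getNodeValue());                 // prints "null"

        // Text of all descendants concatenated; the <b> tags are dropped
        System.out.println(p.getTextContent());               // prints "Hello world"

        // For a text node, getNodeValue() returns its character data
        System.out.println(p.getFirstChild().getNodeValue()); // prints "Hello "
    }
}
```

If the goal is to get the tags back as well, as in "<p>....</p>", the Transformer approach from the first answer is the one that serializes markup.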