ansaurus

Question

Answer 1


getTextContent() is behaving as I would expect: it returns the textual content of the HTML fragment, with all of the markup stripped out. Can you check the API docs for your DOM parser and see if there's a similar method with a name like getHtmlContent?

Richard Ev 2009-12-23 14:52:11
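To see that behaviour concretely, here is a minimal sketch that parses a small page with the JDK's built-in DOM parser and calls getTextContent() on the <body> element. It assumes the input is well-formed XHTML (the standard XML parser rejects most real-world HTML), and the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class TextContentDemo {
    // Parses a well-formed XHTML string with the JDK's built-in XML parser.
    static Document parse(String xhtml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xhtml)));
    }

    // Returns only the character data under <body>; every tag is dropped.
    static String bodyText(String xhtml) throws Exception {
        Document doc = parse(xhtml);
        Element body = (Element) doc.getElementsByTagName("body").item(0);
        return body.getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><head><title>t</title></head>"
                + "<body><p>Hello <b>world</b></p></body></html>";
        // The <p> and <b> tags are gone; only their text survives.
        System.out.println(bodyText(page));
    }
}
```

The <p> and <b> tags do not appear in the result, which is why a different approach is needed to get the markup itself.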

I agree; you can treat the entire thing as a String and use String.indexOf(..) and String.substring(..) to extract everything inside the body tag.

Samuh 2009-12-23 15:08:49
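A minimal sketch of that string-based approach, with hypothetical class and method names. It assumes lower-case tag names and that the strings "<body" and "</body>" do not also appear inside comments, scripts or attribute values; it is a quick hack, not an HTML parser.

```java
class BodySubstring {
    // Returns the text between the end of the opening <body ...> tag and
    // the start of </body>, or null if either tag cannot be found.
    static String innerBody(String html) {
        int open = html.indexOf("<body");
        if (open < 0) return null;
        // Skip past any attributes on the opening tag, e.g. <body class="x">.
        int gt = html.indexOf('>', open);
        if (gt < 0) return null;
        int close = html.indexOf("</body>", gt + 1);
        if (close < 0) return null;
        return html.substring(gt + 1, close);
    }

    public static void main(String[] args) {
        String page = "<html><body class=\"x\"><p>Hi</p></body></html>";
        System.out.println(innerBody(page));
    }
}
```

Unlike getTextContent(), this keeps the markup, but it breaks as soon as the page uses upper-case tags or mentions "</body>" inside a script.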

Answer 2


You would need to parse the document into a DOM and serialise only the portion of the DOM you wanted. Using the DOM Level 3 LS interfaces you can serialise the outer-XML of a single node with:

DOMImplementationLS implementation = (DOMImplementationLS) document.getImplementation();
LSSerializer serializer = implementation.createLSSerializer();
String html = serializer.writeToString(node);

To get the inner-XML, you would need to call writeToString on each child node in turn and concatenate the results (e.g. into a StringBuffer).
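Putting those pieces together, here is a sketch that parses a well-formed XHTML string with the JDK's DocumentBuilder and builds the inner-XML of <body> by serialising each child node. It assumes the JDK's default DOM, whose DOMImplementation also implements DOMImplementationLS, and it uses StringBuilder (the unsynchronised equivalent of StringBuffer). The "xml-declaration" parameter is switched off because, by default, the serialiser may prefix each serialised element with an <?xml ...?> declaration. The class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

class InnerXml {
    // Serialises the children of `node` one by one and concatenates them,
    // giving the node's inner-XML (the node's own start/end tags are excluded).
    static String innerXml(Node node) {
        DOMImplementationLS ls =
                (DOMImplementationLS) node.getOwnerDocument().getImplementation();
        LSSerializer serializer = ls.createLSSerializer();
        // Suppress the <?xml ...?> declaration on each serialised child.
        serializer.getDomConfig().setParameter("xml-declaration", false);
        StringBuilder sb = new StringBuilder();
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            sb.append(serializer.writeToString(children.item(i)));
        }
        return sb.toString();
    }

    // Parses well-formed XHTML and returns the inner-XML of its first <body>.
    static String bodyInnerXml(String xhtml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        Node body = doc.getElementsByTagName("body").item(0);
        return innerXml(body);
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><head><title>t</title></head>"
                + "<body><p>Hello <b>world</b></p></body></html>";
        System.out.println(bodyInnerXml(page));
    }
}
```

Calling writeToString(body) directly instead would give the outer-XML, i.e. the same markup wrapped in <body>...</body>.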

Depending on which DOM implementation you are using, there may be alternative non-standard methods. There are also risks in serialising HTML as XML, if that's what you're doing: for example, a standard XML serialiser may output a self-closing tag such as <div/> for an empty element, which can confuse browsers that parse the output as legacy HTML.
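A small sketch of that self-closing tag risk, again assuming well-formed XHTML input and the JDK's default DOM; the class name is hypothetical. An empty <div></div> comes back from the XML serialiser in its self-closing form, and a browser parsing the result as legacy HTML ignores the trailing slash and treats it as an unclosed <div>.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

class SelfClosingDemo {
    // Parses well-formed XHTML and serialises the <body> element (outer-XML) as XML.
    static String serialiseBody(String xhtml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        Node body = doc.getElementsByTagName("body").item(0);
        DOMImplementationLS ls = (DOMImplementationLS) doc.getImplementation();
        LSSerializer serializer = ls.createLSSerializer();
        serializer.getDomConfig().setParameter("xml-declaration", false);
        return serializer.writeToString(body);
    }

    public static void main(String[] args) throws Exception {
        // The empty <div></div> is written back in XML's self-closing form.
        System.out.println(serialiseBody("<html><body><div></div><p>x</p></body></html>"));
    }
}
```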

bobince 2009-12-23 15:05:50


How can I get the content of the HTML <body>?
