I have some Java (5.0) code that constructs a DOM from various (cached) data sources, then removes certain element nodes that are not required, then serializes the result into an XML string using:

// Serialize DOM back into a string
Writer out = new StringWriter();
Transformer tf = TransformerFactory.newInstance().newTransformer();
tf.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
tf.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
tf.setOutputProperty(OutputKeys.INDENT, "no");
tf.transform(new DOMSource(doc), new StreamResult(out));
return out.toString();

However, since I'm removing several element nodes, I end up with a lot of extra whitespace in the final serialized document.

Is there a simple way to remove/collapse the extraneous whitespace from the DOM before (or while) it's serialized into a String?

+1  A: 

Try serializing your DOM through the following XSL stylesheet, which uses the xsl:strip-space element:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <xsl:strip-space elements="*"/>

  <xsl:template match="@*|node()">
    <xsl:copy>
     <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

http://helpdesk.objects.com.au/java/how-do-i-remove-whitespace-from-an-xml-document
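To connect the stylesheet to the serialization code in the question, the sketch below passes it to TransformerFactory.newTransformer(Source) instead of using the no-argument identity transformer. The class name StripSpaceSerializer and its parse/serialize helpers are hypothetical names chosen for illustration, and the stylesheet is embedded as a string only to keep the example self-contained; loading it from a file or classpath resource should work the same way.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class; the names are illustrative, not from the answer.
public class StripSpaceSerializer {

    // The identity stylesheet from the answer, embedded as a string.
    private static final String STRIP_SPACE_XSL =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
        + "<xsl:strip-space elements=\"*\"/>"
        + "<xsl:template match=\"@*|node()\">"
        + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Serializes the DOM through the stylesheet, dropping whitespace-only text nodes.
    public static String serialize(Document doc) {
        try {
            Transformer tf = TransformerFactory.newInstance().newTransformer(
                new StreamSource(new StringReader(STRIP_SPACE_XSL)));
            tf.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            tf.setOutputProperty(OutputKeys.INDENT, "no");
            StringWriter out = new StringWriter();
            tf.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Serialization failed", e);
        }
    }

    // Parses an XML string into a DOM; stands in for the question's DOM construction.
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<a>\n  <b>text</b>\n  <c/>\n</a>");
        System.out.println(serialize(doc));
    }
}
```

Because the stylesheet already declares omit-xml-declaration="yes", the sketch does not set that output property again in Java.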

objects
Thanks! That's a good answer. I tried it and it works.
Marc Novakowski
+3  A: 

You can find empty text nodes using XPath, then remove them programmatically like so:

XPathFactory xpathFactory = XPathFactory.newInstance();
// XPath to find empty text nodes.
XPathExpression xpathExp = xpathFactory.newXPath().compile(
     "//text()[normalize-space(.) = '']");  
NodeList emptyTextNodes = (NodeList) 
        xpathExp.evaluate(doc, XPathConstants.NODESET);

// Remove each empty text node from document.
for (int i = 0; i < emptyTextNodes.getLength(); i++) {
    Node emptyTextNode = emptyTextNodes.item(i);
    emptyTextNode.getParentNode().removeChild(emptyTextNode);
}

This approach might be useful if you want more control over node removal than is easily achieved with an XSL template.
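For reference, here is a self-contained sketch of the same technique with the imports it needs. The class name WhitespaceCleaner and its helper methods are hypothetical. It also calls doc.normalize() first, since a comment on this answer reports that the removal only worked after doing so. One plausible reason (an assumption, not something confirmed in this thread) is that removing elements leaves adjacent text nodes behind, which the XPath data model does not distinguish.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; the names are illustrative, not from the answer.
public class WhitespaceCleaner {

    // Removes whitespace-only text nodes and returns how many were removed.
    public static int removeEmptyTextNodes(Document doc) {
        try {
            // Merge adjacent text nodes, which element removal can leave behind.
            doc.normalize();

            XPathExpression xpathExp = XPathFactory.newInstance().newXPath()
                .compile("//text()[normalize-space(.) = '']");
            NodeList emptyTextNodes =
                (NodeList) xpathExp.evaluate(doc, XPathConstants.NODESET);

            // Detach each matching text node from its parent.
            for (int i = 0; i < emptyTextNodes.getLength(); i++) {
                Node emptyTextNode = emptyTextNodes.item(i);
                emptyTextNode.getParentNode().removeChild(emptyTextNode);
            }
            return emptyTextNodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Whitespace removal failed", e);
        }
    }

    // Parses an XML string into a DOM; stands in for the question's DOM construction.
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<a>\n  <b>text</b>\n  <c/>\n</a>");
        int removed = removeEmptyTextNodes(doc);
        System.out.println("Removed " + removed + " whitespace-only text nodes");
    }
}
```

After this runs, the document can be passed to the Transformer code from the question unchanged.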

James Murty
I like this "code only" solution even better than the XSL solution, and like you said there is a bit more control over node removal, if required.
Marc Novakowski
By the way, this method only seems to work if I first call doc.normalize() before doing the node removal. I'm not sure why that makes a difference.
Marc Novakowski
A: 

Hi!

The above solution works, but I lose the indentation completely. Is there a way to keep the indentation and remove only the blank lines?
