I'm looking for a simple Java snippet to remove empty tags from any XML structure. For example:

<xml>
    <field1>bla</field1>
    <field2></field2>
    <field3/>
    <structure1>
       <field4>bla</field4>
       <field5></field5>
    </structure1>
</xml>

should turn into:

<xml>
    <field1>bla</field1>
    <structure1>
       <field4>bla</field4>
    </structure1>
</xml>
A: 

With XSLT you could transform your XML to ignore the empty tags and re-write the document.

Alex
+5  A: 

This XSLT stylesheet should do what you're looking for:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="@*|node()">
    <xsl:if test=". != '' or ./@* != ''">
      <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>

It should also preserve elements which are empty themselves but have non-empty attributes. If you don't want this behaviour then change:

<xsl:if test=". != '' or ./@* != ''">

To: <xsl:if test=". != ''">

If you want to know how to apply XSLT in Java, there should be plenty of tutorials out there on the Interwebs. Good luck!
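
For the "how to apply XSLT in Java" part, here is a minimal sketch using the JDK's built-in javax.xml.transform API. The class name ApplyStylesheet and the helper removeEmpty are made up for illustration, and the stylesheet is inlined as a string only to keep the example self-contained. Treat it as a starting point under those assumptions, not a definitive implementation.

```java
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;
import java.io.StringWriter;

// Hypothetical class name, used only for this example.
final class ApplyStylesheet {

    // The stylesheet from this answer, inlined so the example is self-contained.
    static final String XSLT =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:template match=\"@*|node()\">"
      + "<xsl:if test=\". != '' or ./@* != ''\">"
      + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
      + "</xsl:if>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Hypothetical helper: runs the stylesheet over an XML string and returns the result.
    static String removeEmpty(String xml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        // Leave out the <?xml ...?> declaration so only the transformed tree is returned.
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String input = "<xml><field1>bla</field1><field2></field2><field3/></xml>";
        System.out.println(removeEmpty(input));
    }
}
```

To work on files instead of strings, StreamSource and StreamResult can also be constructed from java.io.File objects.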

Chris R
+1 for XSLT solution
Thorbjørn Ravn Andersen
+5  A: 

I was wondering whether it would be easy to do this with the XOM library and gave it a try.

It turned out to be quite easy:

import nu.xom.*;

import java.io.File;
import java.io.IOException;

public class RemoveEmptyTags {

    public static void main(String[] args) throws IOException, ParsingException {
        Document document = new Builder().build(new File("original.xml"));
        handleNode(document.getRootElement());
        System.out.println(document.toXML()); // empty elements now removed
    }

    private static void handleNode(Node node) {
        if (node.getChildCount() == 0 && "".equals(node.getValue())) {
            node.getParent().removeChild(node);
            return;
        }
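        // recurse into the children, last to first, so that removing a child
        // does not shift the index of the siblings still to be visited
        for (int i = node.getChildCount() - 1; i >= 0; i--) {
            handleNode(node.getChild(i));
        }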
    }
}

This probably won't handle all corner cases properly, like a completely empty document. And what to do about elements that are otherwise empty but have attributes?

(This answer is part of my evaluation of XOM as a potential replacement to dom4j.)

Jonik
Thanks, I'll use this
Raymond
+2  A: 

As a side note: depending on the application, the different states of a tag can carry meaning (an XML parser itself treats <a></a> and <a/> as the same empty element):

  • Open-Closed Tag: The element exists and its value is an empty string
  • Single-Tag: The element exists, but the value is null or nil
  • Missing Tag: The element does not exist

So, by removing empty Open-Closed tags and Single-Tags, you're merging them with the group of missing tags and thus losing information.

mhaller
Very good point - there are times when it is useful to remove tags whose value is empty or null, but there are also times when doing so could potentially be detrimental to the application.
Chris R
For my purpose, this is irrelevant
Raymond
A: 

If the XML is fed in as a String, a regex could be used to filter out empty elements:

<(\\w+)></\\1>|<\\w+/>

This will find empty elements.

data.replaceAll(re, "")

Here data is a variable holding your XML string.
I'm not saying this is the best solution, but it is possible...
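
A minimal sketch of how the pieces could fit together. The class name RegexEmptyElements and the method stripEmpty are made up for illustration, and repeating the replacement until nothing changes is an addition on top of the single replaceAll call: removing an inner empty element can leave its parent empty, which one pass would miss.

```java
import java.util.regex.Pattern;

// Hypothetical class name, used only for this example.
final class RegexEmptyElements {

    // Same expression as above: <x></x> (via the back-reference) or <x/>.
    private static final Pattern EMPTY = Pattern.compile("<(\\w+)></\\1>|<\\w+/>");

    // Hypothetical helper: repeats the replacement until the string stops changing,
    // so a parent that only held empty children is removed on a later pass.
    static String stripEmpty(String data) {
        String previous;
        do {
            previous = data;
            data = EMPTY.matcher(data).replaceAll("");
        } while (!data.equals(previous));
        return data;
    }

    public static void main(String[] args) {
        String xml = "<xml><field1>bla</field1><field2></field2><field3/>"
                   + "<s><field5></field5></s></xml>";
        System.out.println(stripEmpty(xml)); // prints <xml><field1>bla</field1></xml>
    }
}
```

Note the limits of the expression: it does not match elements with attributes, names containing ':' or '-', or elements that contain only whitespace, so indented input like the example in the question would keep a parent that held only empty children.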

Kennet