I have a large XML file which contains a lot of self-closed tags. How can I remove all of them using XSLT?

For example:

<?xml version="1.0" encoding="utf-8" ?>
<Persons>
  <Person>
    <Name>user1</Name>
    <Tel />
    <Mobile>123</Mobile>
  </Person>
  <Person>
    <Name>user2</Name>
    <Tel>456</Tel>
    <Mobile />
  </Person>
  <Person>
    <Name />
    <Tel>123</Tel>
    <Mobile />
  </Person>
  <Person>
    <Name>user4</Name>
    <Tel />
    <Mobile />
  </Person>
</Persons>

I'm expecting the result:

<?xml version="1.0" encoding="utf-8" ?>
<Persons>
  <Person>
    <Name>user1</Name>
    <Mobile>123</Mobile>
  </Person>
  <Person>
    <Name>user2</Name>
    <Tel>456</Tel>
  </Person>
  <Person>
    <Tel>123</Tel>
  </Person>
  <Person>
    <Name>user4</Name>
  </Person>
</Persons>

Note: there are thousands of different elements, so how can I programmatically remove all the self-closed tags? Another question is how to remove empty elements such as <name></name> as well.

Can anyone help me with this? Many thanks.

A: 

You might want to check whether they are required. If they are, the declaration should look something like use="required". Also check whether they are declared with type="nonEmptyString".

Icono123
Also, make sure minOccurs is set to 0 (minOccurs="0").
Icono123
minOccurs, use, etc. are part of the schema, not XSLT. What are you trying to suggest?
infant programmer
A: 

You can remove all empty elements, meaning those that have no text content and no non-empty attributes on themselves or their descendants. If this solution works for you, you can do the following:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="*">
    <xsl:if test="string(.) != '' or descendant-or-self::*/@*[string(.)]">
      <xsl:element name="{name()}" >
        <xsl:copy-of select="@*[string(.)]"/>
        <xsl:apply-templates select="* | text()" />
      </xsl:element>
    </xsl:if>
  </xsl:template>

  <xsl:template match="text()">
    <xsl:value-of select="."/>
  </xsl:template>

</xsl:stylesheet>
Andrew Bezzub
+6  A: 

The self-closed tags are equivalent to empty tags. You can remove all empty tags, but you have no way of knowing whether they were self-closed in the input XML or not (<tag/> and <tag></tag> are indistinguishable).

<!-- the identity template copies everything that has no special handler -->
<xsl:template match="node()|@*">
  <xsl:copy>
    <xsl:apply-templates select="node()|@*" />
  </xsl:copy>
</xsl:template>

<!-- special handler for elements that have no child nodes:
     they are removed by this empty template -->
<xsl:template match="*[not(node())]" />

If elements that contain whitespace only are "empty" by your definition as well, then replace the second template with:

<xsl:template match="*[normalize-space() = '']" />
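
To cross-check what the whitespace-aware rule should produce for the sample in the question, here is a rough reference sketch. It is not XSLT: it is an assumed Python re-implementation (standard library only) of the same idea, "drop every element whose text content is blank", and the helper name prune_empty is made up for illustration. Python's strip() treats a few more characters as whitespace than XML does, which does not matter for this input.

```python
import xml.etree.ElementTree as ET

# Sample input from the question, condensed onto fewer lines.
SAMPLE = """<Persons>
  <Person><Name>user1</Name><Tel /><Mobile>123</Mobile></Person>
  <Person><Name>user2</Name><Tel>456</Tel><Mobile /></Person>
  <Person><Name /><Tel>123</Tel><Mobile /></Person>
  <Person><Name>user4</Name><Tel /><Mobile /></Person>
</Persons>"""


def prune_empty(element):
    """Remove every descendant whose whole text content is blank.

    Hypothetical helper: it mirrors the *[normalize-space() = ''] rule,
    so an element carrying only attributes is removed as well.
    """
    for child in list(element):  # copy the list, since we mutate the parent
        if "".join(child.itertext()).strip() == "":
            element.remove(child)
        else:
            prune_empty(child)


root = ET.fromstring(SAMPLE)
prune_empty(root)

# One list per <Person>, holding the (tag, text) pairs that survived.
remaining = [[(c.tag, c.text) for c in person] for person in root]
print(remaining)
```

The surviving elements can then be compared with the output of the stylesheet to confirm both agree.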
Tomalak
Yeah, down-voted without an explanation. Nice move.
Tomalak
Acceptable answer. +1 from my side.
infant programmer
+1  A: 

From the XML point of view, there is no difference between a "self-closed" element like <tag/> and an empty element like <tag></tag> (see spec).
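
This equivalence can be checked with any XML parser. A minimal sketch, assuming Python's standard xml.etree.ElementTree as the parser (Python is not part of the question and is used here only for illustration; the describe helper is a made-up name):

```python
import xml.etree.ElementTree as ET

# The same element written in the two syntaxes discussed above.
self_closed = ET.fromstring("<Person><Tel /></Person>")
explicit = ET.fromstring("<Person><Tel></Tel></Person>")


def describe(person):
    """Hypothetical helper: summarise everything the parser kept about <Tel>."""
    tel = person.find("Tel")
    return (tel.tag, tel.text, len(tel), dict(tel.attrib))


# After parsing, nothing records which syntax was used in the source text.
same = describe(self_closed) == describe(explicit)
print(same)

# Serialising both trees also yields byte-for-byte identical output.
print(ET.tostring(self_closed) == ET.tostring(explicit))
```

An XSLT processor works on a parsed tree in the same way, which is why a stylesheet can only match "empty" elements and not "self-closed" ones specifically.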

Here is a transformation to strip all empty elements:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes" encoding="utf-8" />
    <xsl:strip-space elements="*" />

    <xsl:template match="@*|node()">
        <xsl:if test=".!=''">
            <xsl:copy>
                <xsl:apply-templates select="@*|node()"/>
            </xsl:copy>
        </xsl:if>
    </xsl:template>

</xsl:stylesheet>
VladV
A: 
infant programmer