views:

302

answers:

4

Hi, So I got this situation which sucks. I have an XML like this



<table border="1" cols="200 100pt 200">
  <tr>
    <td>isbn</td>
    <td>title</td>
    <td>price</td>
  </tr>
  <tr>
    <td />
    <td />
    <td>
      <span type="champsimple" id="9b297fb5-d12b-46b1-8899-487a2df0104e" categorieid="a1c70692-0427-425b-983c-1a08b6585364" champcoderef="01f12b93-b4c5-401b-9da1-c9385d77e43f">
        [prénom]
      </span>
      <span type="champsimple" id="e103a6a5-d1be-4c34-8a54-d234179fb4ea" categorieid="a1c70692-0427-425b-983c-1a08b6585364" champcoderef="01f12b93-b4c5-401b-9da1-c9385d77e43f">[nom]</span>
      <span></span>
    </td>
  </tr>
  <tr></tr>
  <tr>
    <td></td>
    <td>Phill It in</td>
  </tr>
  <tr>
    <table id="cas1">
      <tr>
        <td ></td>
        <td >foo</td>
      </tr>
      <tr>
        <td >bar</td>
        <td >boo</td>
      </tr>
    </table>
  </tr>
  <tr>
    <table id="cas2">
      <tr>
        <td ></td>
        <td >foo</td>
      </tr>
      <tr>
        <td ></td>
        <td >boo</td>
      </tr>
    </table>
  </tr>
  <tr>
    <table id="cas3">
      <tr>
        <td >bar</td>
        <td ></td>
      </tr>
      <tr>
        <td >foo</td>
        <td >boo</td>
      </tr>
    </table>
  </tr>
  <tr>
    <table id="cas4">
      <tr>
        <td />
        <td />
      </tr>
      <tr>
        <td>foo</td>
        <td>boo</td>
      </tr>
    </table>
  </tr>
  <table id="cas4">
    <tr>
      <td />
      <td />
    </tr>
    <tr>
      <td>foo</td>
      <td>boo</td>
    </tr>
  </table>
  <tr>
    <td />
    <td />
  </tr>
</table>


Now the question is how would I recursively delete all empty td, tr and table elements?

Now I use this XSLT



<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform&quot;&gt;
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <xsl:strip-space elements="*" />

  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="td[not(node())]" />
  <xsl:template match="tr[not(node())]" />
  <xsl:template match="table[not(node())]" />

</xsl:stylesheet>


But it doesn't do very well. After I delete td, a tr becomes empty but it doesn't handle that. Too bad. See the table element with "cas4".



<table border="1" cols="200 100pt 200">
  <tr>
    <td>isbn</td>
    <td>title</td>
    <td>price</td>
  </tr>
  <tr>
    <td>
      <span type="champsimple" id="9b297fb5-d12b-46b1-8899-487a2df0104e" categorieid="a1c70692-0427-425b-983c-1a08b6585364" champcoderef="01f12b93-b4c5-401b-9da1-c9385d77e43f">
        [prénom]
      </span>
      <span type="champsimple" id="e103a6a5-d1be-4c34-8a54-d234179fb4ea" categorieid="a1c70692-0427-425b-983c-1a08b6585364" champcoderef="01f12b93-b4c5-401b-9da1-c9385d77e43f">[nom]</span>
      <span />
    </td>
  </tr>
  <tr>
    <td>Phill It in</td>
  </tr>
  <tr>
    <table id="cas1">
      <tr>
        <td>foo</td>
      </tr>
      <tr>
        <td>bar</td>
        <td>boo</td>
      </tr>
    </table>
  </tr>
  <tr>
    <table id="cas2">
      <tr>
        <td>foo</td>
      </tr>
      <tr>
        <td>boo</td>
      </tr>
    </table>
  </tr>
  <tr>
    <table id="cas3">
      <tr>
        <td>bar</td>
      </tr>
      <tr>
        <td>foo</td>
        <td>boo</td>
      </tr>
    </table>
  </tr>
  <tr>
    <table id="cas4">
      <tr />
      <tr>
        <td>foo</td>
        <td>boo</td>
      </tr>
    </table>
  </tr>
  <table id="cas4">
    <tr />
    <tr>
      <td>foo</td>
      <td>boo</td>
    </tr>
  </table>
  <tr />
</table>


How would you solve this problem?

+1  A: 

You could also filter out any table that only contains <tr> with empty <td>, and any <tr> with only empty <tr> (in addition to your other filters), using something like this (not tested):

<xsl:template match="tr[not(td/node())]" /> 
<xsl:template match="table[not(tr/td/node())]" /> 
ckarras
A: 

You can change your "tr" and "table" matches to this:

<xsl:template match="tr[not(node() and (td[node()] or table[node()]))]"/>
<xsl:template match="table[not(node() and (tr[node() and (td[node()] or table[node()])]))]"/>

This will account for more variances like: tr with empty table(s) or td(s); table with empty tr(s), empty tables, etc.

There will be problems if you start getting into deeper nesting of tables. Since table's contain tr's which contain table's which contain tr's and so on.

For instance, this:

<table>
   <table>
      <tr>   
         <table>
            <tr/>
         </table>
      </tr>
   </table>
</table>

Becomes this with the matches above:

<table>
   <table>
      <tr/>
   </table>
</table>

I don't think you can do actual recursion in XSL 1.0. It could probably be done in 2.0 but I've never tried it.

Hope this helps.

DevNull
+2  A: 

It sounds like your definition of empty is "contains no text or only whitespace". Is this the case? If so, the following transformation should do the trick:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"&gt; 
  <xsl:output omit-xml-declaration="yes" indent="yes"/> 
  <xsl:strip-space elements="*" /> 

  <xsl:template match="node()|@*"> 
    <xsl:copy> 
      <xsl:apply-templates select="node()|@*"/> 
    </xsl:copy> 
  </xsl:template> 

  <xsl:template match="td[not(normalize-space(.))]" /> 
  <xsl:template match="tr[not(normalize-space(.))]" /> 
  <xsl:template match="table[not(normalize-space(.))]" /> 
</xsl:stylesheet> 
markusk
+1 that's clean and pragmatic. Note that `normalize-space(.)` is equivalent to `normalize-space()`
Tomalak
+1  A: 

There is your solution:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"&gt;
    <xsl:output method="xml" version="1.0" encoding="utf-8" indent="yes"/>

    <xsl:template match="node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()" />
        </xsl:copy>
    </xsl:template>

    <xsl:template match="@* | text()">
        <xsl:copy />
    </xsl:template>

    <xsl:template match="table | tr | td">

        <!-- result of the transformation of descendants -->
        <xsl:variable name="content">
            <xsl:apply-templates select="node()" />
        </xsl:variable>

        <!-- if there are any children left then copy myself -->
        <xsl:if test="count($content/node()) > 0">
            <xsl:copy>
                <xsl:apply-templates select="@*" />
                <xsl:copy-of select="$content" />
            </xsl:copy>
        </xsl:if>

    </xsl:template>

</xsl:stylesheet>

The idea is simple. I will do the transformation for my descendants first and then I will look if there is anyone left. If so I will copy myself and the result of the transformation.

If you want to preserve the table structure and remove only empty rows - elements <tr> that contains only empty elements <td>, than just create similar template for <tr> with different condition and ignore elements <td>.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"&gt;
    <xsl:output method="xml" version="1.0" encoding="utf-8" indent="yes"/>

    <xsl:template match="node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()" />
        </xsl:copy>
    </xsl:template>

    <xsl:template match="@* | text()">
        <xsl:copy />
    </xsl:template>

    <xsl:template match="table">

        <!-- result of the transformation of descendants -->
        <xsl:variable name="content">
            <xsl:apply-templates select="node()" />
        </xsl:variable>

        <!-- if there are any children left then copy myself -->
        <xsl:if test="count($content/node()) > 0">
            <xsl:copy>
                <xsl:apply-templates select="@*" />
                <xsl:copy-of select="$content" />
            </xsl:copy>
        </xsl:if>

    </xsl:template>

    <xsl:template match="tr">

        <!-- result of the transformation of descendants -->
        <xsl:variable name="content">
            <xsl:apply-templates select="node()" />
        </xsl:variable>

        <!-- number of non-empty td elements -->
        <xsl:variable name="cellCount">
            <xsl:value-of select="count($content/td[node()])" />
        </xsl:variable>

        <!-- number of other elements -->
        <xsl:variable name="elementCount">
            <xsl:value-of select="count($content/node()[name() != 'td'])" />
        </xsl:variable>

        <xsl:if test="$cellCount > 0 or $elementCount > 0">
            <xsl:copy>                  
                <xsl:apply-templates select="@*" />
                <xsl:copy-of select="$content" />
            </xsl:copy>
        </xsl:if>

    </xsl:template>

</xsl:stylesheet>

Well, actually the last if should be like this:

<xsl:choose>
    <!-- if there are cells then copy the content -->
    <xsl:when test="$cellCount > 0">
        <xsl:copy>
            <xsl:apply-templates select="@*" />
            <xsl:copy-of select="$content" />
        </xsl:copy>
    </xsl:when>

    <!-- if there are only other elements copy them -->
    <xsl:when test="$elementCount > 0">
        <xsl:copy>
            <xsl:apply-templates select="@*" />
            <xsl:copy-of select="$content/node()[name() != 'td']" />
        </xsl:copy>
    </xsl:when>
</xsl:choose>

That is because of the situation when <tr> contains empty elements <td> and another elements. Then you want to delete the <td>s and leave only the rest.

CodeRipper
Thanks, good enough, the only thing to add is to use msxls:node-set($content) when you verify the number of nodes.
Monomachus
I tested it in XMLSpy and there count($content/node()) works just fine.
CodeRipper