I need to find out how to detect if a node contains significant information.

The following example shows what I do not consider "significant" information:

<node>
    <node1>&nbsp;</node1>
    </br></br>
    &nbsp;

    <node1>
        &nbsp;
        <node2></br>&nbsp;</node2>
        </br></br>
    </node1>
    <!--
    and so on...
    -->
</node>

This <node> is "empty" for me.

+1  A: 

Here is how to do it:

<!DOCTYPE xsl:stylesheet [ <!ENTITY nbsp "&#160;"> ]>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="node()|@*">
     <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
     </xsl:copy>
 </xsl:template>

 <xsl:template match=
   "text()
      [translate(normalize-space(), '&#160;','')
      = ''
      ]"/>
</xsl:stylesheet>

When this transformation is applied to the following XML document (the one you provided was severely malformed: it is not well-formed in numerous ways!):

<!DOCTYPE node [ <!ENTITY nbsp "&#160;"> ]>
<node>
    <node1>&nbsp;</node1>
    <br></br>
    &nbsp;

    <node1>
        &nbsp;
        <node2><br/>&nbsp;</node2>
        <br></br>
    </node1>
    <!--
    and so on...
    -->
</node>

then the wanted result is produced:

<node>
   <node1/>
   <br/>
   <node1>
      <node2>
         <br/>
      </node2>
      <br/>
   </node1><!--
    and so on...
    -->
</node>

This technique can be generalized:

You can put all whitespace characters in an xsl:variable, then simply override the identity rule with this template:

<!DOCTYPE xsl:stylesheet [ <!ENTITY nbsp "&#160;"> ]>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output omit-xml-declaration="yes" indent="yes"/>

    <xsl:variable name="vwhiteSpace" select="' &#x9;&#xA;&#xD;&nbsp;'"/>

 <xsl:template match="node()|@*">
     <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
     </xsl:copy>
 </xsl:template>

 <xsl:template match="text()">
   <xsl:if test="translate(., $vwhiteSpace,'') != ''">
     <xsl:copy-of select="."/>
   </xsl:if>
 </xsl:template>
</xsl:stylesheet>

And you can specify any additional characters you consider "white-space" in $vwhiteSpace.
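
As an illustration of extending the set, here is a minimal sketch of the same stylesheet with two extra characters added to $vwhiteSpace. The choice of EM SPACE (U+2003) and ZERO WIDTH SPACE (U+200B) is only an assumption made for the example; substitute whatever characters count as "white-space" in your data. The no-break space is written as &#xA0; here so that the stylesheet needs no DOCTYPE.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output omit-xml-declaration="yes" indent="yes"/>

    <!-- Assumed extension: space, tab, LF, CR, no-break space,
         plus EM SPACE (U+2003) and ZERO WIDTH SPACE (U+200B) -->
    <xsl:variable name="vwhiteSpace"
        select="' &#x9;&#xA;&#xD;&#xA0;&#x2003;&#x200B;'"/>

    <!-- Identity rule: copy every node and attribute as-is -->
    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
    </xsl:template>

    <!-- Copy a text node only if something remains after deleting
         every character listed in $vwhiteSpace -->
    <xsl:template match="text()">
        <xsl:if test="translate(., $vwhiteSpace, '') != ''">
            <xsl:copy-of select="."/>
        </xsl:if>
    </xsl:template>
</xsl:stylesheet>
```

Because the third argument of translate() is the empty string, every character that appears in $vwhiteSpace is simply deleted from the text node before the comparison.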

Update: The OP indicated in a comment that he actually wants to see if a "node" is significant or not -- not to "clean a node".

The solution to this is already contained in my solution to the initial problem:

  <xsl:variable name="vIsSignificant" select=
     "translate(., $vwhiteSpace,'') != ''"/>
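
For a plain "yes"/"no" answer, the test above could be wrapped in a small stylesheet such as the following sketch. It assumes the node to check is the document element (the match="/*" pattern) and that text output is wanted; both are choices made for this example only.

```xml
<!DOCTYPE xsl:stylesheet [ <!ENTITY nbsp "&#160;"> ]>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>

    <!-- Space, tab, LF, CR and no-break space -->
    <xsl:variable name="vwhiteSpace" select="' &#x9;&#xA;&#xD;&nbsp;'"/>

    <!-- Assumption: the node being checked is the document element -->
    <xsl:template match="/*">
        <!-- "." is the string value of the element, i.e. the concatenation
             of all its descendant text nodes; comments and element markup
             such as <br/> do not contribute to it -->
        <xsl:variable name="vIsSignificant"
            select="translate(., $vwhiteSpace, '') != ''"/>
        <xsl:choose>
            <xsl:when test="$vIsSignificant">yes</xsl:when>
            <xsl:otherwise>no</xsl:otherwise>
        </xsl:choose>
    </xsl:template>
</xsl:stylesheet>
```

Applied to the well-formed sample document shown earlier, this prints "no", since nothing remains of the string value of <node> once the listed characters are removed.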
Dimitre Novatchev
@Dimitre Novatchev, I explained the problem badly. What I need is a "yes" or "no" answer to the question "does this node contain significant information?" Your solution "cleans" the node. I need to check whether the node is significant, not to clean it.
Kalinin
@Dimitre Novatchev, Thanks a lot. The code works.
Kalinin
@kalininew, My code *always* works. I never post incomplete or untested code.
Dimitre Novatchev
@Dimitre Novatchev, Yes, that is true. Thank you for it.
Kalinin