How can I get the first n characters of XHTML with XSLT 1.0? I'm trying to create introduction text for news items.

  • Everything is UTF-8
  • HTML entity aware (&nbsp; &amp;), one entity = one character
  • HTML tag aware (adds missing end tags)
  • Input HTML is always valid
  • If the input text is longer than n characters, add '...' to the end of the output
  • Input tags are restricted to: a, img, p, div, span, b, strong

Example input HTML:

<img src="image.jpg" alt="">text <a href="http://domain.tld">link here</a>

Example output with 9 characters:

<img src="image.jpg" alt="">text <a href="http://domain.tld">link...</a>

Example input HTML:

<p><a href="http://domain.tld">link here</a> text</p>

Example output with 4 characters:

<p><a href="http://domain.tld">link...</a></p>
A: 

Here is a starting point, although it does not yet contain any code to handle the requirement "Input tags are restricted to: a, img, p, div, span, b, strong".

It works by looping through the child nodes of a node and totalling the lengths of the preceding siblings up to that point. Note that the code to get the length of the preceding siblings requires the node-set function, which is an extension function rather than part of XSLT 1.0. My example uses the Microsoft extension function, msxsl:node-set.

Where a node is not a text node, the total number of characters up to that point is the sum of the lengths of its preceding siblings plus the total already accumulated for its parent node (which is passed as a parameter to the template).
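The running total described above can be sketched on its own, and possibly without the extension function, because XSLT 1.0 allows a result tree fragment to be converted to a string. The following is only an illustration under that assumption: the variable name `precedingText` is hypothetical and not part of the answer, and the stylesheet merely reports the total in front of each text node instead of truncating anything.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="text"/>

   <xsl:template match="node()">
      <xsl:param name="LengthToParent" select="0"/>

      <!-- Concatenated string values of all preceding siblings
           (precedingText is a hypothetical name) -->
      <xsl:variable name="precedingText">
         <xsl:for-each select="preceding-sibling::node()">
            <xsl:value-of select="."/>
         </xsl:for-each>
      </xsl:variable>

      <!-- The fragment is converted to a string here, so no node-set() call is needed -->
      <xsl:variable name="LengthSoFar"
                    select="string-length($precedingText) + $LengthToParent"/>

      <xsl:choose>
         <xsl:when test="self::text()">
            <!-- Report how many characters precede this text node -->
            <xsl:value-of select="concat($LengthSoFar, ': ', ., '&#10;')"/>
         </xsl:when>
         <xsl:otherwise>
            <!-- Pass the running total down to the children -->
            <xsl:apply-templates select="node()">
               <xsl:with-param name="LengthToParent" select="$LengthSoFar"/>
            </xsl:apply-templates>
         </xsl:otherwise>
      </xsl:choose>
   </xsl:template>
</xsl:stylesheet>
```

Applied to `<p><a href="http://domain.tld">link here</a> text</p>`, this should print `0: link here` followed by `9:  text`: the second text node has one preceding sibling, the `a` element, whose string value is 9 characters long. Any whitespace-only text nodes the parser keeps count toward the total as well.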

Here is the XSLT:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:msxsl="urn:schemas-microsoft-com:xslt" exclude-result-prefixes="msxsl">
   <xsl:param name="MAXCHARS">9</xsl:param>

   <xsl:template match="/body">
      <xsl:apply-templates select="child::node()"/>
   </xsl:template>

   <xsl:template match="node()">
      <xsl:param name="LengthToParent">0</xsl:param>

      <!-- Get length of previous siblings -->
      <xsl:variable name="previousSizes">
         <xsl:for-each select="preceding-sibling::node()">
            <length>
               <xsl:value-of select="string-length(.)"/>
            </length>
         </xsl:for-each>
      </xsl:variable>
      <xsl:variable name="LengthToNode" select="sum(msxsl:node-set($previousSizes)/length)"/>

      <!-- Total amount of characters processed so far -->
      <xsl:variable name="LengthSoFar" select="$LengthToNode + number($LengthToParent)"/>

      <!-- Check limit is not exceeded -->
      <xsl:if test="$LengthSoFar &lt; number($MAXCHARS)">
         <xsl:choose>
            <xsl:when test="self::text()">
               <!-- Output text node with ... if required -->
               <xsl:value-of select="substring(., 1, number($MAXCHARS) - $LengthSoFar)"/>
               <xsl:if test="string-length(.) &gt; number($MAXCHARS) - $LengthSoFar">...</xsl:if>
            </xsl:when>
            <xsl:otherwise>
               <!-- Output copy of node and recursively call template on its children -->
               <xsl:copy>
                  <xsl:copy-of select="@*"/>
                  <xsl:apply-templates select="child::node()">
                     <xsl:with-param name="LengthToParent" select="$LengthSoFar"/>
                  </xsl:apply-templates>
               </xsl:copy>
            </xsl:otherwise>
         </xsl:choose>
      </xsl:if>
   </xsl:template>

</xsl:stylesheet>

When applied to this input:

<body> 
   <img src="image.jpg" alt="" />text <a href="http://domain.tld">link here</a>
</body>

The output is:

<body> 
   <img src="image.jpg" alt="" />text <a href="http://domain.tld">link...</a>
</body>

When applied to this input (with the parameter changed to 4 in the XSLT):

<p><a href="http://domain.tld">link here</a> text</p>

The output is:

<p><a href="http://domain.tld">link...</a></p>
Tim C
A: 

This stylesheet:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:param name="pMaxLength" select="4"/>
    <xsl:template match="node()">
        <xsl:param name="pPrecedingLength" select="0"/>
        <xsl:variable name="vContent">
            <xsl:copy>
                <xsl:copy-of select="@*"/>
                <xsl:apply-templates select="node()[1]">
                    <xsl:with-param name="pPrecedingLength"
                                    select="$pPrecedingLength"/>
                </xsl:apply-templates>
            </xsl:copy>
        </xsl:variable>
        <xsl:variable name="vLength"
                      select="$pPrecedingLength + string-length($vContent)"/>
        <xsl:if test="$pMaxLength + 3 >= $vLength and
                      (string-length($vContent) or not(node()))">
            <xsl:copy-of select="$vContent"/>
            <xsl:apply-templates select="following-sibling::node()[1]">
                <xsl:with-param name="pPrecedingLength" select="$vLength"/>
            </xsl:apply-templates>
        </xsl:if>
    </xsl:template>
    <xsl:template match="text()" priority="1">
        <xsl:param name="pPrecedingLength" select="0"/>
        <xsl:variable name="vOutput"
                      select="substring(.,1,$pMaxLength - $pPrecedingLength)"/>
        <xsl:variable name="vSumLength"
                      select="$pPrecedingLength + string-length($vOutput)"/>
        <xsl:value-of select="concat($vOutput,
                                     substring('...',
                                               1 div ($pMaxLength
                                                            = $vSumLength)))"/>
        <xsl:apply-templates select="following-sibling::node()[1]">
            <xsl:with-param name="pPrecedingLength"
                            select="$vSumLength"/>
        </xsl:apply-templates>
    </xsl:template>
</xsl:stylesheet>
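
The `substring('...', 1 div (...))` expression in the text template works as a compact XPath 1.0 conditional. A minimal sketch of the idiom in isolation, using literal comparisons instead of the stylesheet's variables (the square brackets are only there to make the empty result visible):

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>
    <xsl:template match="/">
        <!-- true -> 1; 1 div 1 = 1; substring('...', 1) is '...' -->
        <xsl:value-of select="concat('[', substring('...', 1 div (4 = 4)), ']')"/>
        <!-- false -> 0; 1 div 0 = Infinity; substring('...', Infinity) is '' -->
        <xsl:value-of select="concat('[', substring('...', 1 div (4 = 5)), ']')"/>
    </xsl:template>
</xsl:stylesheet>
```

For any input document this should print `[...][]`. In the stylesheet above, the ellipsis is therefore appended only when the running length is exactly equal to the limit.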

With this input and 9 as pMaxLength:

<html><img src="image.jpg" alt=""/>text <a href="http://domain.tld">link here</a></html>

Output:

<html><img src="image.jpg" alt="">text <a href="http://domain.tld">link...</a></html>

And this input with 4 as pMaxLength:

<html><p><a href="http://domain.tld">link here</a> text</p></html>

Output:

<html><p><a href="http://domain.tld">link...</a></p></html>
Alejandro