How can I limit a string's word count in XSLT 1.0?

+2  A: 

How about something like:

  <xsl:template match="data"> <!-- your data element or whatever -->
    <xsl:call-template name="firstWords">
      <!-- normalize-space() trims the text and collapses whitespace runs,
           so every word is separated by exactly one space -->
      <xsl:with-param name="value" select="normalize-space(.)"/>
      <xsl:with-param name="count" select="4"/>
    </xsl:call-template>
  </xsl:template>

  <xsl:template name="firstWords">
    <xsl:param name="value"/>
    <xsl:param name="count"/>

    <xsl:if test="number($count) >= 1">
      <xsl:choose>
        <!-- more words follow: emit the first one, then recurse on the rest -->
        <xsl:when test="contains($value, ' ')">
          <xsl:value-of select="substring-before($value, ' ')"/>
          <xsl:if test="number($count) > 1">
            <xsl:text> </xsl:text>
            <xsl:call-template name="firstWords">
              <xsl:with-param name="value" select="substring-after($value, ' ')"/>
              <xsl:with-param name="count" select="number($count) - 1"/>
            </xsl:call-template>
          </xsl:if>
        </xsl:when>
        <!-- no space left: $value is the last word, which substring-before() would drop -->
        <xsl:otherwise>
          <xsl:value-of select="$value"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:if>
  </xsl:template>
Marc Gravell
+1  A: 

This is an XSLT 1.0 solution:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:ext="http://exslt.org/common"
>

   <xsl:import href="strSplit-to-Words.xsl"/>

   <xsl:output indent="yes" omit-xml-declaration="yes"/>

    <xsl:template match="/">
      <xsl:variable name="vwordNodes">
        <xsl:call-template name="str-split-to-words">
          <xsl:with-param name="pStr" select="/"/>
          <xsl:with-param name="pDelimiters" 
                          select="', &#9;&#10;&#13;()-'"/>
        </xsl:call-template>
      </xsl:variable>

      <xsl:call-template name="strTakeWords">
        <xsl:with-param name="pN" select="10"/>
        <xsl:with-param name="pText" select="/*"/>
        <xsl:with-param name="pWords"
             select="ext:node-set($vwordNodes)/*"/>
      </xsl:call-template>
    </xsl:template>

    <xsl:template match="word" priority="10">
      <xsl:value-of select="concat(position(), ' ', ., '&#10;')"/>
    </xsl:template>

    <xsl:template name="strTakeWords">
      <xsl:param name="pN" select="10"/>
      <xsl:param name="pText"/>
      <xsl:param name="pWords"/>
      <xsl:param name="pResult"/>

      <xsl:choose>
          <xsl:when test="not($pN > 0)">
            <xsl:value-of select="$pResult"/>
          </xsl:when>
          <xsl:otherwise>
            <xsl:variable name="vWord" select="$pWords[1]"/>
            <xsl:variable name="vprecDelims" select=
               "substring-before($pText,$pWords[1])"/>

            <xsl:variable name="vnewText" select=
                "concat($vprecDelims, $vWord)"/>

              <xsl:call-template name="strTakeWords">
                <xsl:with-param name="pN" select="$pN -1"/>
                <xsl:with-param name="pText" select=
                      "substring-after($pText, $vnewText)"/>
                <xsl:with-param name="pWords" select=
                     "$pWords[position() > 1]"/>
                <xsl:with-param name="pResult" select=
                 "concat($pResult, $vnewText)"/>
              </xsl:call-template>
          </xsl:otherwise>
      </xsl:choose>
    </xsl:template>

</xsl:stylesheet>

When this transformation is applied to the following XML document:

<t>
(CNN) -- Behind closed doors in recent days,
senior White House aides have been saying that
measuring President Obama's first 100 days
is the journalistic equivalent of a Hallmark holiday.
</t>

the desired result is produced:

(CNN) -- Behind closed doors in recent days, senior White House

Do note:

  1. The str-split-to-words template from FXSL is used for tokenization.

  2. This template accepts a pDelimiters parameter: a string of all the characters that should be treated as delimiters. Thus, in contrast with other solutions, every delimiter can be specified (not just a space) -- in this case 8 of them.

  3. The named template strTakeWords calls itself recursively to accumulate the text before and including every word from the wordlist produced by the tokenization, until the specified number of words has been processed.
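
To make the third point concrete, here is a rough sketch of what the recursion works with. It assumes, as the match="word" template in the stylesheet suggests, that str-split-to-words emits one word element per token; this is an illustration, not captured output:

```xml
<!-- assumed shape of ext:node-set($vwordNodes)/* for the sample document -->
<word>CNN</word>
<word>Behind</word>
<word>closed</word>
<word>doors</word>
<!-- ... one element per remaining word ... -->
```

On the first call, substring-before($pText, 'CNN') yields the line break and the opening parenthesis, so $vnewText is that prefix followed by CNN, and the text passed on starts at ") -- Behind". On the second call the preceding delimiters are ") -- ", so the accumulated result grows to "(CNN) -- Behind", and so on until $pN reaches zero. This is how the original punctuation and spacing survive in the output.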

Dimitre Novatchev
Hi, thanks very much for your replies. Just to clarify, I cannot link to extensions unfortunately (we are on an internal network). Is there any way to do this without using extensions? Best Regards, Will
@Will My solution doesn't require any extension function except the EXSLT node-set() function, which most XSLT 1.0 processors implement internally (for example, the .NET XslCompiledTransform class). Every FXSL 1.x template is written in pure XSLT, and node-set() is the only extension function used. Therefore, there is no obstacle to using this solution inside an internal network: just use any XSLT 1.0 processor that implements the EXSLT node-set() function (such as .NET XslCompiledTransform, Saxon 6, Xalan, JD, etc.).
Dimitre Novatchev
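
For anyone who wants several delimiters but no extension function at all, one possible approach is to map the delimiter characters to spaces with translate() and then count space-separated words. The following is only a sketch under stated assumptions: the template name takeWords and its parameters are made up for illustration (they are not part of FXSL), and the text is assumed to be the string value of the document element. Unlike the FXSL-based answer, it does not preserve the original punctuation.

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:call-template name="takeWords">
      <!-- translate() maps comma, parentheses and hyphen to spaces
           (four characters, four spaces); normalize-space() then
           collapses all whitespace, including tabs and line breaks,
           into single spaces -->
      <xsl:with-param name="pText"
           select="normalize-space(translate(/*, ',()-', '    '))"/>
      <xsl:with-param name="pN" select="10"/>
    </xsl:call-template>
  </xsl:template>

  <!-- takeWords is a hypothetical helper: it outputs the first $pN
       space-separated words of $pText, joined by single spaces -->
  <xsl:template name="takeWords">
    <xsl:param name="pText"/>
    <xsl:param name="pN" select="10"/>

    <xsl:if test="$pN > 0">
      <xsl:choose>
        <!-- more words follow: emit the first, then recurse on the rest -->
        <xsl:when test="contains($pText, ' ')">
          <xsl:value-of select="substring-before($pText, ' ')"/>
          <xsl:if test="$pN > 1">
            <xsl:text> </xsl:text>
            <xsl:call-template name="takeWords">
              <xsl:with-param name="pText"
                   select="substring-after($pText, ' ')"/>
              <xsl:with-param name="pN" select="$pN - 1"/>
            </xsl:call-template>
          </xsl:if>
        </xsl:when>
        <!-- last (or only) word: emit it as is -->
        <xsl:otherwise>
          <xsl:value-of select="$pText"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

Applied to the sample document above, this should give "CNN Behind closed doors in recent days senior White House": the same ten words, but with the parentheses, dashes and comma stripped. If the original punctuation must be kept, the FXSL-based approach (or a processor-specific node-set() function) remains the better fit.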