views:

1179

answers:

5

Hello,

I have a question for the clever people of the SO community.

Below is a snippet of XML generated by the Symphony CMS.

   <news>
        <entry>
         <title>Lorem Ipsum</title>
         <body>
          <p><strong>Lorem Ipsum</strong></p>
          <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed malesuada auctor magna. Vivamus urna justo, pulvinar nec, sagittis malesuada, accumsan in, massa. Quisque mi purus, gravida eget, ultricies a, porta in, sem. Maecenas justo elit, elementum vel, feugiat vulputate, pulvinar nec, velit. Fusce vel ante et diam bibendum euismod. Nunc vel nulla non lorem dignissim placerat. Nulla magna massa, auctor et, tempor nec, auctor sit amet, turpis. Quisque odio lacus, auctor at, posuere id, suscipit eget, dui. Phasellus aliquam. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Proin varius. Phasellus cursus. Cras mattis adipiscing turpis. Sed.</p>
          <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed malesuada auctor magna.</p>
         </body>
        </entry>
    </news>

What I need to do is take a portion of the <body> element, based on a specified length, for display in the blog style of:

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed malesuada auctor magna. Vivamus urna justo, pulvinar nec, sagittis malesuada, accumsan in, massa. Quisque mi purus, gravida eget, ultricies a, porta in, sem... more

...where more is a link to the full news item. I know I can select specific paragraphs, and I also know I can use the substring function to return a specified number of characters. However, I need to preserve the formatting of the text, i.e. the HTML tags within the <body> element.

I realise this raises issues of tag closure but there must surely be a way. Hopefully someone more experienced with XSLT can shed some light on this issue.

Thanks, Neil

A: 

This will be an exercise in pain using XSLT. I would strongly recommend using a scripting language like Perl/Python to attempt this.
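
For comparison, here is a minimal sketch of what the scripting route might look like in Python, using only the standard library's xml.etree.ElementTree. The function names (truncate, _walk, _drop_after) and the "..." marker are hypothetical choices for illustration, not something from this thread. It assumes the body is already parsed as an element and that whitespace between tags counts toward the limit; it walks the text in document order, keeps the enclosing tags, cuts at the cumulative limit, and drops everything after the cut.

```python
import copy
import xml.etree.ElementTree as ET


def _drop_after(parent, index):
    """Remove every child of parent from position index onwards."""
    for extra in list(parent)[index:]:
        parent.remove(extra)


def _walk(elem, budget, marker):
    """Trim elem in place. Return the budget left, or -1 once the cut is made."""
    # Text that sits directly inside elem, before its first child.
    text = elem.text or ""
    if len(text) > budget:
        elem.text = text[:budget] + marker
        _drop_after(elem, 0)  # all children lie after the cut
        return -1
    budget -= len(text)

    for i, child in enumerate(list(elem)):
        budget = _walk(child, budget, marker)
        if budget < 0:
            # The cut happened inside this child: drop what follows it.
            child.tail = None
            _drop_after(elem, i + 1)
            return -1
        # Text that follows the child but still belongs to elem.
        tail = child.tail or ""
        if len(tail) > budget:
            child.tail = tail[:budget] + marker
            _drop_after(elem, i + 1)
            return -1
        budget -= len(tail)
    return budget


def truncate(elem, limit, marker="..."):
    """Return a copy of elem holding at most `limit` characters of text."""
    result = copy.deepcopy(elem)  # leave the caller's tree untouched
    _walk(result, limit, marker)
    return result


body = ET.fromstring(
    "<body><p><strong>Lorem Ipsum</strong></p>"
    "<p>Lorem ipsum dolor sit amet.</p><p>Never shown</p></body>"
)
print(ET.tostring(truncate(body, 22), encoding="unicode"))
# <body><p><strong>Lorem Ipsum</strong></p><p>Lorem ipsum...</p></body>
```

Swapping the marker for a real "more" link would mean appending an <a> element at the cut point instead of plain text; that part is left out of the sketch.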

Rob Di Marco
You may be down-voted, but you're right. I can't believe how much XSLT is required to accomplish a seemingly simple text transformation problem.
Tyson
@Tyson Let's see your solution then -- in any language you like. Remember: the existing formatting must be left intact, and the length above which to truncate is cumulative. Have fun :)
Dimitre Novatchev
+2  A: 

What you are asking is an XSLT ellipsis generator.

Maybe this XSLT 1.0 template will give you some ideas:

Here is the main gist of it:

<xsl:template match="text()" mode="label">
    <xsl:param name="self-x"/>
    <xsl:param name="self-y"/>
    <xsl:variable name="text" select="normalize-space(.)"/>
    <!-- a quick and dirty way to avoid problems with line breaks -->
    <!-- replace the select attribute with this call
         if you want to use a fancier way to escape whitespace
         characters:
          <xsl:call-template name="escape-ws">
            <xsl:with-param name="text" select="." />
          </xsl:call-template>
    -->
    <use xlink:href="#text-box" transform="translate({$self-x} {$self-y})"/>
    <!-- text nodes are marked with a little box -->
    <text x="{$self-x + $writing-bump-over}"
          y="{$self-y - $writing-bump-up}"
          style="{$text-font-style}; stroke:none; fill:{$text-color}">
      <xsl:text>"</xsl:text>
      <xsl:value-of select="substring($text,1,$max-text-length)"/>
      <!-- truncate the text node to $max-text-length -->
      <xsl:if test="string-length($text) &gt; $max-text-length">
        <!-- add an ellipsis if necessary -->
        <xsl:text>...</xsl:text>
      </xsl:if>
      <xsl:text>"</xsl:text>
    </text>
  </xsl:template>

Note:

  • you will need to replace the ellipsis by a link, but the main idea is there.
  • this represents only a small extract of the whole script
  • you may not need everything in it: if you need "<use xlink:href="...", you need to declare the xlink namespace
VonC
Thanks so much, I will give this a try and let you know how I get on.
Neil Albrock
What is the <use> element? What is the meaning of the two parameters? The $writing-bump-xxx variables are undefined! Even if your purpose was just to give an idea, this answer fails to achieve this for now.
Dimitre Novatchev
@Dimitre the purpose was to show only the part dealing with the max-length of the string and producing the ellipsis. The rest is defined in the XSLT script at the HTML address I mention. Plus the <use> part may not be needed for what Neil is after.
VonC
@VonC Not a useful answer for me. I will provide something more meaningful.
Dimitre Novatchev
@Dimitre: Absolutely! By all means. Mine was just to get this thread started. If you have a better more precise answer, I will upvote it :)
VonC
I appreciate your help, hopefully through some combined effort we can sort this one out ;-)
Neil Albrock
OK. My solution is posted.
Dimitre Novatchev
+3  A: 

Here is a complete XSLT 1.0 transformation that solves exactly this problem.

This XSLT transformation:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:ext="http://exslt.org/common"
 xmlns:f="http://fxsl.sf.net/"
 xmlns:myAdd="f:myAdd"
 xmlns:myParam="f:myParam"
 exclude-result-prefixes="ext f myAdd myParam"
>
 <xsl:import href="scanl.xsl"/>
 <!--                                         -->
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>
 <!--                                         -->
 <myAdd:myAdd/>
 <myParam:myParam>0</myParam:myParam>
 <!--                                         -->
 <xsl:param name="pTruncateLength" select="772"/>
 <!--                                         -->
   <xsl:variable name="vFun" select="document('')/*/myAdd:*[1]"/>
   <xsl:variable name="vZero" select="document('')/*/myParam:*[1]"/>
 <!--                                         -->
   <xsl:variable name="vrtfScanResults">
           <xsl:call-template name="scanl">
             <xsl:with-param name="pFun" select="$vFun"/>
             <xsl:with-param name="pQ0" select="$vZero" />
             <xsl:with-param name="pList" select="/*/*/body//text()"/>
           </xsl:call-template>
   </xsl:variable>
 <!--                                         -->
   <xsl:variable name="vScanResults"
        select="ext:node-set($vrtfScanResults)"/>
   <xsl:variable name="vindNode" select=
    "count($vScanResults/*[. > $pTruncateLength][1]
                                   /preceding-sibling::*)"/>
 <!--                                         -->
   <xsl:variable name="vrtfTruncInfo">
       <xsl:for-each select="/*/*/body//text()">
 <!--                                         -->
         <xsl:variable name="vPos" select="position()"/>
         <tNode id="{generate-id()}">
           <xsl:attribute name="preserve">
             <xsl:if test="$vPos &lt; $vindNode">
               <xsl:value-of select="string-length(.)"/>
             </xsl:if>
             <xsl:if test="$vPos > $vindNode">
               <xsl:value-of select="0"/>
             </xsl:if>
             <xsl:if test="$vPos = $vindNode">
               <xsl:value-of select=
               "$vScanResults/*[$vindNode+1]
               -
                $pTruncateLength"/>
             </xsl:if>
           </xsl:attribute>
         </tNode>
       </xsl:for-each>
   </xsl:variable>
 <!--                                         -->
   <xsl:variable name="vTruncInfo" select="ext:node-set($vrtfTruncInfo)"/>
 <!--                                         -->
 <xsl:template match="node()|@*">
   <xsl:copy>
     <xsl:apply-templates select="node()|@*"/>
   </xsl:copy>
 </xsl:template>
 <!--                                         -->
 <xsl:template match="text()[ancestor::body]">
   <xsl:variable name="vAllowedLength"
        select="$vTruncInfo/*[@id = generate-id(current())]/@preserve"
   />
 <!--                                         -->
   <xsl:value-of select="substring(.,1,$vAllowedLength)"/>

   <xsl:if test="string-length(.) > $vAllowedLength
               and
                 $vAllowedLength > 0
                ">
     <strong> ...more</strong>
   </xsl:if>
 </xsl:template>
 <!--                                         -->
 <xsl:template match="myAdd:*" mode="f:FXSL">
   <xsl:param name="pArg1"/>
   <xsl:param name="pArg2"/>
   <xsl:value-of select="$pArg1 + string-length($pArg2)"/>
 </xsl:template>
</xsl:stylesheet>

when applied on the original source XML document:

<news>
    <entry>
     <title>Lorem Ipsum</title>
     <body>
      <p>
       <strong>Lorem Ipsum</strong>
      </p>
      <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed malesuada auctor magna. Vivamus urna justo, pulvinar nec, sagittis malesuada, accumsan in, massa. Quisque mi purus, gravida eget, ultricies a, porta in, sem. Maecenas justo elit, elementum vel, feugiat vulputate, pulvinar nec, velit. Fusce vel ante et diam bibendum euismod. Nunc vel nulla non lorem dignissim placerat. Nulla magna massa, auctor et, tempor nec, auctor sit amet, turpis. Quisque odio lacus, auctor at, posuere id, suscipit eget, dui. Phasellus aliquam. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Proin varius. Phasellus cursus. Cras mattis adipiscing turpis. Sed.</p>
      <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed malesuada auctor magna.</p>
      <p>This text should not be displayed</p>
     </body>
    </entry>
</news>

produces the wanted result:

<news>
   <entry>
      <title>Lorem Ipsum</title>
      <body>
         <p>
            <strong>Lorem Ipsum</strong>
         </p>
         <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed malesuada auctor magna. Vivamus urna justo, pulvinar nec, sagittis malesuada, accumsan in, massa. Quisque mi purus, gravida eget, ultricies a, porta in, sem. Maecenas justo elit, elementum vel, feugiat vulputate, pulvinar nec, velit. Fusce vel ante et diam bibendum euismod. Nunc vel nulla non lorem dignissim placerat. Nulla magna massa, auctor et, tempor nec, auctor sit amet, turpis. Quisque odio lacus, auctor at, posuere id, suscipit eget, dui. Phasellus aliquam. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Proin varius. Phasellus cursus. Cras mattis adipiscing turpis. Sed.</p>
         <p>Lorem <strong> ...more</strong>
         </p>
         <p/>
      </body>
   </entry>
</news>

Do note the following:

  1. The scanl stylesheet from the FXSL library is imported. This template is commonly used to accumulate data from processing a list of items. The function (the template matching myAdd:*) that does the actual processing is passed as a parameter to the scanl template. The other parameter that must be passed to it is the "initial" value from processing, which is to be returned if the passed list of items is empty.

  2. The global parameter $pTruncateLength holds the maximum cumulative text length; any text beyond it is truncated.
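
To make the running-total idea in the notes above concrete outside XSLT, here is a rough Python sketch. It is not part of FXSL; the name keep_lengths and the sample numbers are made up for illustration. itertools.accumulate with an initial value plays the role of scanl: it yields the running total before each text node, and the number of characters a node may keep is the limit minus that running total, clamped between 0 and the node's own length.

```python
from itertools import accumulate


def keep_lengths(lengths, limit):
    """For each text-node length, return how many characters may be kept."""
    # totals[i] is the number of characters that precede node i.
    # The leading 0 corresponds to the "initial" value passed to scanl.
    totals = list(accumulate(lengths, initial=0))
    kept = []
    for i, length in enumerate(lengths):
        remaining = limit - totals[i]  # budget left when node i starts
        kept.append(max(0, min(length, remaining)))
    return kept


print(keep_lengths([11, 27, 11], 22))
# [11, 11, 0]
```

Nodes before the limit keep their full length, the node that straddles it keeps only what is left of the budget, and later nodes keep nothing. When adapting the stylesheet, it may be worth checking that the preserve value for the straddling node works out to this same quantity (the limit minus the total before the node).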

Dimitre Novatchev
@Dimitre Thanks so much, I'm going to try this on my site. I'll let you know how it goes.
Neil Albrock
@Dimitre: impressive. +1. And feel free to downvote my (very incomplete) answer.
VonC
@VonC Thanks, I will not downvote you here, feel free to improve :)
Dimitre Novatchev
@Neil Albrock You need to download and unzip FXSL 1.x from http://www.sf.net/projects/fxsl and adjust the xsl:import/@href to the exact path of the scanl.xsl file. In case you have problems, let me know and I'll provide the files.
Dimitre Novatchev
@Dimitre: I finally got around to trying your solution. When I applied your XSLT to my original XML, this was the result...<news> <entry> <title>Lorem Ipsum</title> <body> <p> <strong/> </p> <p/> <p/> <p/> </body> </entry></news>
Neil Albrock
..as you can see, there is something not quite right. Probably with my implementation ;-)
Neil Albrock
@Neil Albrock It would be good if you could send me the exact code. Use my userid -- with the gmail email.
Dimitre Novatchev
+1  A: 

Here's my version. I've tested it over your XML sample and it works.

To invoke it, use <xsl:apply-templates select="path/to/body/*" mode="truncate"/>.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:strip-space elements="*"/>

<!-- limit: the truncation limit -->
<xsl:variable name="limit" select="250"/>

<!-- t: Total number of characters in the set -->
<xsl:variable name="t" select="string-length(normalize-space(//body))"/>

<xsl:template match="*" mode="truncate">
 <xsl:variable name="preceding-strings">
  <xsl:copy-of select="preceding::text()[ancestor::body]"/>
 </xsl:variable>

 <!-- p: number of characters up to the current node -->
 <xsl:variable name="p" select="string-length(normalize-space($preceding-strings))"/>

 <xsl:if test="$p &lt; $limit">
  <xsl:element name="{name()}">
   <xsl:apply-templates select="@*" mode="truncate"/>
   <xsl:apply-templates mode="truncate"/>
  </xsl:element>
 </xsl:if>
</xsl:template>

<xsl:template match="text()" mode="truncate">
 <xsl:variable name="preceding-strings">
  <xsl:copy-of select="preceding::text()[ancestor::body]"/>
 </xsl:variable>

 <!-- p: number of characters up to the current node -->
 <xsl:variable name="p" select="string-length(normalize-space($preceding-strings))"/>

 <!-- c: number of characters including current node -->
 <xsl:variable name="c" select="$p + string-length(.)"/>

 <xsl:choose>
  <xsl:when test="$limit &lt;= $c">
   <xsl:value-of select="substring(., 1, ($limit - $p))"/>
   <xsl:text>&#8230;</xsl:text>
  </xsl:when>
  <xsl:otherwise>
   <xsl:value-of select="."/>
  </xsl:otherwise>
 </xsl:choose>
</xsl:template>

<xsl:template match="@*" mode="truncate">
 <xsl:attribute name="{name(.)}"><xsl:value-of select="."/></xsl:attribute>
</xsl:template>

</xsl:stylesheet>
Chaotic Pattern
A: 

After much hacking, I came to this solution:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!--
    Author: Neil Albrock
    Version: 1.0
    Description: Truncate by a character limit and retain HTML content.
    Usage: 
        <xsl:call-template name="truncate">
            <xsl:with-param name="data" select="path/to/your/body" />
            <xsl:with-param name="length" select="250" />
            <xsl:with-param name="link" select="'href'" />
        </xsl:call-template>
-->

<xsl:template name="truncate">

    <!-- The node set to be worked on. -->
    <xsl:param name="data"/>
    <!-- The desired truncate length. Default to length of data. -->
    <xsl:param name="length" select="string-length($data)"/>
    <!-- More link -->
    <xsl:param name="link"/>

    <xsl:choose>
     <!-- Return whole data if it's within length. -->
     <xsl:when test="string-length($data) &lt;= $length">
      <xsl:copy-of select="$data" />
     </xsl:when>
     <!-- Truncate to desired length. -->
     <xsl:otherwise>
      <xsl:for-each select="$data/*">
       <xsl:variable name="this-node" select="string-length(.)"/>
       <xsl:variable name="preceding-nodes">
        <xsl:copy-of select="preceding-sibling::*"/>
       </xsl:variable>
       <xsl:variable name="node-sum" select="string-length(normalize-space($preceding-nodes))"/>
       <xsl:variable name="limit" select="$node-sum + $this-node"/>

       <xsl:choose>
        <xsl:when test="$limit &gt; $length and $node-sum &lt;= $length">
         <p>
         <xsl:value-of select="substring(.,1,$length - $node-sum)"/>
         <xsl:text>&#8230;</xsl:text>
         <a>
          <xsl:attribute name="href">
           <xsl:value-of select="$link"/>
          </xsl:attribute>
          <xsl:text>more</xsl:text>
         </a>
         </p>
        </xsl:when>
        <!-- The node fits entirely, including an exact fit at the limit. -->
        <xsl:when test="$limit &lt;= $length">
         <xsl:copy-of select="."/>
        </xsl:when>
        <xsl:otherwise/>
       </xsl:choose>

      </xsl:for-each>
     </xsl:otherwise>
    </xsl:choose>

</xsl:template>

</xsl:stylesheet>

I would use the solution by Chaotic Pattern though, it's more elegant ;-)

Neil Albrock