I am doing a find-and-replace on the line feed character (&#10;), replacing it with closing and opening paragraph tags, using the following code:

<xsl:template match="/STORIES/STORY">   
    <component>
     <xsl:if test="boolean(ARTICLEBODY)">
      <p>
       <xsl:call-template name="replace-text">
         <xsl:with-param name="text" select="ARTICLEBODY"  />
         <xsl:with-param name="replace" select="'&#10;'" />
         <xsl:with-param name="by" select="'&lt;/p&gt;&lt;p&gt;'" />
       </xsl:call-template>
      </p>
     </xsl:if>
    </component>
</xsl:template>

<xsl:template name="replace-text">
   <xsl:param name="text"/>
   <xsl:param name="replace" />
   <xsl:param name="by"  />

   <xsl:choose>
   <xsl:when test="contains($text, $replace)">
      <xsl:value-of select="substring-before($text, $replace)"/>
      <xsl:value-of select="$by" disable-output-escaping="yes"/>
      <xsl:call-template name="replace-text">
         <xsl:with-param name="text" select="substring-after($text, $replace)"/>
         <xsl:with-param name="replace" select="$replace" />
         <xsl:with-param name="by" select="$by" />
      </xsl:call-template>
   </xsl:when>
   <xsl:otherwise>
      <xsl:value-of select="$text"/>
   </xsl:otherwise>
   </xsl:choose>
</xsl:template>

This almost works perfectly, except that I need it to de-duplicate the line feeds: paragraphs tend to be separated by two or more of them, which results in </p><p></p><p>.

Is it possible to get it so that it will only ever replace this once per paragraph?

A: 

Given the XPath functions you're calling, which I don't remember having the luxury of in my MSXSL work, it looks like you're using an XPath 2-compatible processor.

If that's the case, doesn't XPath 2 have a replace(string, pattern, replacement) function that takes a regex as a second parameter?

<xsl:value-of 
    select="replace(string(.), '&#10;(\s|&#10;)*', '&lt;/p&gt;&lt;p&gt;')" />

It might help to have some sample XML input and to know which processor you plan to use.

From your original example, it seems that the duplicate paragraphs all have a whitespace-only prefix. So something like this slight modification might trim the dupes.

<xsl:when test="contains($text, $replace)">
  <xsl:variable name="prefix" select="substring-before($text, $replace)" />
  <xsl:choose>
    <xsl:when test="normalize-space($prefix)!=''">
      <xsl:value-of select="$prefix"/>
      <xsl:value-of select="$by" disable-output-escaping="yes"/>
    </xsl:when>
  </xsl:choose>
  <xsl:call-template name="replace-text">
     <xsl:with-param name="text" select="substring-after($text, $replace)"/>
     <xsl:with-param name="replace" select="$replace" />
     <xsl:with-param name="by" select="$by" />
  </xsl:call-template>
</xsl:when>

Mike Haboustak
A: 

Try this (XSLT 2.0):

    <xsl:template match="/STORIES/STORY">
        <component>
            <xsl:if test="boolean(ARTICLEBODY)">
                <xsl:call-template name="insert_paras">
                    <xsl:with-param name="text" select="ARTICLEBODY/text()"/>
                </xsl:call-template>
            </xsl:if>
        </component>
    </xsl:template>

    <xsl:template name="insert_paras">
        <xsl:param name="text" />

        <xsl:variable name="regex">
            <xsl:text>&#10;(&#10;|\s)*</xsl:text>
        </xsl:variable>
        <xsl:variable name="tokenized-text" select="tokenize($text, $regex)"/>

        <xsl:for-each select="$tokenized-text">
            <p>
                <xsl:value-of select="."/>
            </p>
        </xsl:for-each>
    </xsl:template>

It's generally a bad idea to emit XML markup as literal strings, since you can't guarantee that the result is balanced.
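
One caveat with the tokenize() approach: if the text starts or ends with a line feed, tokenize() returns a zero-length string for that position, which would come out as an empty paragraph. Here is a minimal sketch of a more compact variant that filters those out. It assumes an XSLT 2.0 processor and at most one ARTICLEBODY per STORY; the '\n\s*' pattern is my own shorthand for the regex above, since \s already matches line feeds.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/STORIES/STORY">
    <component>
      <!-- Split on a line feed followed by any run of whitespace
           (including further line feeds), then drop tokens that are
           empty or whitespace-only via the [normalize-space(.)] predicate.
           Assumption: at most one ARTICLEBODY per STORY, otherwise
           string() raises a type error in XSLT 2.0. -->
      <xsl:for-each
          select="tokenize(string(ARTICLEBODY), '\n\s*')[normalize-space(.)]">
        <p>
          <xsl:value-of select="."/>
        </p>
      </xsl:for-each>
    </component>
  </xsl:template>

</xsl:stylesheet>
```

Because each p is built as an element node rather than written out as a string, the paragraphs are always balanced regardless of what the text contains.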

James Sulak
+3  A: 

disable-output-escaping isn't evil in itself, but there are only a few cases where you should use it, and this isn't one of them. In XSLT you work with trees, not markup strings. Here's an XSLT 1.0 solution:

<xsl:template match="/STORIES/STORY">
  <component>
    <xsl:if test="ARTICLEBODY">
      <xsl:call-template name="wrap-text">
        <xsl:with-param name="text" select="ARTICLEBODY"/>
        <xsl:with-param name="delimiter" select="'&#10;'"/>
        <xsl:with-param name="element" select="'p'"/>
      </xsl:call-template>
    </xsl:if>
  </component>
</xsl:template>

<xsl:template name="wrap-text">
  <xsl:param name="text"/>
  <xsl:param name="delimiter"/>
  <xsl:param name="element"/>

  <xsl:choose>
    <xsl:when test="contains($text, $delimiter)">
      <xsl:variable name="t" select="substring-before($text, $delimiter)"/>
      <xsl:if test="normalize-space($t)">
        <xsl:element name="{$element}">
          <xsl:value-of select="$t"/>
        </xsl:element>
      </xsl:if>
      <xsl:call-template name="wrap-text">
        <xsl:with-param name="text" select="substring-after($text, $delimiter)"/>
        <xsl:with-param name="delimiter" select="$delimiter"/>
        <xsl:with-param name="element" select="$element"/>
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise>
      <xsl:if test="normalize-space($text)">
        <xsl:element name="{$element}">
          <xsl:value-of select="$text"/>  
        </xsl:element>
      </xsl:if>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
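
To illustrate how the templates above handle repeated line feeds, here is a hypothetical input. The paragraph text is made up; only the element names come from the question.

```xml
<STORIES>
  <STORY>
    <ARTICLEBODY>First paragraph.

Second paragraph.</ARTICLEBODY>
  </STORY>
</STORIES>
```

Tracing wrap-text by hand: the first call emits a p for "First paragraph.". The second call finds an empty string before the next line feed, so normalize-space() is false and nothing is emitted. The final call has no delimiter left and wraps the remainder. Ignoring the XML declaration and the whitespace copied through by the built-in templates, the result should be:

```xml
<component><p>First paragraph.</p><p>Second paragraph.</p></component>
```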
jelovirt