Hi everyone,

I seem to be having an issue with Xalan's translate method. I have the following code:

translate(translate(string(name),'<sup>',''),'</sup>','')

This is meant to remove <sup> and </sup> from string(name). Unfortunately, it also removes the letters s, u and p from the names themselves, so a name like sony Braiva <sup>tm</sup> becomes ony Braiva tm.

Thanks for your help in advance :)

+3  A: 

Because you said that the translate() function is successfully removing <sup> and </sup>, I am assuming that <sup> is not an element in the XML document, but is encoded as text.

The translate() function is defined to substitute individual characters and generally isn't suitable for string replacement when the string length is greater than 1.
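
To make the character-by-character behaviour concrete, here is a minimal sketch. The input string is simply the example from the question, hard-coded for illustration; it is not taken from a real document.

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="text"/>

 <xsl:template match="/">
  <!-- Every character listed in the 2nd argument is looked up individually.
       The 3rd argument is empty, so each of &lt; s u p > is simply deleted,
       wherever it occurs in the string. -->
  <xsl:value-of select=
   "translate('sony Braiva &lt;sup>tm&lt;/sup>', '&lt;sup>', '')"/>
 </xsl:template>
</xsl:stylesheet>
```

Applied to any XML document, this should output ony Braiva tm/ (the leading s of sony is gone, and only the / survives from the closing tag), which matches what the question describes.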

It is possible to write and use a general string replacement recursive template/function in XSLT.

XSLT 2.0 programmers can use the standard XPath 2.0 function replace().

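In your particular case, where each text node contains a single <sup>...</sup> pair, even this may be sufficient (note that a text node without these markers would come out empty):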

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="text()">
  <xsl:variable name="vPart1" select=
   "substring-before(., '&lt;sup>')"/>

  <xsl:value-of select="$vPart1"/>

  <xsl:variable name="vPart2" select=
   "substring-before(substring-after(., '&lt;sup>'),
                     '&lt;/sup>'
                     )"/>

  <xsl:value-of select="$vPart2"/>

  <xsl:variable name="vPart3" select=
   "substring-after(., '&lt;/sup>')"/>

  <xsl:value-of select="$vPart3"/>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied on the following XML document:

<name>
 <![CDATA[sony Braiva <sup>tm</sup> xxx]]>
</name>

the wanted result is produced:

<name>
sony Braiva tm xxx
</name>

Alternatively, here is the full-blown recursive template solution:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="text()">
  <xsl:variable name="vFirstReplacement">
      <xsl:call-template name="replace">
       <xsl:with-param name="pText" select="."/>
       <xsl:with-param name="pPattern"
         select="'&lt;sup>'"/>
       <xsl:with-param name="pReplacement" select="''"/>
      </xsl:call-template>
  </xsl:variable>

  <xsl:call-template name="replace">
   <xsl:with-param name="pText"
        select="$vFirstReplacement"/>
   <xsl:with-param name="pPattern"
     select="'&lt;/sup>'"/>
   <xsl:with-param name="pReplacement" select="''"/>
  </xsl:call-template>
 </xsl:template>

 <xsl:template name="replace">
  <xsl:param name="pText"/>
  <xsl:param name="pPattern"/>
  <xsl:param name="pReplacement"/>

  <xsl:choose>
   <xsl:when test="not(contains($pText, $pPattern))">
    <xsl:value-of select="$pText"/>
   </xsl:when>
   <xsl:otherwise>
     <xsl:value-of select=
      "substring-before($pText, $pPattern)"/>

     <xsl:value-of select="$pReplacement"/>

     <xsl:call-template name="replace">
      <xsl:with-param name="pText" select=
       "substring-after($pText, $pPattern)"/>
      <xsl:with-param name="pPattern"
           select="$pPattern"/>
      <xsl:with-param name="pReplacement"
           select="$pReplacement"/>
     </xsl:call-template>
   </xsl:otherwise>
  </xsl:choose>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied on this XML document:

<name>
 <![CDATA[sony Braiva <sup>tm</sup> xxx]]>
</name>

the wanted, correct result is produced:

<name>
 sony Braiva tm xxx
</name>

Finally, here is the XSLT 2.0 solution:

<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="text()">
  <xsl:value-of select=
   "replace(
            replace(., '&lt;sup>', ''),
            '&lt;/sup>',
            ''
            )
   "/>
 </xsl:template>
</xsl:stylesheet>
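
Because the second argument of replace() is a regular expression, the two nested calls can probably be collapsed into one. The following is a sketch under that assumption, not a tested drop-in; the pattern '&lt;/?sup>' is my own choice and relies on /? making the slash optional.

```xml
<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes"/>

 <!-- Identity rule: copy everything unchanged by default -->
 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <!-- One regex covers both the opening and the closing marker -->
 <xsl:template match="text()">
  <xsl:value-of select="replace(., '&lt;/?sup>', '')"/>
 </xsl:template>
</xsl:stylesheet>
```

Keep in mind that regex metacharacters (such as . or *) in a pattern would need escaping; the markers here contain none.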
Dimitre Novatchev
Thanks. This helps! Wow...seriously out of words. Thanks a lot.
Bilzac
+1  A: 

tl;dr version: Don't manipulate HTML or XML as strings if you can possibly avoid it. Do it in XSLT.

I'm assuming that what you have is some element that contains something like

<name>Sony Braiva <sup>tm</sup></name>

So it looks like you've got a parsed XML document already in XSLT. Then, you're turning around and trying to use string manipulation to pull some tags out. That's a bad idea; see this question about matching tags. XSLT is exactly for this sort of manipulation, so use it! (If my assumption is wrong and that tm is entity-ized or in a CDATA section or whatever, that's different I guess.)

So, first. If you want to strip all tags out of name leaving just the text, you can do

<xsl:value-of select="name" />

which would give:

Sony Braiva tm

If, on the other hand, you want to strip all sup tags and their content, you would first elsewhere define a template matching sup (and do the same with anything you want to rip out, e.g. script tags, img tags, whatever):

<xsl:template match="sup" /> <!-- replace sup with nothing -->

And then you can apply

<xsl:apply-templates select="name" />
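
Put together, a minimal stylesheet along these lines might look as follows. It assumes that name is the document element and that plain text output is wanted; neither is stated in the question.

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="text"/>

 <!-- An empty template: sup elements and their content produce nothing -->
 <xsl:template match="sup"/>

 <!-- Assumption: name is the document element -->
 <xsl:template match="/">
  <!-- The built-in rules walk into name and copy its remaining text -->
  <xsl:apply-templates select="name"/>
 </xsl:template>
</xsl:stylesheet>
```

For the sample element above, this should leave just Sony Braiva (followed by the space that preceded the sup element).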

If you really wanted, you could even do something like this and replace that HTML with a nice Unicode symbol. It might be a good idea to place this in a different mode and use that mode to eliminate all other tags.

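<xsl:template match="sup" mode="mangle-name">
  <!-- Emit the trademark sign only when the element's text is exactly "tm" -->
  <xsl:if test="'tm' = string(.)">
    <!-- xsl:text keeps the surrounding indentation out of the output -->
    <xsl:text>&#8482;</xsl:text>
  </xsl:if>
</xsl:template>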

<!-- Later, somewhere else: -->
<xsl:apply-templates select="name" mode="mangle-name" />

Disclaimer on all of this: It's standard XSLT (probably 1.0 even), but I've only tried it in an online Saxon processor and not in Xalan.

Jesse Millikan
thanks for the reply
Bilzac