
I need to iterate over the characters in a string to build an XML structure.

Currently, I am doing this:

<xsl:template name="verticalize">
    <xsl:param name="text">Some text</xsl:param>
    <xsl:for-each select="tokenize(replace(replace($text,'(.)','$1\\n'),'\\n$',''),'\\n')">
        <xsl:element name="para">
            <xsl:value-of select="."/>
        </xsl:element>
    </xsl:for-each>
</xsl:template>

This produces something like:

<para>S</para>
<para>o</para>
<para>m</para>
<para>e</para>
<para> </para>
<para>t</para>
<para>e</para>
<para>x</para>
<para>t</para>

This works fine with XPath 2.0, but I need to apply the same treatment in an XPath 1.0 environment, where neither replace() nor tokenize() is available.

Do you know a way to achieve this?

+4  A: 

If the string is not very long, you can use a recursively called template, passing the index of the character to be processed as a parameter.

Like so:

<xsl:template name="verticalize">
    <xsl:param name="text">Some text</xsl:param>
    <xsl:param name="index" select="1" />
    <xsl:if test="string-length($text) &gt;= $index">
        <xsl:element name="para">
            <xsl:value-of select="substring($text, $index, 1)"/>
        </xsl:element>
        <xsl:call-template name="verticalize">
            <xsl:with-param name="text" select="$text" />
            <xsl:with-param name="index" select="$index+1" />
        </xsl:call-template>
    </xsl:if>
</xsl:template>

If the string is longer, you can use a similar approach with a divide-and-conquer algorithm, so that the maximum recursion depth is only about log2(string-length), like so:

<xsl:template name="verticalize">
    <xsl:param name="text">Some text</xsl:param>
    <xsl:param name="left" select="1" />
    <xsl:param name="right" select="string-length($text)" />
    <xsl:choose>
        <xsl:when test="$left = $right">
            <xsl:element name="para">
                <xsl:value-of select="substring($text, $left, 1)"/>
            </xsl:element>
        </xsl:when>
        <xsl:when test="$left &lt; $right">
            <xsl:variable name="middle" select="floor(($left+$right) div 2)" />
            <xsl:call-template name="verticalize">
                <xsl:with-param name="text" select="$text" />
                <xsl:with-param name="left" select="$left" />
                <xsl:with-param name="right" select="$middle" />
            </xsl:call-template>
            <xsl:call-template name="verticalize">
                <xsl:with-param name="text" select="$text" />
                <xsl:with-param name="left" select="$middle+1" />
                <xsl:with-param name="right" select="$right" />
            </xsl:call-template>
        </xsl:when>
    </xsl:choose>
</xsl:template>
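A quick check of the log2 depth claim for the divide-and-conquer template above. Here D(n) is notation introduced only for this check: the maximum nesting depth of `verticalize` calls on a range of n characters. Each split gives the `left`..`middle` half ⌈n/2⌉ characters (the larger half when n is odd), so:

```latex
D(1) = 1, \qquad D(n) = 1 + D\left(\lceil n/2 \rceil\right) \quad (n \ge 2)
\;\Longrightarrow\; D(n) = \lceil \log_2 n \rceil + 1
```

For the 9-character default "Some text" that is 5 nested calls, versus 10 for the index-based template (one call per character plus the final call where the test fails), assuming the processor does no tail-call optimization. The total number of calls is still 2n - 1, so the gain is in stack depth, not in the amount of work.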
Lucero
Thanks a lot for this detailed answer.
subtenante
You're welcome, but why did you not award me the answer points? I was 3 minutes ahead of Tomalak and had already posted the first sample...
Lucero
@Lucero: You posted a boilerplate answer without code. I had my code ready before you did. ;) But you get +1 from me for the divide/conquer variant. One hint though: Recursing over the rest of the string only (`substring-after()`) minimizes memory usage. Your recursion passes a copy of the entire string in every step, and this adds up until the stack unwinds.
Tomalak
@Tomalak, I think you're wrong about the memory usage. Typically, my code with the `select` of an unchanged text node will always pass on a reference to the same text node instead of re-creating a string copy on each call, so it should be much cheaper in that it uses *less* memory and avoids copying the string over and over again. (And regarding the point in time, it is in the 1st revision, maybe I added it in the 5-minute window, can't remember.)
Lucero
@Lucero: Granted, if you work with actual text nodes then I'd expect it would pass a reference and a bunch of integers around. Different processors would implement this differently internally, so I guess testing/benchmarking would be required.
Tomalak
+1 for the DVC! :)
Dimitre Novatchev
@Lucero : it was a close tie but Tomalak's answer seemed more elegant. I knew you'd complain though, they all do... :)
subtenante
+4  A: 
<xsl:template name="letters">
  <xsl:param name="text" select="'Some text'" />
  <xsl:if test="$text != ''">
    <xsl:variable name="letter" select="substring($text, 1, 1)" />
    <para><xsl:value-of select="$letter" /></para>
    <xsl:call-template name="letters">
      <xsl:with-param name="text" select="substring-after($text, $letter)" />
    </xsl:call-template>
  </xsl:if>
</xsl:template>
Tomalak
Tomalak, you're my XSLT-hero.
subtenante
@subtenante: Thank you ;-)
Tomalak
+1 for the tail-recursive solution!
Dimitre Novatchev
+2  A: 

An XSLT 2.0 Solution:

<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:variable name="vText" select="'Some Text'"/>

 <xsl:template match="/">
    <xsl:for-each select="string-to-codepoints($vText)">
      <para><xsl:sequence select="codepoints-to-string(.)"/></para>
    </xsl:for-each>
 </xsl:template>
</xsl:stylesheet>

For those of you learning XSLT 2.0/XPath 2.0, note:

  1. The use of the standard XPath 2.0 functions string-to-codepoints() and codepoints-to-string().

  2. In XSLT 2.0 the value of the select attribute of <xsl:for-each> may be a sequence of any items, not only nodes.

Dimitre Novatchev
Nice solution. However, it will not work as expected when combining characters are used to represent accents etc.: http://en.wikipedia.org/wiki/Combining_character
Lucero
@Lucero: That's interesting. Could you give me a "combined character" example, so that I would better understand? Also, it seems to me that none of the other solutions would work in this case either.
Dimitre Novatchev
@Dimitre, for samples have a look at Wikipedia: http://en.wikipedia.org/wiki/Unicode_normalization I'm not sure whether the other solutions would work or not; I'd expect the string functions to work on composed characters, since everything else would break the single (textual) characters. I'd have to check the XPath/XSLT specs on how these work.
Lucero
@Dimitre, have a look at the `normalize-unicode` XSLT 2 function: http://www.w3.org/TR/xpath-functions/#func-normalize-unicode - I guess that might help with getting a correct result.
Lucero
@Lucero: I need just one example of a combined character. Please...
Dimitre Novatchev
@D̀i̫m̴i̐t̽r̾e̚, I'm not sure whether this works across the browser and SO storage, so it may be best to generate one yourself: on a Windows machine, open the character map (charmap.exe), choose "Arial" as font and "Unicode" as character set, choose to group by "Unicode Subrange", take "Combining Diacritical Marks", type a character (abc... whatever you like!), and then append one, two or even more combining marks just as you like (more than 2 may not be displayed). You'll see that those get applied to the character you have typed, and hitting backspace will remove them one by one as well.
Lucero
(cont.) Another amazing feature of Unicode is the Complex Text Layout, which allows combining complex characters similar to the diacritic system: द + ् + ध + ् + र + ् + य = द्ध्र्य - http://en.wikipedia.org/wiki/Complex_text_layout - these should ideally be handled as one character as well I think...
Lucero
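Following up on the combining-character discussion: a minimal XSLT 2.0 sketch (not taken from any of the answers here) that first applies `normalize-unicode()` with NFC, as Lucero suggests, and then uses `xsl:analyze-string` to emit one `para` per base character together with any combining marks (Unicode category M) that follow it. The sample string and the regex are assumptions chosen for illustration. This only approximates user-perceived characters: it is not full Unicode grapheme-cluster segmentation, and a Devanagari conjunct such as द्ध्र्य would still be split into several `para` elements.

```xml
<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <!-- Assumed sample: "Cafe" followed by U+0301 COMBINING ACUTE ACCENT -->
 <xsl:variable name="vText" select="'Cafe&#x301;'"/>

 <xsl:template match="/">
   <!-- NFC composes base + mark into one code point where a precomposed form exists -->
   <!-- regex is an attribute value template, so the braces of \P{M} and \p{M} are doubled -->
   <xsl:analyze-string select="normalize-unicode($vText, 'NFC')"
                       regex="\P{{M}}\p{{M}}*">
     <xsl:matching-substring>
       <!-- one base character plus any combining marks after it -->
       <para><xsl:value-of select="."/></para>
     </xsl:matching-substring>
     <xsl:non-matching-substring>
       <!-- only reachable for marks at the very start, with no base character before them -->
       <para><xsl:value-of select="."/></para>
     </xsl:non-matching-substring>
   </xsl:analyze-string>
 </xsl:template>
</xsl:stylesheet>
```

Applied to any XML input with an XSLT 2.0 processor, this should produce four `para` elements containing C, a, f and é, where the last one holds the composed character rather than a bare "e" followed by a separate accent.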
+1  A: 

An XSLT 1.0 Solution using FXSL

The FXSL library offers a number of generic functions for list processing. Almost all of them have an analog for operating on strings (regarding a string as a list of characters).

Here is an example using the str-foldl function/template:

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:dvc-foldl-func="dvc-foldl-func"
exclude-result-prefixes="xsl dvc-foldl-func"
>

   <xsl:import href="dvc-str-foldl.xsl"/>

   <dvc-foldl-func:dvc-foldl-func/>
   <xsl:variable name="vFoldlFun" select="document('')/*/dvc-foldl-func:*[1]"/>
    <xsl:output  encoding="UTF-8" omit-xml-declaration="yes"/>

    <xsl:template match="/">

      <xsl:call-template name="dvc-str-foldl">
        <xsl:with-param name="pFunc" select="$vFoldlFun"/>
        <xsl:with-param name="pStr" select="123456789"/>
        <xsl:with-param name="pA0" select="0"/>
      </xsl:call-template>
    </xsl:template>

    <xsl:template match="dvc-foldl-func:*">
         <xsl:param name="arg1" select="0"/>
         <xsl:param name="arg2" select="0"/>

         <xsl:value-of select="$arg1 + $arg2"/>
    </xsl:template>

</xsl:stylesheet>

This transformation calculates the sum of the digits in the string passed as the $pStr parameter and produces the correct result:

45
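To make it clearer what the fold computes here, a minimal plain XSLT 1.0 sketch without FXSL. The template name `sum-digits` and its parameters are hypothetical, not part of FXSL. It is a tail-recursive left fold that adds each character's numeric value to an accumulator, one character per call. FXSL's `dvc-str-foldl` presumably splits the string divide-and-conquer style instead, as its name suggests, which keeps the recursion depth logarithmic.

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes"/>

  <xsl:template match="/">
    <xsl:call-template name="sum-digits">
      <xsl:with-param name="pStr" select="'123456789'"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Hypothetical left fold: assumes every character of $pStr is a digit -->
  <xsl:template name="sum-digits">
    <xsl:param name="pStr"/>
    <xsl:param name="pAcc" select="0"/>
    <xsl:choose>
      <xsl:when test="$pStr = ''">
        <!-- string exhausted: the accumulator holds the result -->
        <xsl:value-of select="$pAcc"/>
      </xsl:when>
      <xsl:otherwise>
        <!-- add the first character (converted to a number), recurse on the rest -->
        <xsl:call-template name="sum-digits">
          <xsl:with-param name="pStr" select="substring($pStr, 2)"/>
          <xsl:with-param name="pAcc" select="$pAcc + substring($pStr, 1, 1)"/>
        </xsl:call-template>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

Applied to any XML input, it outputs 45, the same result as the FXSL version.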

Using the str-map template/function, we get the following short and easy solution:

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:testmap="testmap"
exclude-result-prefixes="xsl testmap"
>
   <xsl:import href="str-dvc-map.xsl"/>

   <!-- to be applied on any xml source -->

   <testmap:testmap/>

   <xsl:output omit-xml-declaration="yes" indent="yes"/>

   <xsl:template match="/">
     <xsl:variable name="vTestMap" select="document('')/*/testmap:*[1]"/>
     <xsl:call-template name="str-map">
       <xsl:with-param name="pFun" select="$vTestMap"/>
       <xsl:with-param name="pStr" select="'Some Text'"/>
     </xsl:call-template>
   </xsl:template>

    <xsl:template name="split" match="*[namespace-uri() = 'testmap']">
      <xsl:param name="arg1"/>

      <para><xsl:value-of select="$arg1"/></para>
    </xsl:template>

</xsl:stylesheet>

When applied to any XML document (the source itself is not used), the wanted, correct result is produced:

<para>S</para>
<para>o</para>
<para>m</para>
<para>e</para>
<para> </para>
<para>T</para>
<para>e</para>
<para>x</para>
<para>t</para>
Dimitre Novatchev