EDIT: This started as a question about character replacement, and I ended up learning about string replacement with the help of Dimitre Novatchev and Roland Bouman.

I think the sample code is sufficient to explain the requirements.

This is the sample XML:

<root>
  <node1>text node</node1>
  <node2>space between the text</node2>
  <node3> has to be replaced with $</node3>
</root>

This is the Output I am expecting:

<root>
  <node1>text$node</node1>
  <node2>space$between$the$text</node2>
  <node3>$has$to$be$replaced$with$$</node3>
</root>

I have tried writing some XSLT code, but it doesn't produce the required output.
This is the code:

    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>
  <xsl:template match="text()[.!='']">
    <xsl:call-template name="rep_space">
      <xsl:with-param name="text" select="."/>
    </xsl:call-template>
  </xsl:template>
  <xsl:template name="rep_space">
    <xsl:param name="text"/>
    <xsl:variable name="temp" select="'&#x36;'"/> 
    <xsl:choose>
      <xsl:when test="contains(text,'&#x32;')">
        <xsl:call-template name="rep_space">
          <xsl:with-param name="text" select="concat((concat(substring-before(text,' '),temp)),substring-after(text,' '))"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="text"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
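
For comparison, here is a minimal sketch of the same named template with two likely problems addressed. It rests on two assumptions about what was intended: that the character references were meant as decimal codes 36 ('$') and 32 (space), since the hexadecimal forms &#x36; and &#x32; denote the digits '6' and '2'; and that text and temp were meant as variable references, which XPath writes as $text and $temp (a bare text is read as a child element named "text").

```xml
<xsl:template name="rep_space">
  <xsl:param name="text"/>
  <!-- Decimal reference: &#36; is '$' (assumed intent of the original) -->
  <xsl:variable name="temp" select="'&#36;'"/>
  <xsl:choose>
    <!-- &#32; is the space character; $text refers to the parameter -->
    <xsl:when test="contains($text, '&#32;')">
      <xsl:call-template name="rep_space">
        <!-- Replace the first space, then rescan the whole string -->
        <xsl:with-param name="text"
          select="concat(substring-before($text, ' '), $temp, substring-after($text, ' '))"/>
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="$text"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
```

Because each call rescans the string from the start, this sketch only terminates when the replacement itself contains no space.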

The translate(., ' ', '$') function works, but not to a satisfactory extent. My questions are: what if the replacement is a string instead of a single character? Suppose I intend to replace ' ' with "%20". And one more case: what if the input XML isn't "pretty print XML"? Then all the spaces appearing in the XML, including those between the tags, are replaced with '$'.

"Pretty print XML" is a file that has proper indentation (my input XMLs usually never have this), for example:

<new><test>one more node</test><test2><child>this is @ lower level</child></test2></new>

You can observe that there are no space characters before the <new> and <test> nodes, yet they are displayed properly indented (in Altova XMLSpy, a simple command in the Edit menu turns any XML file into "pretty print XML").

Whereas in the example below:

<new>
  <test>one more node</test>
   <test2>
    <child>this is @ lower level</child>
   </test2>
</new>

There are space characters before all the start tags; the <child> tag has more spaces before it than the <test2> node.

With the second sample XML, all the space characters are replaced by "%20", so the output will be:

<new>
%20%20<test>one%20more%20node</test>
%20%20<test2>
%20%20%20%20<child>this%20is%20@%20lower%20level</child>
%20%20</test2>
</new>

Certainly, that is not what is expected.

The solutions posted by Dimitre Novatchev and Roland Bouman can also replace a string by another string, by modifying the parameters passed to the template being called.

That was great learning @Dimitre, @Roland, I am really thankful and grateful to you guys ..

regards,
infant pro.

+3  A: 

Check out the XPath translate function: http://www.w3.org/TR/xpath/#function-translate

<xsl:template match="text()">
    <xsl:value-of select="translate(., ' ', '$')"/>
</xsl:template>
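
For context, a minimal sketch of a complete stylesheet built around this template might look like the following. The identity rule and the explicit priority attribute are assumptions added here: text() and node() have the same default priority, so the priority makes sure the text rule wins without relying on the processor's conflict recovery.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity rule: copy every node that no other rule handles -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Map each space in a text node to '$', one character for one character -->
  <xsl:template match="text()" priority="1">
    <xsl:value-of select="translate(., ' ', '$')"/>
  </xsl:template>

</xsl:stylesheet>
```

Note that translate maps single characters by position, and excess characters in the third argument are ignored, so translate(., ' ', '%20') would turn each space into '%' only. Also, as written, this sketch touches whitespace-only text nodes used for indentation as well.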

If it's not a single character, but a string you have to replace, it takes considerably more effort, and you need a template to recursively replace the string:

<xsl:template match="text()[not(../*)]">
    <xsl:call-template name="replace">
        <xsl:with-param name="text" select="."/>
        <xsl:with-param name="search" select="' '"/>
        <xsl:with-param name="replace" select="'%20'"/>
    </xsl:call-template>
</xsl:template>

<xsl:template name="replace">
    <xsl:param name="text"/>
    <xsl:param name="search"/>
    <xsl:param name="replace"/>
    <xsl:choose>
        <xsl:when test="contains($text, $search)">
            <xsl:variable name="replace-next">
                <xsl:call-template name="replace">
                    <xsl:with-param name="text" select="substring-after($text, $search)"/>
                    <xsl:with-param name="search" select="$search"/>
                    <xsl:with-param name="replace" select="$replace"/>
                </xsl:call-template>
            </xsl:variable>
            <xsl:value-of 
                select="
                    concat(
                        substring-before($text, $search)
                    ,   $replace
                    ,   $replace-next
                    )
                "
            />
        </xsl:when>
        <xsl:otherwise><xsl:value-of select="$text"/></xsl:otherwise>
    </xsl:choose>
</xsl:template>

Edit: changed match="text()" to match="text()[not(../*)]", so that the input XML need not be "pretty print XML" (this avoids the unwanted replacement of spaces between tags with the "%20" string in such a file).

Roland Bouman
oops .. so easy .. :-P thanks for response ..:-)
infant programmer
np. surprisingly, a function like this is not available in many languages, so it's easy to overlook its existence.
Roland Bouman
ohk .. btw .. what if it is a string instead of character? I mean, suppose I am intended to replace ' ' with "%20"? And one more case, What if the input XML isn't "Pretty Print XML", then all the space appearing in XML are replaced with '$' ..
infant programmer
Well, basically, you're going to have a hard time in that case :) You're going to have to write a 'replace-text' template that recursively replaces text. I'll amend my answer to show how.
Roland Bouman
thank you so much .. :-)
infant programmer
The solution is essentially right with the exception of the treatment of white-space-only nodes -- see my answer for a correct solution of this part of the problem. A minor issue with this solution is that it is not tail-recursive and could lead to a stack overflow due to a very deep call stack. I leave the refactoring to tail-recursive as an exercise for you :)
Dimitre Novatchev
Dimitre, thanks for pointing out these issues. While I am aware of the lack of tail-recursiveness, what I am really interested in is: do you know how different XSLT engines optimize and rewrite to obtain it? I mean, I don't know and would like to learn.
Roland Bouman
Roland-Bouman: The XSLT processors do not "rewrite to obtain it" -- the human author of the xslt template must write it in a tail-recursive way. :) If you want, I will provide a tail-recursive solution -- in a separate answer.
Dimitre Novatchev
Sure, let's have it. TIA
Roland Bouman
We have it now :)
Dimitre Novatchev
+1  A: 

The solution to the "pretty-printed XML" problem is not really a solution.

Imagine having a document like this:

<a>
 <b>
  <c>O M G</c>
  <d>D I Y</d>
 </b>
</a>

The output from the currently accepted solution (after wrapping it in an <xsl:stylesheet> and adding the identity rule) is:

<a>
%20<b>
%20%20<c>O$M$G</c>
%20%20<d>D$I$Y</d>
%20</b>
</a>

Now, why doesn't the proposed workaround save the situation? As we see from the above example, an element can have more than one child element that has text nodes...

What is the real solution?

The creators of XSLT have thought about this problem. Using the right terminology, we want all insignificant white-space-only text nodes to be ignored by the XSLT processor, as if they were not part of the document tree at all. This is achieved by the <xsl:strip-space> instruction.

Just add this at a global level (as a child of <xsl:stylesheet> and, for readability, before any templates):

 <xsl:strip-space elements="*"/>

and now you really have a working solution.
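
Put together, a minimal sketch of such a stylesheet could look like this. It reuses the parameter names of the replace template posted earlier in this thread; the xsl:output instruction and the priority attribute are assumptions added for illustration. indent="yes" only asks the serializer to re-indent the result, and how it does so depends on the processor.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: ask the serializer to indent the output again -->
  <xsl:output method="xml" indent="yes"/>

  <!-- Remove whitespace-only text nodes from the source tree -->
  <xsl:strip-space elements="*"/>

  <!-- Identity rule: copy everything not matched more specifically -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Explicit priority so this rule wins over the identity rule -->
  <xsl:template match="text()" priority="1">
    <xsl:call-template name="replace">
      <xsl:with-param name="text" select="."/>
      <xsl:with-param name="search" select="' '"/>
      <xsl:with-param name="replace" select="'%20'"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Replace every occurrence of $search in $text with $replace -->
  <xsl:template name="replace">
    <xsl:param name="text"/>
    <xsl:param name="search"/>
    <xsl:param name="replace"/>
    <xsl:choose>
      <xsl:when test="contains($text, $search)">
        <xsl:value-of select="substring-before($text, $search)"/>
        <xsl:value-of select="$replace"/>
        <xsl:call-template name="replace">
          <xsl:with-param name="text" select="substring-after($text, $search)"/>
          <xsl:with-param name="search" select="$search"/>
          <xsl:with-param name="replace" select="$replace"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$text"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

Since the indentation nodes are stripped before any template runs, the text rule no longer needs a predicate such as [not(../*)] to avoid them.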

Dimitre Novatchev
@Dimitre, thanks for demonstrating (in the usual way) the proper method :-) But the problem with strip-space is that the output XML is written on a single line [this is visible when it is opened in Notepad or similar editors]. It is not really a problem, though, because it doesn't matter to XML editors ..
infant programmer
@infant-programmer: it doesn't matter if the xml is in one line or not -- if it has white-space-only nodes, my solution will strip all of them. If what you say is in fact that there are no white-space-only text nodes, then the xsl:strip-space instruction will do nothing -- it is harmless in this case. The solution I provided is more general -- it works in both cases: when there are white-space-only nodes and when there aren't such.
Dimitre Novatchev
@Dimitre, yup got it :-)
infant programmer
+2  A: 

As per the wish of Roland, here is a tail-recursive solution:

 <xsl:template name="replace">
  <xsl:param name="ptext"/>
  <xsl:param name="ppattern"/>
  <xsl:param name="preplacement"/>

  <xsl:choose>
     <xsl:when test="not(contains($ptext, $ppattern))">
      <xsl:value-of select="$ptext"/>
     </xsl:when>
     <xsl:otherwise>
       <xsl:value-of select="substring-before($ptext, $ppattern)"/>
       <xsl:value-of select="$preplacement"/>
       <xsl:call-template name="replace">
         <xsl:with-param name="ptext"
           select="substring-after($ptext, $ppattern)"/>
         <xsl:with-param name="ppattern" select="$ppattern"/>
         <xsl:with-param name="preplacement" select="$preplacement"/>
       </xsl:call-template>
     </xsl:otherwise>
  </xsl:choose>
 </xsl:template>

Note that the recursive call is the last instruction in the template -- this is what makes it tail-recursive. The property of being tail-recursive allows a smart XSLT processor (such as Saxon or .NET XslCompiledTransform) to optimize the code, replacing the recursion with simple iteration.

Such code will not end up with a stack-overflow exception even when the calls are "nested" millions deep, whereas non-tail-recursive code typically overflows the stack at a depth of about 1000 nested calls (the exact limit depends on the amount of available memory).
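
One edge case is worth a hedged sketch: an empty pattern. As far as I know, contains($s, '') is true for every string and substring-after($s, '') returns $s unchanged, so the template above would never stop recursing if ppattern were empty or omitted. The guard below is an addition for illustration, not part of the answer's code, and the calling template, identity rule, and priority attribute are likewise assumptions.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:strip-space elements="*"/>

  <!-- Identity rule -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Caller: replace every space in a text node with "%20" -->
  <xsl:template match="text()" priority="1">
    <xsl:call-template name="replace">
      <xsl:with-param name="ptext" select="."/>
      <xsl:with-param name="ppattern" select="' '"/>
      <xsl:with-param name="preplacement" select="'%20'"/>
    </xsl:call-template>
  </xsl:template>

  <xsl:template name="replace">
    <xsl:param name="ptext"/>
    <xsl:param name="ppattern"/>
    <xsl:param name="preplacement"/>
    <xsl:choose>
      <!-- Added guard: stop when the pattern is empty or no longer found -->
      <xsl:when test="string-length($ppattern) = 0
                      or not(contains($ptext, $ppattern))">
        <xsl:value-of select="$ptext"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="substring-before($ptext, $ppattern)"/>
        <xsl:value-of select="$preplacement"/>
        <!-- The recursive call stays the last instruction executed -->
        <xsl:call-template name="replace">
          <xsl:with-param name="ptext"
            select="substring-after($ptext, $ppattern)"/>
          <xsl:with-param name="ppattern" select="$ppattern"/>
          <xsl:with-param name="preplacement" select="$preplacement"/>
        </xsl:call-template>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

The guard sits in the non-recursive branch, so the shape that makes the template tail-recursive is unchanged.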

What if the XSLT processor is not "smart enough"? Is there another technique to avoid deep-level recursive calls stack overflow, that works with every XSLT processor?

Ask me in a separate question and I might tell you :)

Dimitre Novatchev
wow!! this is awesome .. :-) :-) accepted as sol :-)
infant programmer
@infant-programmer, Thanks, there are many beautiful applications of XSLT and I would be happy to reveal at least some of them in my answers.
Dimitre Novatchev
that's great of you @Dimitre, :-)
infant programmer
Dimitre, thanks for providing this. One thing is not clear to me though. You say that the recursive call is the last instruction in the template. However, if I parse the template, it seems to me there is one top-level instruction, which is the `<xsl:choose>`. So by definition, not the `<xsl:call-template>` but the `<xsl:choose>` is the last instruction. Or is this still considered tail recursive because the `<xsl:choose>` doesn't actually do something, except execute one of its branches?
Roland Bouman
@Roland-Bouman, the `<xsl:call-template>` is the last *executable* instruction -- after it there are only ending tags. This is used by some smart enough XSLT processors in deciding that the template is tail-recursive and then to perform the optimization to using iteration code.
Dimitre Novatchev
Dimitre, thanks. So I guess this answers the question I posed to you earlier about if and how XSLT processors "rewrite" to obtain tail recursion.
Roland Bouman