I'm trying to develop an XSLT stylesheet which transforms a given DocBook document to a file which can be fed to the lout document formatting system (which then generates PostScript output).

Doing so requires that I replace a few characters in the text of DocBook elements because they have a special meaning to lout. In particular, the characters

/ | & { } # @ ~ \ "

need to be enclosed in double quotes (") so that lout treats them as ordinary characters.

For instance, a DocBook element like

<para>This is a sample {a contrived one at that} ~ it serves no special purpose.</para>

should be transformed to

@PP
This is a sample "{"a contrived one at that"}" "~" it serves no special purpose.

How can I do this with XSLT? I'm using xsltproc, so using XPath 2.0 functions is not an option but a number of EXSLT functions are available.

I tried using a recursive template which yields the substring up to a special character (e.g. {), then the escaped character sequence ("{"), and then calls itself on the substring after the special character. However, I have a hard time making this work properly when several different characters have to be replaced, especially since one of them (the double quote) is itself part of the escape sequence.

+2  A: 

In particular, the characters

/ | & { } # @ ~ \ " 

need to be enclosed in double quotes (") so that lout treats them as ordinary characters.

I. This is most easily accomplished using the str-map template of FXSL:

<xsl:stylesheet version="1.0" 
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:f="http://fxsl.sf.net/"
 xmlns:strmap="strmap"
 exclude-result-prefixes="xsl f strmap">
   <xsl:import href="str-dvc-map.xsl"/>

   <xsl:output method="text"/>

   <strmap:strmap/>

   <xsl:template match="/">
     <xsl:variable name="vMapFun" select="document('')/*/strmap:*[1]"/>
     @PP
     <xsl:call-template name="str-map">
       <xsl:with-param name="pFun" select="$vMapFun"/>
       <xsl:with-param name="pStr" select="."/>
     </xsl:call-template>
   </xsl:template>

    <xsl:template name="escape" match="strmap:*" mode="f:FXSL">
      <xsl:param name="arg1"/>

      <xsl:variable name="vspecChars">/|&amp;{}#@~\"</xsl:variable>

      <xsl:variable name="vEscaping" select=
       "substring('&quot;', 1 div contains($vspecChars, $arg1))
       "/>

      <xsl:value-of select=
      "concat($vEscaping, $arg1, $vEscaping)"/>
    </xsl:template>

</xsl:stylesheet>

When this transformation is applied to the provided XML document:

<para>This is a sample {a contrived one at that} ~ it serves no special purpose.</para>

the wanted, correct result is produced:

@PP This is a sample "{"a contrived one at that"}" "~" it serves no special purpose.
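The `substring('&quot;', 1 div contains(...))` expression in the `escape` template is a compact XPath 1.0 idiom: when `contains()` is true, `1 div 1` is 1 and the whole one-character string (the double quote) is returned; when it is false, `1 div 0` is Infinity and `substring()` returns the empty string. As a sketch only, the same decision can be spelled out with `xsl:choose`. The template name `quote-if-special` and the parameter names are made up for this illustration and are not part of FXSL:

```xslt
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Demo driver: run against any XML document. -->
  <xsl:template match="/">
    <xsl:call-template name="quote-if-special">
      <xsl:with-param name="pChar" select="'{'"/>
    </xsl:call-template>
    <xsl:call-template name="quote-if-special">
      <xsl:with-param name="pChar" select="'a'"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Hypothetical helper: expects exactly one character in pChar.
       Note that contains($s, '') is true, so an empty pChar would
       produce a pair of quotes, just as the idiom above would. -->
  <xsl:template name="quote-if-special">
    <xsl:param name="pChar"/>
    <xsl:param name="pSpecChars">/|&amp;{}#@~\"</xsl:param>
    <xsl:choose>
      <xsl:when test="contains($pSpecChars, $pChar)">
        <!-- Special character: wrap it in double quotes. -->
        <xsl:value-of select="concat('&quot;', $pChar, '&quot;')"/>
      </xsl:when>
      <xsl:otherwise>
        <!-- Ordinary character: copy it unchanged. -->
        <xsl:value-of select="$pChar"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

Run with xsltproc on any input document, this demo prints `"{"a`: the brace is quoted and the letter is passed through.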

II. With an XSLT 1.0 recursive named template:

<xsl:stylesheet version="1.0" 
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="text"/>

   <xsl:template match="/">
     @PP
     <xsl:call-template name="escape">
       <xsl:with-param name="pStr" select="."/>
     </xsl:call-template>
   </xsl:template>

    <xsl:template name="escape">
     <xsl:param name="pStr" select="."/>
     <xsl:param name="pspecChars">/|&amp;{}#@~\"</xsl:param>

     <xsl:if test="string-length($pStr)">
         <xsl:variable name="vchar1" select="substring($pStr,1,1)"/>

          <xsl:variable name="vEscaping" select=
           "substring('&quot;', 1 div contains($pspecChars, $vchar1))
           "/>

          <xsl:value-of select=
          "concat($vEscaping, $vchar1, $vEscaping)"/>

          <xsl:call-template name="escape">
           <xsl:with-param name="pStr" select="substring($pStr,2)"/>
           <xsl:with-param name="pspecChars" select="$pspecChars"/>
          </xsl:call-template>
      </xsl:if>
    </xsl:template>
</xsl:stylesheet>
Dimitre Novatchev
+1: Thanks for providing *two* solutions! I already anticipated that you would easily whip up an FXSL solution (given that it seems to be your own project ;-)) but I'm very grateful for also pointing out how it could be done with plain old XSLT 1.0. I'll try your solutions when I'm out of the office.
Frerich Raabe
The second variant recurses for every single character, right? That might be an issue for me (since xsltproc, by default, only allows 1000 nested function calls - and my texts might be more than 1000 characters long). I guess the first solution doesn't suffer from this (but I'm not sure, because I find it a bit hard to read ;-)).
Frerich Raabe
@Frerich-Raabe: There is a nice way to minimize the recursion depth. Please, ask a separate question and I will demo and explain this method.
Dimitre Novatchev
@Dimitre: +1 for FXSL and plain recursion. You've become greedy about the DVC pattern, ja!
Alejandro
@Alejandro: Few people will notice the DVC explanation if it is buried inside a question with totally unrelated title. This topic deserves its own question.
Dimitre Novatchev
@Alejandro: Ah! Thanks for mentioning DVC, I didn't think of that!
Frerich Raabe
@Frerich-Raabe: Actually, DVC *was* mentioned already: look at the code in my solution -- do you notice `<xsl:import href="str-dvc-map.xsl"/>` ? :). Yes, for the FXSL solution I already chose the DVC implementation of `str-map`. Isn't it so convenient when you have DVC pre-coded for you and you don't need to code it manually? :)
Dimitre Novatchev
@Dimitre: Ah, no - I didn't notice. This is the first time I've heard about FXSL, and I didn't recognize the `dvc` part of the file name as the "divide and conquer" I know from other recursive algorithms. :-)
Frerich Raabe
@Frerich-Raabe: If this is the 1st time you heard of FXSL and you've had prior FP exposure, then you'd probably like it -- just read the FXSL 2 (XSLT 2.0 - based) conference article.
Dimitre Novatchev
@Dimitre: Your paper link is no longer functional.
Alejandro
@Alejandro, @Frerich-Raabe: Sorry, the PDF link is this: http://web.archive.org/web/20070222111927/http:/www.idealliance.org/papers/extreme/proceedings/xslfo-pdf/2006/Novatchev01/EML2006Novatchev01.pdf The HTML link is this: http://conferences.idealliance.org/extreme/html/2006/Novatchev01/EML2006Novatchev01.html
Dimitre Novatchev
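
As a footnote to the recursion-depth discussion in the comments: a minimal sketch of how the plain XSLT 1.0 variant (solution II) could be restructured in divide-and-conquer style, so that the nesting depth grows roughly with the logarithm of the text length instead of linearly. This is an illustration written for this thread, not the FXSL `str-dvc-map.xsl` implementation; the template name `escape-dvc` and the variable names are assumptions.

```xslt
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:text>@PP&#10;</xsl:text>
    <xsl:call-template name="escape-dvc">
      <xsl:with-param name="pStr" select="string(.)"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Hypothetical DVC variant of the "escape" template. -->
  <xsl:template name="escape-dvc">
    <xsl:param name="pStr"/>
    <xsl:param name="pSpecChars">/|&amp;{}#@~\"</xsl:param>

    <xsl:variable name="vLen" select="string-length($pStr)"/>

    <xsl:choose>
      <!-- Empty string: nothing to output. -->
      <xsl:when test="$vLen = 0"/>

      <!-- Single character: quote it if it is special
           (same substring/div idiom as in solution II). -->
      <xsl:when test="$vLen = 1">
        <xsl:variable name="vEscaping" select=
         "substring('&quot;', 1 div contains($pSpecChars, $pStr))"/>
        <xsl:value-of select="concat($vEscaping, $pStr, $vEscaping)"/>
      </xsl:when>

      <!-- Two or more characters: split in half and process
           each half separately; both halves are non-empty and
           strictly shorter, so the recursion terminates. -->
      <xsl:otherwise>
        <xsl:variable name="vHalf" select="floor($vLen div 2)"/>
        <xsl:call-template name="escape-dvc">
          <xsl:with-param name="pStr" select="substring($pStr, 1, $vHalf)"/>
          <xsl:with-param name="pSpecChars" select="$pSpecChars"/>
        </xsl:call-template>
        <xsl:call-template name="escape-dvc">
          <xsl:with-param name="pStr" select="substring($pStr, $vHalf + 1)"/>
          <xsl:with-param name="pSpecChars" select="$pSpecChars"/>
        </xsl:call-template>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

Because each call halves the string, a text of a few thousand characters needs only about a dozen nested calls, which should stay well below the depth limit mentioned above. The total number of template calls is still proportional to the text length.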