I have a series of medium-sized XML documents, which are mainly text with a few nodes representing macros to be expanded, e.g.:

<foo>Some text <macro>A1</macro> ... <macro>B2</macro> ...etc...</foo>

My goal is to replace each macro with the corresponding XML. Usually it's a single <img> tag with different attributes, but it could be some other HTML as well.

The stylesheet is generated programmatically, and one way to do that would be to have a template per macro, e.g.

<xsl:template match="macro[.='A1']">
    <!-- content goes here -->
</xsl:template>
<xsl:template match="macro[.='A2']">
    <!-- other content goes here -->
</xsl:template>
<xsl:template match="macro[.='B2']">
    <!-- etc... -->
</xsl:template>

It works just fine, but it can have up to a hundred macros and it's not very performant (I'm using libxslt). I've tried a couple of alternatives, such as:

<xsl:template match="macro">
    <xsl:choose>
        <xsl:when test=".='A1'">
            <!-- content goes here -->
        </xsl:when>
        <xsl:when test=".='A2'">
            <!-- other content goes here -->
        </xsl:when>
        <xsl:when test=".='B2'">
            <!-- etc... -->
        </xsl:when>
    </xsl:choose>
</xsl:template>

It's slightly more performant. I have tried adding another level of branching such as:

<xsl:template match="macro">
    <xsl:choose>
        <xsl:when test="substring(.,1,1) = 'A'">
            <xsl:choose>
                <xsl:when test=".='A1'">
                    <!-- content goes here -->
                </xsl:when>
                <xsl:when test=".='A2'">
                    <!-- other content goes here -->
                </xsl:when>
            </xsl:choose>
        </xsl:when>
        <xsl:when test=".='B2'">
            <!-- etc... -->
        </xsl:when>
    </xsl:choose>
</xsl:template>

It loads slightly slower (the XSL is bigger and a bit more complicated), but it executes slightly faster (each branch can eliminate several cases).

Now I am wondering, is there a better way to do that? I have around 50-100 macros. Normally, the transformation is executed using libxslt but I can't use proprietary extensions from other XSLT engines.

Any input is welcome :)

+1  A: 

If your macros are fixed XML, you could have a macros.xml document that is something like:

<?xml version="1.0"?>
<macros>
    <macro name="A1"><!-- content goes here --></macro>
    <macro name="A2"><!-- content goes here --></macro>
</macros>

You can then have:

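<xsl:template match="macro">
    <xsl:variable name="name" select="string(.)"/>
    <!-- The URI must be a quoted string; select the definition's children so the <macro> wrapper is not emitted -->
    <xsl:apply-templates select="document('macros.xml')/macros/macro[@name=$name]/node()" mode="copy"/>
</xsl:template>

<xsl:template match="*" mode="copy">
    <xsl:copy>
        <!-- Copy attributes once, then recurse; text nodes are copied by the built-in rule -->
        <xsl:copy-of select="@*"/>
        <xsl:apply-templates mode="copy"/>
    </xsl:copy>
</xsl:template>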

Does this improve the performance?

reece
I have tried `<xsl:template match="macro"><xsl:copy-of select="document('macros.xml')/macros/macro[@name=current()]/node()" /></xsl:template>` but the performance was worse, which is expected considering it has to access an external resource and run some XPath on it. Perhaps I could try it again with an xsl:key somewhere... thanks though :)
Josh Davis
Have you tried loading the document into a variable and using that -- you'll need to use an extension to support variables as node-sets, though. This will then only load the document once.
reece
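
The two ideas from these comments (an `xsl:key`, and loading the document only once) could be combined roughly as follows. This is a minimal sketch, assuming macros.xml has the layout shown in the answer above; the key name `macro-by-name` and the variable name `macros` are made up for illustration. Because the variable is bound with `select=`, it holds a node-set rather than a result tree fragment, so a node-set extension should not be needed here. Whether it is actually faster on libxslt has not been measured.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- Bound with select=, so this is a node-set, not a result tree fragment -->
    <xsl:variable name="macros" select="document('macros.xml')"/>

    <!-- Index the definitions by name (hypothetical key name) -->
    <xsl:key name="macro-by-name" match="macros/macro" use="@name"/>

    <!-- Identity template: copy everything else unchanged -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="macro">
        <xsl:variable name="name" select="string(.)"/>
        <!-- key() searches the document of the context node, so the
             for-each is only there to switch the context to macros.xml -->
        <xsl:for-each select="$macros">
            <xsl:copy-of select="key('macro-by-name', $name)/node()"/>
        </xsl:for-each>
    </xsl:template>

</xsl:stylesheet>
```

The `for-each` over `$macros` iterates exactly once; in XSLT 1.0 it is the usual way to make `key()` look into a document other than the source.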
+1  A: 

Try extracting all the macro processing into a separate template mode, and only run it for the contents of macro element. I.e.:

<xsl:template match="macro">
   <xsl:apply-templates mode="macro"/>
</xsl:template>

<xsl:template match="text()[. = 'A1']" mode="macro">
   ...
</xsl:template>

I suspect the slowdown in your case is because your rules get checked, one by one, for every node in the input. This way, you get one check to see if it's a macro, and only if it is does its content get matched further.

Pavel Minaev
That approach has about the same performance as my first one, with a small penalty due to the extra `<xsl:apply-templates/>`. Either way it's not actually slow, or at least not slower than expected of an XSL transformation. libxslt does a decent job, I'm just trying to see if I can squeeze more performance out of it.
Josh Davis
+1  A: 

This would be another way:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" exclude-result-prefixes="xsl">
    <xsl:variable name="dummy">
            <mac id="A1">success</mac>
            <mac id="A2">fail</mac>
            <mac id="B1">This <b>fail</b></mac>
            <mac id="B2">This <b>success</b></mac>
    </xsl:variable>
    <xsl:key name="macro" match="mac" use="@id"/>
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="macro">
        <xsl:variable name="me" select="."/>
        <xsl:for-each select="document('')">
            <xsl:copy-of select="key('macro',$me)/node()"/>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>

Note: This was the performance: XML parse 1,805ms, XSL parse 0,483ms, Transform 0,215ms

Alejandro
Weirdly enough, I had already tried something similar but it didn't work because `document('')` returned the root of the source document rather than the root of the stylesheet. I've just checked the XSLT spec and indeed, it should return the root of the stylesheet, so either I've found a bug or I messed up somewhere. I'll try again and keep you updated, thanks.
Josh Davis
Indeed, that was a known libxslt bug I was hitting (https://bugzilla.gnome.org/show_bug.cgi?id=549552). I got it to work though, and the performance seems about on par with the second approach I listed. It seems to scale better if the same macro is used repeatedly.
Josh Davis