As a programmer, I have been developing programs in procedural and OOP languages for many years now, and I guess I know beautiful and efficient code when I see it (or when I write it).

Recently I started to work with XSLs, and while they deliver the results I expect, I have no idea whether they are nice, beautiful and efficient. (Well, they mainly consist of templates and apply-templates, but matching may also hide some performance issues.)

So the question is: How do I judge the quality of my XSLs?

+3  A: 

It would be better not to carry over habits from the OO world, as XSLT comes from a different world: the functional one. No mutable variables, no ordinary loops, recursion instead, no side effects... It is rather different from what we see in most applications. So I would not say that it is ugly, just more difficult to read...

Anonymous
+3  A: 

One characteristic of "ugly" XSL (or, I would say, a bad smell in XSL) is over-reliance on calling named templates as functions. I've seen this quite often where better use of matches and apply-templates would make for a more elegant solution.

AnthonyWJones
+2  A: 

XSLT is a pretty powerful language (I think I read once that it is Turing-complete), but you should use it for the situations it was designed for. That includes XML transformations based on pattern matching and XML transformations based on templates.

When you start using XSLT in a procedural manner, it becomes cumbersome and the XML syntax turns against you. The same holds for string processing or comparison functions.

I have seen excellent results using a combination of XSLT with C# extension functions. Performance has not been a problem for us.

Beauty ~ well-structured, pattern matching templates, recursion. Efficiency ~ avoid complex XPaths

Rine
+17  A: 

Understanding template matching, especially apply-templates, is the number one skill an XSLTer needs to learn. It is impossible to overstate how much XSLT is not like C# or Java or anything OO.

Typical hideousness in XSLT includes:

  • overuse of xsl:for-each and xsl:call-template

  • bad, longwinded xpaths (especially ones beginning with "./")

  • not using keys where suitable

  • failure to grasp recursion (wrt templates)

  • thinking of templates as functions
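
As a rough illustration of the first and last points, here is a minimal XSLT 1.0 sketch of a transform driven by template matching rather than by xsl:for-each with an xsl:choose inside. The element names catalog, book, dvd and title are invented for this example and do not come from any real schema.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <!-- Hypothetical input: a <catalog> holding <book> and <dvd> children,
       each with a <title> child. -->
  <xsl:template match="catalog">
    <ul>
      <!-- Relative select: hand the children of the current node to the
           processor and let it pick the matching template for each one. -->
      <xsl:apply-templates select="book | dvd"/>
    </ul>
  </xsl:template>

  <!-- One small template per kind of node replaces an xsl:for-each that
       wraps an xsl:choose testing the element name. -->
  <xsl:template match="book">
    <li class="book"><xsl:value-of select="title"/></li>
  </xsl:template>

  <xsl:template match="dvd">
    <li class="dvd"><xsl:value-of select="title"/></li>
  </xsl:template>

</xsl:stylesheet>
```

Supporting a new kind of child then means adding one more template (and one more name in the select), not editing a growing conditional.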

annakata
I agree wholly... It irritates me especially when I see XPaths that are always relative to the root element (/xx/yyy), not the current context or template.
Cerebrus
Indeed - I think the second most important concept to learn is *knowing where you are*
annakata
Would you consider use of `mode`, both in `apply-templates` and in `template` as good or bad?
Remus Rusanu
good, but with potential for abuse (much as any tool) - personally I tend to think of it as something analogous to classification and a clean way of expressing design
annakata
+1  A: 

Ask someone else who is an XSL expert. Peer review is a powerful thing if done well.

Fortyrunner
A: 

XSLT is declarative, so performance actually depends on the processors' optimizers. Processors such as Saxon have been evolving for quite some time, getting more optimization with each release. Also, there are profiles that emulate most popular processors.

You should keep in mind that there are two ways of processing XML documents. SAX is a serial parser, very efficient for large documents and in-order transformations. DOM, on the other hand, consists of creating an object tree for the whole document, which is perfect for small documents and for transformations that are completely out of order (e.g. generating a table of contents). So XSLT that is optimal for one will not be optimal for the other.

vartec
SAX is nice, but it is not the de facto XSLT parser when it comes to speed. In fact, MSXML and compiled XSLTs in .NET are much faster.
Martijn Laarman
http://blogs.msdn.com/antosha/archive/2006/07/24/677560.aspx
Martijn Laarman
You mean Saxon? MSXML has both SAX and DOM modes.
vartec
+7  A: 
  • Make sure you know about xsl:key and Muenchian grouping
  • Avoid script extensions, especially if used to keep mutable state
  • Avoid complex choose/etc logic/flow requirements - you're probably missing an "apply-templates" somewhere...
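
For readers who have not met the first point, here is a minimal sketch of Muenchian grouping in XSLT 1.0. The input shape (an items element whose item children carry category and name attributes) and the key name items-by-category are assumptions made up for this example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Index every item by the value of its category attribute. -->
  <xsl:key name="items-by-category" match="item" use="@category"/>

  <xsl:template match="/items">
    <groups>
      <!-- Select one representative per category: the item that is the
           first node the key returns for its own category. -->
      <xsl:apply-templates mode="group"
          select="item[generate-id() =
                       generate-id(key('items-by-category', @category)[1])]"/>
    </groups>
  </xsl:template>

  <!-- Called once per distinct category. -->
  <xsl:template match="item" mode="group">
    <group name="{@category}">
      <!-- The key lookup returns every item that shares this category. -->
      <xsl:apply-templates mode="member"
          select="key('items-by-category', @category)"/>
    </group>
  </xsl:template>

  <xsl:template match="item" mode="member">
    <name><xsl:value-of select="@name"/></name>
  </xsl:template>

</xsl:stylesheet>
```

The key is what lets the processor avoid rescanning all items for each category. In XSLT 2.0 the same grouping is usually written with xsl:for-each-group instead.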

But as always, time and experience are important...

Marc Gravell
It seems that no one is reading these answers. +1 from me. I enjoy reading your answers, which are always to the point.
Dimitre Novatchev
Well thanks for that ;-p
Marc Gravell
What is problematic with script extensions? In my opinion they can make complicated templates faster, more efficient and easier to read/maintain (thinking of string manipulations, regex, date formatting etc)
0xA3
@divo - absolutely - I use them myself; but as I stated, the problem is using them cross-vendor - i.e. it is no longer pure xslt. As long as you make an *informed* decision to break compatibility, that is fine
Marc Gravell
+7  A: 

As far as efficiency goes, in my experience the efficiency of an <xsl:apply-templates> generally boils down to three quantities:

  • When applying templates, how many nodes in the input document does the XPath query examine?
  • How many nodes in the input document does the XPath query return?
  • How many templates get examined for each node returned?

To a first (very rough) approximation, the processing required to perform a transform is going to be proportional to the product of those three numbers. Usually inefficiencies come from visiting more nodes in the document than you need to and applying templates to more nodes than you should.

For a crude example, this:

<xsl:apply-templates select="//foo"/>
...
<xsl:template match="foo[bar]">
   ...
</xsl:template>
<xsl:template match="foo"/>

may be extraordinarily inefficient, especially if the structure of the input document is sufficiently known that it could have been written like this:

<!-- 
   I actually know that foo elements are great-grandchildren of the context node 
   Also, I'm filtering out foo elements with no bar child at the time I'm building
   the node set, and not relying on template matching to do it.
-->
<xsl:apply-templates select="./*/*/foo[bar]" mode="foo_no_bar"/>
...
<!-- 
   I don't have to include a predicate here because I used it in my select, and
   I'm using the mode to disambiguate between this and a (possibly) more 
   generic template for transforming foo elements.
-->
<xsl:template match="foo" mode="foo_no_bar">
   ...
</xsl:template>

Similarly, it's important not to perform any query more times than you need to. When coding on the fly, it's easy to find yourself writing something like this:

<xsl:if test="foo[@bar='baz']">
    <xsl:text>first: </xsl:text>
    <xsl:value-of select="foo[@bar='baz']/bat[1]"/>
    <xsl:text>second: </xsl:text>
    <xsl:value-of select="foo[@bar='baz']/bat[2]"/>
</xsl:if>

when it should be:

<!-- Run the query once, then navigate from the stored node set to its bat children. -->
<xsl:variable name="list" select="foo[@bar='baz']"/>
<xsl:if test="$list">
   <xsl:text>first: </xsl:text>
   <xsl:value-of select="$list/bat[1]"/>
   <xsl:text>second: </xsl:text>
   <xsl:value-of select="$list/bat[2]"/>
</xsl:if>

This is easier to write and maintain, and it also doesn't do more work than it needs to do.

The thing about reasonably well-written XSLT is that it's astonishingly easy to maintain. It doesn't seem like it would be, given how verbose and seemingly cryptic it is. But I regularly find myself making quick changes to 2000-3000 line transforms that I wrote two years ago and having them work properly on the first try. The side-effect-free nature of the language helps a lot.

Robert Rossney
+3  A: 

In addition to the good advice given in many of the previous answers, let me add the following.

One can often find examples of beautiful XSLT code, especially when XSLT is used as a functional programming language.

For examples see this article on FXSL 2.0 -- the Functional Programming library for XSLT 2.0.

As an FP language, XSLT is also a declarative language. Among other things, this means that one declares, or specifies, existing relationships.

Such a definition often does not need any additional code to produce a result: it is its own implementation, an executable definition or executable specification.

Here is a small example.

This XPath 2.0 expression defines the "Maximum Prime Factor of a natural number":

if(f:isPrime($pNum))
  then $pNum
  else
    for $vEnd in xs:integer(floor(f:sqrt($pNum, 0.1E0))),
        $vDiv1 in (2 to $vEnd)[$pNum mod . = 0][1],
        $vDiv2 in $pNum idiv $vDiv1
      return
        max((f:maxPrimeFactor($vDiv1),f:maxPrimeFactor($vDiv2)))

To pronounce it in English, the maximum prime factor of a number pNum is the number itself, if pNum is prime, otherwise if vDiv1 and vDiv2 are two factors of pNum, then the maximum prime factor of pNum is the bigger of the maximum prime factors of vDiv1 and vDiv2.

How do we use this to actually calculate the Maximum Prime Factor in XSLT? We simply wrap up the definition above in an <xsl:function> and ... get the result!

 <xsl:function name="f:maxPrimeFactor" as="xs:integer">
  <xsl:param name="pNum" as="xs:integer"/>

  <xsl:sequence select=
   "if(f:isPrime($pNum))
      then $pNum
      else
        for $vEnd in xs:integer(floor(f:sqrt($pNum, 0.1E0))),
            $vDiv1 in (2 to $vEnd)[$pNum mod . = 0][1],
            $vDiv2 in $pNum idiv $vDiv1
          return
            max((f:maxPrimeFactor($vDiv1),f:maxPrimeFactor($vDiv2)))
   "/>
 </xsl:function>

We can, then, calculate the MPF for any natural number, for example:

f:maxPrimeFactor(600851475143) = 6857

As for efficiency, well, this transformation takes just 0.109 sec.

Other examples of both elegant and efficient XSLT code:

Dimitre Novatchev
+5  A: 

Here are some rules for writing "quality XSLT code", as taken from Mukul Gandhi's blog.

They can be checked/enforced using a tool developed by Mukul:

  1. DontUseDoubleSlashOperatorNearRoot: Avoid using the operator // near the root of a large tree.

  2. DontUseDoubleSlashOperator: Avoid using the operator // in XPath expressions.

  3. SettingValueOfVariableIncorrectly: Assign value to a variable using the 'select' syntax if assigning a string value.

  4. EmptyContentInInstructions: Don't use empty content for instructions like 'xsl:for-each' 'xsl:if' 'xsl:when' etc.

  5. DontUseNodeSetExtension: Don't use node-set extension function if using XSLT 2.0.

  6. RedundantNamespaceDeclarations: There are redundant namespace declarations in the xsl:stylesheet element.

  7. UnusedFunction: Stylesheet functions are unused.

  8. UnusedNamedTemplate: Named templates in stylesheet are unused.

  9. UnusedVariable: Variable is unused in the stylesheet.

  10. UnusedFunctionTemplateParameter: Function or template parameter is unused in the function/template body.

  11. TooManySmallTemplates: Too many low granular templates in the stylesheet (10 or more).

  12. MonolithicDesign: Using a single template/function in the stylesheet. You can modularize the code.

  13. OutputMethodXml: Using the output method 'xml' when generating HTML code.

  14. NotUsingSchemaTypes: The stylesheet is not using any of the built-in Schema types (xs:string etc.), when working in XSLT 2.0 mode.

  15. UsingNameOrLocalNameFunction: Using name() function when local-name() could be appropriate (and vice-versa).

  16. FunctionTemplateComplexity: The function or template's size/complexity is high. There is need for refactoring the code.

  17. NullOutputFromStylesheet: The stylesheet is not generating any useful output. Please relook at the stylesheet logic.

  18. UsingNamespaceAxis: Using the deprecated namespace axis, when working in XSLT 2.0 mode.

  19. CanUseAbbreviatedAxisSpecifier: Using the lengthy axis specifiers like child::, attribute:: or parent::node().

  20. UsingDisableOutputEscaping: Have set the disable-output-escaping attribute to 'yes'. Please relook at the stylesheet logic.

  21. NotCreatingElementCorrectly: Creating an element node using the xsl:element instruction when it could have been created directly.

  22. AreYouConfusingVariableAndNode: You might be confusing a variable reference with a node reference. (contributed by, Alain Benedetti)

  23. IncorrectUseOfBooleanConstants: Incorrectly using the boolean constants as 'true' or 'false'. (contributed by, Tony Lavinio)

  24. ShortNames: Using a single character name for variable/function/template. Use meaningful names for these features.

  25. NameStartsWithNumeric: The variable/function/template name starts with a numeric character
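
To make a few of these rules concrete, here is a small XSLT 1.0 sketch of what rules 3, 21 and 23 seem to be asking for. It is my own reading of the rule descriptions above, not output from the tool, and the order and summary elements and the variable names are invented for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical input: an <order> element with an id attribute. -->
  <xsl:template match="/order">

    <!-- Rule 3: assign a string with 'select'. The content form,
         <xsl:variable name="status">open</xsl:variable>, builds a
         result tree fragment instead of a plain string. -->
    <xsl:variable name="status" select="'open'"/>

    <!-- Rule 23: true() is the boolean constant. A bare true in an
         expression would be read as a child element named 'true'. -->
    <xsl:variable name="showId" select="true()"/>

    <!-- Rule 21: the element name is known in advance, so a literal
         result element is enough and xsl:element is not needed. -->
    <summary status="{$status}">
      <xsl:if test="$showId">
        <xsl:value-of select="@id"/>
      </xsl:if>
    </summary>

  </xsl:template>

</xsl:stylesheet>
```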

Dimitre Novatchev
+1  A: 
Mads Hansen