I am a support engineer and our company's product allows XSLT transforms to customize outputs.

I made an XSL transform for this purpose. It works well for source files of typical size (several hundred KB), but occasionally a really huge (10 MB) source file comes by. In such cases, the output is never generated, even if I let it grind for several days.

The SW engineering team tested it and discovered that the transform with the large source file in question is indeed very slow (> days) if our product is compiled to use the transform engine in .NET 1.1, but if they compile it with .NET 2.0, it is plenty fast (about 1-2 minutes).

The long-term solution, obviously, is to wait for the next release.

For the short term I am wondering the following: 1) Is XSLT flexible enough that there are more efficient and less efficient ways to achieve the same result? For example, is it possible that the way I structured the XSL, the transform engine has to iterate from the beginning of the source file many, many times, taking longer and longer as the next result piece gets farther and farther from the beginning (Schlemiel the Painter)? Or 2) Is it more dependent on how the transform engine interprets the XSL?

If 2 is the case, I don't want to waste a lot of time trying to improve the XSL (I am not a big XSL genius; it was hard enough for me to achieve what little I did...).

Thanks!

+1  A: 

It is definitely possible to write inefficient XSLT. However, if the discrepancy is truly 100 KB files vs. 10 MB files, I'm not sure what improvements you would see from making small modifications to your transform, as that is one very big file.

It might be worth posting some example code (if possible) to help describe what performance gains might be achieved.

Scott Saad
Scott, I put a very crude sample in my latest post. Thanks!
KnomDeGuerre
+4  A: 

I'm not familiar with the .NET implementations, but there are a few things you can do in general to speed up processing of large documents:

  • Avoid using "//" in XPath expressions unless absolutely necessary.
  • If you only need the first or only element that matches an XPath expression, use the "[1]" qualifier, e.g. "//iframe[1]". Many processors implement optimizations for this. (A small sketch of these first two points follows the list.)
  • Whenever possible, when dealing with huge XML input, see if you can design a solution around a stream-based parser (like SAX) instead of a DOM-based parser.
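
For illustration, a minimal sketch of the first two points; the element names here are invented and not taken from the question:

<!-- Potentially slow: "//" forces the processor to scan the entire document -->
<xsl:value-of select="//Section/Title"/>

<!-- Usually cheaper: an explicit path from the root, taking only the first matching Section -->
<xsl:value-of select="/Report/Section[1]/Title"/>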
Marco
+3  A: 

Normally, if you see a non-linear increase in processing time vs. input size, you should suspect your code more than the framework. But since the problem goes away when the tool is compiled with .NET 2.0, all bets are off.

With XSLT, it's hard to create a non-linear performance curve if you do all your parsing with straight template matches:

<xsl:template match="foo">
  <!--OUTPUT-->
  <xsl:apply-templates / >
  <!--OUTPUT-->
</xsl:template>

 <xsl:template match="bar">
  <!--OUTPUT-->
  <xsl:apply-templates / >
  <!--OUTPUT-->
</xsl:template>

Pay careful attention to anywhere you might have resorted to <xsl:for-each> for parsing; template matches are virtually always a better way to achieve the same result.
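
For example, a hypothetical for-each block and a roughly equivalent template-match rewrite (the element names here are invented, not taken from the original transform):

<!-- Hypothetical for-each version: the iteration is spelled out in the loop -->
<xsl:template match="DataSet">
  <xsl:for-each select="Data/Row">
    <!--OUTPUT for each Row-->
  </xsl:for-each>
</xsl:template>

<!-- Alternative template-match version (use one or the other, not both):
     the processor drives the iteration via apply-templates -->
<xsl:template match="DataSet">
  <xsl:apply-templates select="Data/Row"/>
</xsl:template>

<xsl:template match="Row">
  <!--OUTPUT for each Row-->
</xsl:template>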

One way to troubleshoot this performance problem is to recreate your XSLT one template-match at a time, testing the processing time after adding each match. You might start with this match:

<xsl:template match="*">
  <xsl:copy>                   <!--Copy node                   -->
    <xsl:copy-of select="@*"/> <!--Copy node attributes         -->
    <xsl:apply-templates />    <!--Process children             -->
  </xsl:copy>
</xsl:template>

This will match and copy every node, one at a time, to a new document. This should not exhibit a non-linear increase in processing time vs. input size (if it does, then the problem is not with your XSLT code).

As you recreate your XSLT, if you add a template-match that suddenly kills performance, comment out every block inside the template. Then, uncomment one block at a time, testing the processing time each iteration, until you find the block that causes the problem.

trebormf
+1  A: 

To Marco - one of your cautions does apply, and one of your recommendations can be implemented to resolve the caution:

One template in my transform does indeed have "//". The top-level node will only ever occur once, in any of the expected source files, so I changed it to use the [1] explicit index. For example, I changed:

<xsl:for-each select="//DataSet/Data/SomethingElse">

to

<xsl:for-each select="DataSet[1]/Data/SomethingElse">

(There can be multiple SomethingElse's in the source files, but only ever one DataSet per file.)

A quick check shows that this is not the major cause of the slowdown; I started the transform about five minutes ago and it is still running. I'll let it go overnight and see what happens. Thanks!

KnomDeGuerre
As I mentioned in my post, try to avoid `<xsl:for-each>`. Borrowing from your example, it would be better to see `<xsl:template match="SomethingElse">`. I'll see if I can find a link to explain why.
trebormf
+1  A: 

One thing worth checking is whether your XSLT does a lot of lookups into other parts of the XML document, i.e. you are at one context node and look up a value in another part of the document, or even in another document. Doing this can hit performance quite hard, and you should consider using xsl:key and the key() function for it. It tells the processor to build a fast lookup index over the data in question.

I was once building an XSLT that took 8 hours to run (with lots of cross references) and switching to use keys gave it a massive speed boost.
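
For illustration, a rough sketch of the difference (the element names, attributes, and key name below are invented, not from the question):

<!-- Without a key: every lookup rescans the whole document -->
<xsl:value-of select="//Customer[@id = current()/@customerRef]/Name"/>

<!-- With a key: declare the index once, at the top level of the stylesheet... -->
<xsl:key name="customerById" match="Customer" use="@id"/>

<!-- ...and each lookup becomes an indexed access -->
<xsl:value-of select="key('customerById', @customerRef)/Name"/>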

Matt Large
Thanks! I don't have any xsl:key in my xsl.
KnomDeGuerre
+1  A: 

Upon looking up your problem, I found a KB article at Microsoft about it. You can see it here.

They say that the XSLT transform in .NET 1 has some performance issues, and they offer a quick fix.

If you want to try to troubleshoot the problem, there is an XSLT profiler available here.

Otherwise, you can follow the links given on the Microsoft website for optimizing speed issues with XSLT (link).

Maxim
Thanks! No xsl:key in my transform though.
KnomDeGuerre
A: 

An update of activities I have done since:

To Treb - those two sample pieces of code were already inside a template that, for the huge source file in question, only gets called once.

Meanwhile, since everything I need from inside the source file is under the //DataSet/Data section (of which there is only one), I wrapped everything inside a

<xsl:template match="DataSet/Data">

wrapper. The template mentioned earlier gets called by an 'if' check in the main section. I couldn't figure out how to get the called template to work with this new wrapper, so I got rid of it entirely and incorporated its fairly simple guts into the 'if' section directly.

The result is still extremely slow, so I am tempted to accept the earlier answer from Treb because of the "all bets are off" part.

KnomDeGuerre
I've added an additional troubleshooting suggestion to my original answer in hopes that it will help you. Please take a look.
trebormf
A: 

To anyone who would like more details to work with ... I can't post exact samples for proprietary reasons, but here is a more detailed description of what I am trying to accomplish:

The source file is like a flat table of values. I need to transform it into a sectioned or outlined table, where each section begins when one specific column value in the flat table changes. For example, suppose the flat table contents are:

1 0 0 0 0 
1 0 1 0 0 
1 2 0 0 0 
1 1 1 0 0 
2 0 0 0 0 
2 0 1 0 0 
2 2 0 0 0 
2 1 1 0 0

The output section headers should be floated up from the first column of the flat table. So the transformed output would be like this:

1
    0 0 0 0 
    0 1 0 0 
    2 0 0 0 
    1 1 0 0 
2
    0 0 0 0 
    0 1 0 0 
    2 0 0 0 
    1 1 0 0

etc

To detect when to start a new section, I did this:

<xsl:if test="@TheFirstCol>preceding-sibling::*[1]/@TheFirstCol">

Could this be causing a lot of re-iteration? If so, is there a less iterative way to compare the value of an attribute in the current node against the value from the previous node?

Thanks!

KnomDeGuerre
The <xsl:if> test above looks fine. It's quite normal to "look back" at previous values. Sorry for my slow response (I'm traveling).
trebormf
+3  A: 

> To detect when to start a new section, I did this:
>
> <xsl:if test="@TheFirstCol>preceding-sibling::*[1]/@TheFirstCol">
>
> Could this be causing a lot of re-iteration?

Definitely. The algorithm you've chosen is O(N^2) and would be very slow with a sufficient number of siblings, regardless of the implementation language.

Here is an efficient algorithm using keys:

Solution 1:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="text"/>

 <xsl:key name="kC1Value" match="@c1" use="."/>

 <xsl:template match="/">
   <xsl:for-each select=
   "*/x[generate-id(@c1) = generate-id(key('kC1Value',@c1)[1])]">

    <xsl:value-of select="concat('&#xA;',@c1)"/>

    <xsl:for-each select="key('kC1Value',@c1)">
      <xsl:value-of select="'&#xA;'"/>
      <xsl:for-each select="../@*[not(name()='c1')]">
        <xsl:value-of select="concat('   ', .)"/>
      </xsl:for-each>
    </xsl:for-each>
   </xsl:for-each>
 </xsl:template>
</xsl:stylesheet>

Unfortunately, XslTransform (.NET 1.1) has a notoriously inefficient implementation of the generate-id() function.

The following may be faster with XslTransform:

Solution 2:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="text"/>

 <xsl:key name="kC1Value" match="@c1" use="."/>

 <xsl:template match="/">
   <xsl:for-each select=
   "*/x[count(@c1 | key('kC1Value',@c1)[1]) = 1]">

    <xsl:value-of select="concat('&#xA;',@c1)"/>

    <xsl:for-each select="key('kC1Value',@c1)">
      <xsl:value-of select="'&#xA;'"/>
      <xsl:for-each select="../@*[not(name()='c1')]">
        <xsl:value-of select="concat('   ', .)"/>
      </xsl:for-each>
    </xsl:for-each>
   </xsl:for-each>
 </xsl:template>
</xsl:stylesheet>

When applied to the following small XML document:

<t>
 <x c1="1" c2="0" c3="0" c4="0" c5="0"/>
 <x c1="1" c2="0" c3="1" c4="0" c5="0"/>
 <x c1="1" c2="2" c3="0" c4="0" c5="0"/>
 <x c1="1" c2="1" c3="1" c4="0" c5="0"/>
 <x c1="2" c2="0" c3="0" c4="0" c5="0"/>
 <x c1="2" c2="0" c3="1" c4="0" c5="0"/>
 <x c1="2" c2="2" c3="0" c4="0" c5="0"/>
 <x c1="2" c2="1" c3="1" c4="0" c5="0"/>
 <x c1="3" c2="0" c3="0" c4="0" c5="0"/>
 <x c1="3" c2="0" c3="1" c4="0" c5="0"/>
 <x c1="3" c2="2" c3="0" c4="0" c5="0"/>
 <x c1="3" c2="1" c3="1" c4="0" c5="0"/>
 <x c1="3" c2="0" c3="0" c4="0" c5="0"/>
 <x c1="3" c2="0" c3="1" c4="0" c5="0"/>
 <x c1="3" c2="2" c3="0" c4="0" c5="0"/>
 <x c1="3" c2="1" c3="1" c4="0" c5="0"/>
 <x c1="4" c2="0" c3="0" c4="0" c5="0"/>
 <x c1="4" c2="0" c3="1" c4="0" c5="0"/>
 <x c1="4" c2="2" c3="0" c4="0" c5="0"/>
 <x c1="4" c2="1" c3="1" c4="0" c5="0"/>
 <x c1="5" c2="0" c3="0" c4="0" c5="0"/>
 <x c1="5" c2="0" c3="1" c4="0" c5="0"/>
 <x c1="5" c2="2" c3="0" c4="0" c5="0"/>
 <x c1="5" c2="1" c3="1" c4="0" c5="0"/>
 <x c1="5" c2="0" c3="0" c4="0" c5="0"/>
 <x c1="5" c2="0" c3="1" c4="0" c5="0"/>
 <x c1="6" c2="2" c3="0" c4="0" c5="0"/>
 <x c1="6" c2="1" c3="1" c4="0" c5="0"/>
 <x c1="6" c2="0" c3="0" c4="0" c5="0"/>
 <x c1="6" c2="0" c3="1" c4="0" c5="0"/>
 <x c1="6" c2="2" c3="0" c4="0" c5="0"/>
 <x c1="6" c2="1" c3="1" c4="0" c5="0"/>
 <x c1="7" c2="0" c3="0" c4="0" c5="0"/>
 <x c1="7" c2="0" c3="1" c4="0" c5="0"/>
 <x c1="7" c2="2" c3="0" c4="0" c5="0"/>
 <x c1="7" c2="1" c3="1" c4="0" c5="0"/>
 <x c1="8" c2="0" c3="0" c4="0" c5="0"/>
 <x c1="8" c2="0" c3="1" c4="0" c5="0"/>
 <x c1="8" c2="2" c3="0" c4="0" c5="0"/>
 <x c1="8" c2="1" c3="1" c4="0" c5="0"/>
</t>

both solutions produced the wanted result:

1
  0 0 0 0
  0 1 0 0
  2 0 0 0
  1 1 0 0
2
  0 0 0 0
  0 1 0 0
  2 0 0 0
  1 1 0 0
3
  0 0 0 0
  0 1 0 0
  2 0 0 0
  1 1 0 0
  0 0 0 0
  0 1 0 0
  2 0 0 0
  1 1 0 0
4
  0 0 0 0
  0 1 0 0
  2 0 0 0
  1 1 0 0
5
  0 0 0 0
  0 1 0 0
  2 0 0 0
  1 1 0 0
  0 0 0 0
  0 1 0 0
6
  2 0 0 0
  1 1 0 0
  0 0 0 0
  0 1 0 0
  2 0 0 0
  1 1 0 0
7
  0 0 0 0
  0 1 0 0
  2 0 0 0
  1 1 0 0
8
  0 0 0 0
  0 1 0 0
  2 0 0 0
  1 1 0 0

From the above small XML file I generated a 10MB XML file by copying every element 6250 times (using another XSLT transformation :) ).
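
For the curious, here is a minimal sketch of how such a blow-up transform could be written; this is only an assumption for illustration (not necessarily the transform actually used), and the element and parameter names mirror the sample document above:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="xml" indent="yes"/>

 <!-- Number of copies of each <x> element to emit (assumed parameter name) -->
 <xsl:param name="copies" select="6250"/>

 <xsl:template match="/t">
  <t>
   <xsl:for-each select="x">
    <xsl:call-template name="repeat">
      <xsl:with-param name="n" select="$copies"/>
    </xsl:call-template>
   </xsl:for-each>
  </t>
 </xsl:template>

 <!-- Emit $n copies of the current element; halving the count keeps the
      recursion depth logarithmic instead of 6250 levels deep -->
 <xsl:template name="repeat">
  <xsl:param name="n"/>
  <xsl:choose>
   <xsl:when test="$n &lt;= 0"/>
   <xsl:when test="$n = 1">
    <xsl:copy-of select="."/>
   </xsl:when>
   <xsl:otherwise>
    <xsl:call-template name="repeat">
      <xsl:with-param name="n" select="floor($n div 2)"/>
    </xsl:call-template>
    <xsl:call-template name="repeat">
      <xsl:with-param name="n" select="$n - floor($n div 2)"/>
    </xsl:call-template>
   </xsl:otherwise>
  </xsl:choose>
 </xsl:template>
</xsl:stylesheet>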

With the 10MB XML file and XslCompiledTransform (.NET 2.0+), the two solutions had the following transformation times:

Solution 1: 3.3 sec.

Solution 2: 2.8 sec.

With XslTransform (.NET 1.1), Solution 2 ran for 1622 sec., that is, about 27 minutes.

Hope this helped.

Cheers,

Dimitre Novatchev
Yes, very helpful. Thank you, Dimitre!
KnomDeGuerre
I just saw that the <xsl:stylesheet> and <xsl:key> instructions were not being displayed -- this is now corrected.
Dimitre Novatchev