ansaurus

Answer 1

I'd guess the problem is the generation of text-id, i.e. the expression

generate-id(
    $main-root/descendant::text()[
      sum((preceding::text(), .)/string-length(.)) ge current-grouping-key()
    ][1]
  )

You are potentially recalculating a lot of sums here: for every grouping key, each candidate text node re-sums the string lengths of all its preceding text nodes. I think the easiest path would be to invert your approach: recurse across the text nodes in the document, accumulate the string length so far, and output a data element each time a new @index is reached. The following example illustrates the approach. Note that each unique @index and each text node is visited only once.

<xsl:variable name="insert-doc" select="doc($insert-file)"/>

<xsl:variable name="insert-data" as="element(data)*"> 
    <xsl:call-template name="calculate-data"/>
</xsl:variable>

<xsl:key name="index" match="data" use="xsd:integer(@index)"/>

<xsl:template name="calculate-data">
    <xsl:param name="text-nodes" select="$main-root//text()"/>
    <xsl:param name="previous-lengths" select="0"/>
    <xsl:param name="indexes" as="xsd:integer*">
        <xsl:perform-sort 
            select="distinct-values(
                    $insert-doc/insert-data/data/@index/xsd:integer(.))">
            <xsl:sort/>
        </xsl:perform-sort>
    </xsl:param>
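    <!-- Stop once either the text nodes or the indexes are exhausted. -->
    <xsl:if test="exists($text-nodes) and exists($indexes)">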
        <xsl:variable name="total-lengths" 
            select="$previous-lengths + string-length($text-nodes[1])"/>
        <xsl:choose>
            <xsl:when test="$total-lengths ge number($indexes[1])">
                <data 
                    index="{$indexes[1]}" 
                    text-id="{generate-id($text-nodes[1])}">
                    <xsl:copy-of select="key('index', $indexes[1], 
                                             $insert-doc)"/> 
                </data>
                <!-- Recursively move to the next index. -->
                <xsl:call-template name="calculate-data">
                    <xsl:with-param
                        name="text-nodes"
                        select="$text-nodes"/>
                    <xsl:with-param
                        name="previous-lengths" 
                        select="$previous-lengths"/>
                    <xsl:with-param
                        name="indexes" 
                        select="subsequence($indexes, 2)"/>
                </xsl:call-template>                    
            </xsl:when>
            <xsl:otherwise>
                <!-- Recursively move to the next text node. -->
                <xsl:call-template name="calculate-data">
                    <xsl:with-param 
                        name="text-nodes" 
                        select="subsequence($text-nodes, 2)"/>
                    <xsl:with-param
                        name="previous-lengths" 
                        select="$total-lengths"/>
                    <xsl:with-param 
                        name="indexes" 
                        select="$indexes"/>
                </xsl:call-template>                    
            </xsl:otherwise>
        </xsl:choose>
    </xsl:if>
</xsl:template>
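
For completeness, here is one way the precomputed `$insert-data` might be consumed while copying the main document. This is only a sketch under assumptions not stated in the answer above: the template is placed in the same stylesheet, it is applied to the same document that `$main-root` belongs to (so that `generate-id()` yields matching ids), and the output element name `inserted` is purely hypothetical.

```xml
<!-- Hypothetical consumer of $insert-data; the element name "inserted" is illustrative only. -->
<xsl:template match="text()">
    <!-- Copy the text node itself. -->
    <xsl:copy/>
    <!-- Look up the entries that calculate-data attached to this text node. -->
    <xsl:variable name="id" select="generate-id(.)"/>
    <xsl:for-each select="$insert-data[@text-id = $id]">
        <inserted index="{@index}">
            <!-- Each generated data element wraps copies of the original data elements. -->
            <xsl:copy-of select="data/node()"/>
        </inserted>
    </xsl:for-each>
</xsl:template>
```

The predicate is a plain scan of `$insert-data` for each text node. A key is not used here because `$insert-data` is a sequence of parentless elements, and `key()` requires a tree rooted at a document node.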

markusk 2010-05-13 20:20:36

Thanks for your response. Will try this and update soon.

Rachel 2010-05-13 20:29:29

Great, the response now comes back in a few ms. It has been reduced from 82242 to 813. Thanks a lot!! However, the value of the "data" node alone does not appear in the result. The line in question is `<xsl:copy-of select="key('index', $indexes[1], $insert-doc)"/>`, which reads from `<xsl:key name="index" match="data" use="@index"/>`.

Rachel 2010-05-13 20:54:13

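@Rachel: Good to hear that your response time improved. There was a bug in my original key definition: `$indexes` holds `xsd:integer` values, and an untyped `@index` is indexed as a string, which never matches an integer. You need to use `<xsl:key name="index" match="data" use="xsd:integer(@index)"/>` to get correct results.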

markusk 2010-05-13 21:07:16

It works now. Thanks.

Rachel 2010-05-13 21:13:10

In my case the index value, i.e. $insert-doc/insert-data/data/@index, may not be unique. When that happens, multiple data tags are created for the same index. How can this be resolved? Please advise.

Rachel 2010-05-13 21:29:37

@Rachel: By using `distinct-values`. I edited my answer to use this function, see the new definition for the parameter `indexes`. Actually, I made this update to my answer almost an hour ago, but I guess you got the old version. :-)

markusk 2010-05-13 21:47:01

Oh yes, I did not notice.

Rachel 2010-05-13 22:04:26

@Rachel: No problem. I hope `distinct-values` solved the issue?

markusk 2010-05-13 22:17:34

Your solution works perfectly with distinct-values. I slightly modified the XSL to incorporate an additional condition that I need. Adding it reduced performance, roughly doubling the response time. How can this be rewritten efficiently? I have edited my question and added sample code with the new condition included.

Rachel 2010-05-13 22:17:41

@Rachel: If you want to test whether the current index node contains an element named "end", you can write `<xsl:when test="$indexes[1]/end">` instead of `<xsl:when test="contains($indexes[1]/node()/name() , 'end')">`. Does that improve your performance?

markusk 2010-05-14 06:58:41

The condition is not getting satisfied. My element has a namespace prefix: pre:end. Is it because of the namespace prefix?

Rachel 2010-05-14 15:16:28
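
On the namespace question: in XSLT 2.0 an unprefixed name test such as `end` selects only elements in no namespace (unless `xpath-default-namespace` is set), so it would not match `pre:end`. A minimal sketch of the two usual ways around this follows. The URI `http://example.com/pre` is a placeholder assumption and must be replaced by whatever URI the `pre` prefix is bound to in the source document; the `match="data"` template and the output strings are likewise only illustrative.

```xml
<!-- Assumption: "http://example.com/pre" stands in for the real namespace URI bound to pre: in the source. -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:pre="http://example.com/pre">

    <xsl:template match="data">
        <xsl:choose>
            <!-- Prefixed name test: matches only child "end" elements in the pre: namespace. -->
            <xsl:when test="pre:end">has end marker</xsl:when>
            <!-- XPath 2.0 wildcard: matches a child "end" element in any namespace. -->
            <xsl:when test="*:end">has end marker (other namespace)</xsl:when>
            <xsl:otherwise>no end marker</xsl:otherwise>
        </xsl:choose>
    </xsl:template>
</xsl:stylesheet>
```

Matching is done on the namespace URI, not on the prefix itself, so the prefix used in the stylesheet does not have to be `pre` as long as it is bound to the same URI.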

Improving the performance of XSL
