views:

42

answers:

2

I’m processing an XML file that, simplified, looks something like this:

<resources>
  <resource id="a">
    <dependency idref="b"/>
    <!-- some other stuff -->
  </resource>
  <resource id="b">
    <!-- some other stuff -->
  </resource>
</resources>

The XSLT stylesheet must process a particular resource that we’re interested in, which I will call the root resource, and all recursive dependencies. Dependencies are other resources, uniquely identified by their id attribute.

It doesn’t matter if a resource is processed twice, although it’s preferable to process each required resource only once. It also doesn’t matter what order the resources are processed in.

It’s important that only the root resource and its recursive dependencies are processed. We can’t just process all the resources and be done with it.

A naïve implementation is as follows:

<xsl:key name="resource-id" match="resource" use="@id"/>

<xsl:template match="resource">
  <!-- do whatever is required to process the resource. -->

  <!-- then handle any dependencies -->
  <xsl:apply-templates select="key('resource-id', dependency/@idref)"/>
</xsl:template>

This implementation works fine for the example above, as well as in many real-world cases. It does have the disadvantage that it often processes the same resource more than once, but as stated above that’s not hugely important.

The problem is that sometimes resources have cyclic dependencies:

<resources>
  <resource id="a">
    <dependency idref="b"/>
    <dependency idref="d"/>
  </resource>
  <resource id="b">
    <dependency idref="c"/>
  </resource>
  <resource id="c">
    <dependency idref="a"/>
  </resource>
  <resource id="d"/>
</resources>

If you use the naïve implementation to process this example, and you start by processing a, b or c, you get infinite recursion.

Unfortunately I can’t control the input data and in any case cyclic dependencies are perfectly valid and allowed by the relevant specification.

I’ve come up with various partial solutions, but nothing that works in all cases.

The ideal solution would be a general approach to preventing a node from being processed more than once, but I don’t think that’s possible. In fact, I suspect this whole problem is impossible to solve.

If it helps, I have most of EXSLT available (including functions). If necessary I can also pre-process the input with any number of other XSLT scripts, although it’s preferable not to do excessive pre-processing of resources that won’t end up in the output.

What I can’t do is switch to processing this with another language (at least not without substantial re-engineering). I also can’t use XSLT 2.0.

Any ideas?

+1  A: 

This is a simple solution:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"&gt;
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:param name="pRootResourceId" select="'a'"/>

 <xsl:key name="kResById" match="resource" use="@id"/>

 <xsl:template match="/">
  <resourceProcessing root="{$pRootResourceId}">
    <xsl:apply-templates select=
    "key('kResById', $pRootResourceId)"/>
  </resourceProcessing>
 </xsl:template>

 <xsl:template match="resource">
  <xsl:param name="pVisited" select="'|'"/>

  <xsl:copy>
    <xsl:copy-of select="@*"/>

    <xsl:apply-templates select=
      "key('kResById',
           dependency/@idref
                [not(contains($pVisited, concat('|', ., '|')))])">
      <xsl:with-param name="pVisited"
       select="concat($pVisited, @id, '|')"/>
    </xsl:apply-templates>
  </xsl:copy>
 </xsl:template>
</xsl:stylesheet>

When applied on the provided XML document:

<resources>
  <resource id="a">
    <dependency idref="b"/>
    <dependency idref="d"/>
  </resource>
  <resource id="b">
    <dependency idref="c"/>
  </resource>
  <resource id="c">
    <dependency idref="a"/>
  </resource>
  <resource id="d"/>
</resources>

the wanted, correct result is produced:

<resourceProcessing root="a">
   <resource id="a">
      <resource id="b">
         <resource id="c"/>
      </resource>
      <resource id="d"/>
   </resource>
</resourceProcessing>

The main idea is simple: Maintain a list of ids of visited resources and only allow the processing of a new resource if its id is not present in the list. The "processing" is for demonstration purposes and outputs the request wrapping all other requests (recursively), on which it depends.

Also note that every request is processed only once.

Years ago I provided a similar solution to a graph-traversal problem -- it can be found in the xml-dev group archives -- here. :)

Dimitre Novatchev
Brilliant! Thanks. As soon as I asked an XSLT question I had a feeling you’d answer it :). By the looks of it, it should be possible to maintain the list of visited resources as a node-set instead of a string, which might be a bit clearer. Thanks for pointing me in the right direction.
Daniel Cassidy
@Daniel-Cassidy: I don't recommend to maintain nodes in the pVisited variable, because re-copying everytime a new node has to be added is much more slow than simply concatenating a string.In XPath 2.0 either pre-pending or appending to a sequencecan be optimized by the XSLT processor (for example Saxon optimizes appending), and this can be used to improve this algorithm from O(N^2) to O(N).
Dimitre Novatchev
Fair enough. By the way, your solution doesn’t guarantee that every `resource` is processed only once, for example in the case that __a__ depends on __b__ and __c__, and __b__ depends on __c__. In this case, __c__ gets processed twice. However as I said that isn’t a problem for me; your solution should prevent cycles, which is all that is required.
Daniel Cassidy
@Daniel-Cassidy: I can make the solution process each resource only once -- for this I will change the transformation to use a node-by-node traversal, where the `<xsl:apply-templates>` instruction is always applied only on one (the "next") node.
Dimitre Novatchev
@Dimitre You’re right of course. I only mentioned it because of the line towards the end of your answer saying “Also note that every `request` [did you mean `resource`?] is processed only once.”
Daniel Cassidy
+1  A: 

Just for fun, another solution (following Dimitre) but increasing a node-set with visited nodes. I post two stylesheet, one with node set logic and other with node set comparison, because you must test wich is faster for big XML inputs.

So, this stylesheet:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"&gt;
    <xsl:output omit-xml-declaration="yes" indent="yes"/>
    <xsl:param name="pRootResourceId" select="'a'"/>
    <xsl:key name="kResById" match="resource" use="@id"/>
    <xsl:template match="/" name="resource">
        <xsl:param name="pVisited" select="key('kResById', $pRootResourceId)"/>
        <xsl:param name="pNew" select="key('kResById',$pVisited/dependency/@idref)"/>
        <xsl:choose>
            <xsl:when test="$pNew">
                <xsl:call-template name="resource">
                    <xsl:with-param name="pVisited" select="$pVisited|$pNew"/>
                    <xsl:with-param name="pNew" select="key('kResById',
           $pNew/dependency/@idref)[not(@id=($pVisited|$pNew)/@id)]"/>
                </xsl:call-template>
            </xsl:when>
            <xsl:otherwise>
                <result>
                    <xsl:copy-of select="$pVisited"/>
                </result>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>
</xsl:stylesheet>

And this stylesheet:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"&gt;
    <xsl:output omit-xml-declaration="yes" indent="yes"/>
    <xsl:param name="pRootResourceId" select="'a'"/>
    <xsl:key name="kResById" match="resource" use="@id"/>
    <xsl:template match="/" name="resource">
        <xsl:param name="pVisited" select="key('kResById', $pRootResourceId)"/>
        <xsl:param name="pNew" select="key('kResById', $pVisited/dependency/@idref)"/>
        <xsl:variable name="vAll" select="$pVisited|$pNew"/>
        <xsl:choose>
            <xsl:when test="$pNew">
                <xsl:call-template name="resource">
                    <xsl:with-param name="pVisited" select="$vAll"/>
                    <xsl:with-param name="pNew" select="key('kResById',
           $pNew/dependency/@idref)[count(.|$vAll)>count($vAll)]"/>
                </xsl:call-template>
            </xsl:when>
            <xsl:otherwise>
                <result>
                    <xsl:copy-of select="$pVisited"/>
                </result>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>
</xsl:stylesheet>

Both output:

(Wiht first input)

<result>
    <resource id="a">
        <dependency idref="b" />
        <!-- some other stuff -->
    </resource>
    <resource id="b">
        <!-- some other stuff -->
    </resource>
</result>

(With last input)

<result>
    <resource id="a">
        <dependency idref="b" />
        <dependency idref="d" />
    </resource>
    <resource id="b">
        <dependency idref="c" />
    </resource>
    <resource id="c">
        <dependency idref="a" />
    </resource>
    <resource id="d" />
</result>
Alejandro