Apologies if this is a very simple question; I don't use XSLT very much and I can't find much advice on the web, as there is lots of pollution in search results!

I have an XML document in the following form. Its main purpose is to be reformatted in a few ways by XSLT for display in a couple of different formats.

<desk>
<drawer>
    <contents>pencils</contents>
    <quantity>2</quantity>
</drawer>
<drawer>
    <contents>pens</contents>
    <quantity>15</quantity>
</drawer>
<drawer>
    <contents>pencils</contents>
    <quantity>3</quantity>
</drawer>
<drawer>
    <contents>rulers</contents>
    <quantity>2</quantity>
</drawer>
</desk>

I'd like to extract two pieces of information from the XML: i) the average quantity; ii) the most frequently encountered content by number of appearances in the XML (i.e. "pencils" because it appears twice, rather than "pens" because it has the largest quantity). The idea is that this can be piped into a very simple shell script. I therefore thought that the easiest way of getting this information would be to write a couple of short XSL style-sheets and then use xsltproc to get the data.

The first piece of information seems straightforward. The heart of the style-sheet would be this line:

<xsl:value-of select="(sum(drawer/quantity)) div (count(drawer))" />

but I'm a bit stuck by the second.
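For reference, the average expression can be wrapped in a complete style-sheet along the following lines. This is a minimal sketch, not the asker's actual file: it assumes the root element is desk as in the sample document, and it uses text output with a trailing newline so the result is easy to consume from a shell script.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Plain-text output so the result can be piped into a shell script -->
  <xsl:output method="text"/>

  <!-- Match the root element so the relative path drawer/quantity resolves -->
  <xsl:template match="/desk">
    <xsl:value-of select="sum(drawer/quantity) div count(drawer)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Saved under a hypothetical name such as average.xsl, it would be run as xsltproc average.xsl desk.xml, where desk.xml is likewise an assumed file name for the sample document.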

I think I can use something like this to loop through a list of each distinct contents value:

<xsl:for-each select="drawer[not(contents = preceding-sibling::drawer/contents)]" />

but I'm not quite sure how to then count the number of drawer elements whose contents element has the same value as the current one. Nor can I see an easy way of then sorting by the results so I can get the name of the most frequently encountered value of contents.

I have a feeling this is easier in XSLT 2.0 with its various group-by options, but unfortunately, xsltproc doesn't seem to support that. Any help would be gratefully received.

Many thanks,

Jacob

A: 

Sorting in a for-each is done via the xsl:sort element. Just sort by the quantity and (if you only want the most frequent) add an <xsl:if test="position()=1"> element to keep only the first item in the loop.

<xsl:for-each select="drawer">
   <xsl:sort select="quantity" data-type="number" order="descending"/>
   <xsl:if test="position()=1">
      Most frequent: <xsl:value-of select="contents"/> with <xsl:value-of select="quantity"/> items
   </xsl:if>
</xsl:for-each>
Lucero
Ah, sorry, I should have explained myself more clearly. This will produce the result "pens" because there are 15 pens. What I would like is for it to produce "pencils" because "pencils" appears twice, whereas "pens" and "rulers" each appear once.
Jacob Head
+1  A: 

As with a great many problems solved in XSLT, I think your answer here is Muenchian grouping. Group by whatever data you're interested in; a for-each over those groups lets you use xsl:sort, and you can then do whatever you need to with the first result.

Untested, top-of-head, might-be-a-cleaner-way code:

<xsl:key name="average" match="desk/drawer/contents" use="text()"/>

<xsl:template match="/">
 <xsl:for-each select="desk/drawer/contents[generate-id() = generate-id(key('average',text())[1])]">  
  <xsl:sort select="count(//desk/drawer/contents[text()=current()])" data-type="number" order="descending"/>
  <xsl:if test="position()=1">
   Most common value: "<xsl:value-of select="current()"/>" (<xsl:value-of select="count(//desk/drawer/contents[text()=current()])"/>)
  </xsl:if>  
 </xsl:for-each>
</xsl:template>
annakata
Thanks; that's helpful. What I can't work out how to do, though, is the "whatever you need" bit. Having grouped the XML by "contents", is there an easy way of counting how many times a particular "contents" value appears in the XML?
Jacob Head
sorry, see update
annakata
Many thanks; that's really helpful and works! Now I just need to work out how it works! :)
Jacob Head
The Muenchian technique groups the nodes by distinct value, so the loop runs once per candidate value rather than once per node (where you anticipate little repetition, or only a few candidates, it may be counter-productive). The for-each merely allows you to use a sort, which is based simply on the count of each distinct value. The if just crops the output; sadly, there's no equivalent of break in XSLT :)
annakata
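To make the mechanics described in the comment above concrete, here is a self-contained sketch of the same Muenchian approach. It is a variant, not the answerer's exact code: the key name by-contents is introduced here for illustration, the count is taken with count(key(...)) instead of a // search, and data-type="number" is set on the sort because xsl:sort compares as text by default.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Index every contents element by its string value -->
  <xsl:key name="by-contents" match="drawer/contents" use="."/>

  <xsl:template match="/desk">
    <!-- Keep only the first contents element of each distinct value -->
    <xsl:for-each select="drawer/contents[generate-id() = generate-id(key('by-contents', .)[1])]">
      <!-- Order the groups by how many elements share the value -->
      <xsl:sort select="count(key('by-contents', .))" data-type="number" order="descending"/>
      <!-- Emit only the top-ranked group -->
      <xsl:if test="position() = 1">
        <xsl:value-of select="."/>
        <xsl:text>&#10;</xsl:text>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

If two values are tied for the highest count, the one that appears first in the document is printed, since xsl:sort keeps document order among equal keys.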
A: 

It's been a while, but I think something along these lines might work.

First count all contents

<xsl:variable name="tally">
  <xsl:for-each select="drawer">
     <contents count="{count(../drawer[contents = current()/contents])}"><xsl:value-of select="contents"/></contents>
  </xsl:for-each>
</xsl:variable>

Note that duplicated entries are counted each time, so $tally would contain:

<contents count="2">pencils</contents>
<contents count="1">pens</contents>
<contents count="2">pencils</contents>
<contents count="1">rulers</contents>

Then use this to find one for which there is no other with a higher count:

<xsl:variable name="mostfrequentcontents" select="$tally/contents[not($tally/contents/@count > @count)]" />

Depending on your XSLT processor, you might have to convert $tally to a node-set using a node-set extension function.

Mario Menger
Thanks; I can never get my head around XSLT to think up these ways of doing things! Using xsltproc, I get the following, so I think I need to use node-set. I'm consulting the manual now... XPath error : Invalid type runtime error: file port.xsl line 11 element variable Failed to evaluate the expression of variable 'mostfrequentcontents'.
Jacob Head
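The error reported above is consistent with XSLT 1.0 treating $tally as a result tree fragment, which cannot be used as the start of a path expression. A minimal sketch of the tally approach with the conversion added follows. It assumes the processor supports the EXSLT exsl:node-set function (libxslt, which xsltproc is built on, does); the variable name tally-nodes is introduced here for illustration, and the count is taken via ../drawer so that it is evaluated relative to the parent desk.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:exsl="http://exslt.org/common"
                exclude-result-prefixes="exsl">
  <xsl:output method="text"/>

  <xsl:template match="/desk">
    <!-- In XSLT 1.0 this variable holds a result tree fragment -->
    <xsl:variable name="tally">
      <xsl:for-each select="drawer">
        <contents count="{count(../drawer[contents = current()/contents])}">
          <xsl:value-of select="contents"/>
        </contents>
      </xsl:for-each>
    </xsl:variable>

    <!-- Convert the fragment to a node-set so path steps can be applied -->
    <xsl:variable name="tally-nodes" select="exsl:node-set($tally)/contents"/>

    <!-- First entry for which no other entry has a strictly higher count -->
    <xsl:value-of select="$tally-nodes[not($tally-nodes/@count &gt; @count)][1]"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

The [1] is there because several entries can share the highest count (both pencils entries do in the sample), and only one name is wanted on output.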