This follows on from my earlier question, "How to create subsets of a single set of elements with XSLT?"

I'd like to take the problem one step further. In that question I gave the following XML as input:

<Set>
   <Element name="Superset1_Set1_Element1"/>
   <Element name="Superset1_Set1_Element2"/>
   <Element name="Superset1_Set2_Element1"/>
   <Element name="Superset2_Set1_Element1"/>
   <Element name="Superset2_Set2_Element1"/>
</Set>

I asked for an XSL transformation that produces the following output:

<Superset name="Superset1">
   <Set name="Set1">
       <Element name="Element1"/>
       <Element name="Element2"/>
   </Set>
   <Set name="Set2">
       <Element name="Element1"/>
   </Set>
</Superset>
<Superset name="Superset2">
   <Set name="Set1">
       <Element name="Element1"/>
   </Set>
   <Set name="Set2">
       <Element name="Element1"/>
   </Set>
</Superset>

Both Tomalak and annakata gave me working solutions. I chose Tomalak's because of its use of templates, which is, in my opinion, more human-readable.

The problem is that my XML is actually of the form:

<Set>
   <Element name="Classic_Authors_Dante_Alighieri_The_Divine_Comedy"/>
   <Element name="Classic_Authors_Dante_Alighieri_Convivio"/>
   <Element name="Classic_Authors_Miguel_de_Cervantes_Saavedra_Don_Quixote"/>
   <Element name="Contemporary_Authors_Stephen_King_Just_After_Sunset"/>
   <Element name="Contemporary_Authors_Stephen_King_Desperation"/>
</Set>

Superset, set, and element names contain varying numbers of underscores. In the example above there are two supersets, 'Classic_Authors' and 'Contemporary_Authors', and three sets: 'Dante_Alighieri', 'Miguel_de_Cervantes_Saavedra' and 'Stephen_King'.
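To illustrate why the new names break any approach based on underscore positions, here is a small Python sketch (not part of either XSLT answer; naive_split is a hypothetical helper). It splits at the first two underscores, which works for the original "Superset1_Set1_Element1" format but not for the new one:

```python
def naive_split(name):
    """Split at the first two underscores, assuming each part is a single
    underscore-free token (true for the original example format only)."""
    superset, set_name, element = name.split("_", 2)
    return superset, set_name, element


# Works for the original format.
print(naive_split("Superset1_Set1_Element1"))
# Fails for the real names: the superset comes out as "Classic", not "Classic_Authors".
print(naive_split("Classic_Authors_Dante_Alighieri_Convivio"))
```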

The output XML should then be:

<Superset name="Classic_Authors">
   <Set name="Dante_Alighieri">
       <Element name="The_Divine_Comedy"/>
       <Element name="Convivio"/>
   </Set>
   <Set name="Miguel_de_Cervantes_Saavedra">
       <Element name="Don_Quixote"/>
   </Set>
</Superset>
<Superset name="Contemporary_Authors">
   <Set name="Stephen_King">
       <Element name="Just_After_Sunset"/>
       <Element name="Desperation"/>
   </Set>
</Superset>

How, then, can I use Tomalak's solution? That is, how should I prepare my XML for his algorithm? Can it be done in a single XSLT? Is there perhaps another solution?

Thanks all very much!

A: 

The problem is that all of the information about the element is crammed into one attribute. You should separate the semantically different parts of your data into separate elements or separate attributes, e.g.:

<Set>
    <Element title="The Divine Comedy" author="Dante Alighieri" category="Classic Authors"/>
    ...

If you are stuck with the existing elements, I'm afraid I don't have a good solution. Even as a human being, I find it hard to determine which parts of the "name" are titles, authors, or categories, and I can't think of an easy way to parse out the data.

Matt Bridges
Let's presume that the supersets and sets that exist are documented within the company. How should I proceed?
Yaneeve
And yes, I am stuck with the existing elements...
Yaneeve
This is a tough one. You would have to modify the xsl:key elements to know about the different categories and subcategories, instead of just splitting on the underscores. I'll have a look at it later when I have some more time.
Matt Bridges
Thanks! I appreciate that :)
Yaneeve
A: 

There is no deterministic way to separate the book name from the author name. The number of underscores in each varies.

The only solution is to add more information to your input by having whoever produces it change the format somehow (perhaps two underscores between book and author?).
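As a sketch of what that suggestion would buy, assume a hypothetical format in which "__" separates superset, set, and element while single underscores stay inside each part. Splitting then becomes unambiguous (split_delimited is an illustrative helper, and the format is an assumption, not the asker's actual data):

```python
def split_delimited(name, sep="__"):
    """Split a name in the hypothetical "__"-delimited format into
    (superset, set, element); reject names that don't have exactly 3 parts."""
    parts = name.split(sep)
    if len(parts) != 3:
        raise ValueError(f"expected 3 parts separated by {sep!r}: {name!r}")
    return tuple(parts)


print(split_delimited("Classic_Authors__Dante_Alighieri__The_Divine_Comedy"))
```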

lavinio
As I told Matt: although there is no deterministic way to separate the book name from the author etc., all the category names etc. are documented within the company. Also, I cannot change the format, since it belongs to legacy systems that we are currently in the process of spiffing up. The above is only an example, meant to contain names recognizable to normal human beings. The real names are related to business logic and belong to very finite sets.
Yaneeve
A: 

As I said in the comments to my answer in your previous question, you'll need a file that contains the fixed and known set names before you can begin to solve this. Ideally, it is structured, like this:

<!-- SetNames.xml -->
<names>
  <Superset name="Classic_Authors">
    <Set name="Dante_Alighieri" />
    <Set name="Miguel_de_Cervantes_Saavedra" />
  </Superset>
  <Superset name="Contemporary_Authors">
    <Set name="Stephen_King" />
  </Superset>
</names>

Without such a file the problem will not be solvable. Now that you have a nicely structured set of names, you can do the grouping based on it (essentially, it is already in the output format; all you need to do is match your data against it):

<xsl:stylesheet version="1.0" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>
  <xsl:param name="pSetFile" select="'SetNames.xml'" />
  <xsl:variable name="root" select="/" />

  <xsl:template match="/Set">
    <xsl:copy>
      <xsl:variable name="vSetDoc" select="document($pSetFile)" />
      <xsl:apply-templates select="$vSetDoc/names/Superset">
        <xsl:sort select="@name" />
      </xsl:apply-templates>
    </xsl:copy> 
  </xsl:template>

  <xsl:template match="Superset">
    <xsl:copy>
      <xsl:copy-of select="@*" />
      <xsl:apply-templates select="Set">
        <xsl:sort select="@name" />
      </xsl:apply-templates>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="Set">
    <xsl:copy>
      <xsl:copy-of select="@*" />
      <xsl:variable name="vPrefix" select="
        concat(../@name, '_', @name, '_')
      " />
      <xsl:apply-templates select="
        $root/Set/Element[starts-with(@name, $vPrefix)]
      ">
        <xsl:sort select="@name" />
        <xsl:with-param name="pPrefix" select="$vPrefix" />
      </xsl:apply-templates>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="Element">
    <xsl:param name="pPrefix" select="''" />

    <xsl:copy>
      <xsl:attribute name="name">
        <xsl:value-of select="substring-after(@name, $pPrefix)" />
      </xsl:attribute>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

When applied to your input, this produces:

<Set>
  <Superset name="Classic_Authors">
    <Set name="Dante_Alighieri">
      <Element name="Convivio" />
      <Element name="The_Divine_Comedy" />
    </Set>
    <Set name="Miguel_de_Cervantes_Saavedra">
      <Element name="Don_Quixote" />
    </Set>
  </Superset>
  <Superset name="Contemporary_Authors">
    <Set name="Stephen_King">
      <Element name="Desperation" />
      <Element name="Just_After_Sunset" />
    </Set>
  </Superset>
</Set>

Since SetNames.xml basically is already grouped, further (Muenchian) grouping will not be necessary. The slowest expression in the above will be this:

$root/Set/Element[starts-with(@name, $vPrefix)]

This "table scan" type of expression is exactly where an <xsl:key> would be helpful, but due to the nature of the problem it can't be used here.
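For readers who want to check the matching logic without an XSLT processor, here is a rough Python sketch of what the stylesheet above does, using only the standard library's xml.etree.ElementTree. The input and the SetNames.xml contents are inlined as strings, and group is a hypothetical helper name. Like the stylesheet, it relies on plain prefix matching, so it assumes no set name is a prefix of another set name within the same superset.

```python
import xml.etree.ElementTree as ET

# The question's input and the known-names file, inlined for a self-contained example.
DATA = """<Set>
  <Element name="Classic_Authors_Dante_Alighieri_The_Divine_Comedy"/>
  <Element name="Classic_Authors_Dante_Alighieri_Convivio"/>
  <Element name="Classic_Authors_Miguel_de_Cervantes_Saavedra_Don_Quixote"/>
  <Element name="Contemporary_Authors_Stephen_King_Just_After_Sunset"/>
  <Element name="Contemporary_Authors_Stephen_King_Desperation"/>
</Set>"""

SET_NAMES = """<names>
  <Superset name="Classic_Authors">
    <Set name="Dante_Alighieri"/>
    <Set name="Miguel_de_Cervantes_Saavedra"/>
  </Superset>
  <Superset name="Contemporary_Authors">
    <Set name="Stephen_King"/>
  </Superset>
</names>"""


def group(data_root, names_root):
    """Build the grouped tree by walking the known names, mirroring the templates."""
    out = ET.Element("Set")
    for superset in sorted(names_root.findall("Superset"), key=lambda el: el.get("name")):
        out_super = ET.SubElement(out, "Superset", name=superset.get("name"))
        for set_el in sorted(superset.findall("Set"), key=lambda el: el.get("name")):
            out_set = ET.SubElement(out_super, "Set", name=set_el.get("name"))
            prefix = f"{superset.get('name')}_{set_el.get('name')}_"
            # Linear "table scan" over all elements, like the XSLT predicate.
            matches = [e.get("name") for e in data_root.findall("Element")
                       if e.get("name").startswith(prefix)]
            for full in sorted(matches):
                # Equivalent of substring-after(@name, $pPrefix).
                ET.SubElement(out_set, "Element", name=full[len(prefix):])
    return out


result = group(ET.fromstring(DATA), ET.fromstring(SET_NAMES))
print(ET.tostring(result, encoding="unicode"))
```

Because the names file drives the loops, each Set costs one pass over all elements, which is the same cost profile as the XPath expression discussed above.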

Tomalak
@Yaneeve: If there is no way to generate a structured SetNames.xml file, you can still use a flat list of atomic names to split your merged names into separate attributes, which can then be used for grouping.
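A minimal Python sketch of that idea, assuming the documented names are available as two flat lists (SUPERSETS and SETS are hypothetical names, filled here with the example values). Where several known names could match, it picks the longest one; that tie-break is an assumption of this sketch, not something stated in the comment. Each split name is then written back as separate attributes, which a stylesheet could group on.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat lists of known atomic names (no superset/set nesting).
SUPERSETS = ["Classic_Authors", "Contemporary_Authors"]
SETS = ["Dante_Alighieri", "Miguel_de_Cervantes_Saavedra", "Stephen_King"]


def match_prefix(text, candidates):
    """Return the longest candidate c such that text starts with c + '_', else None."""
    hits = [c for c in candidates if text.startswith(c + "_")]
    return max(hits, key=len) if hits else None


def split_name(name):
    """Split a merged name into superset, set and element using the known lists."""
    superset = match_prefix(name, SUPERSETS)
    if superset is None:
        raise ValueError(f"unknown superset in {name!r}")
    rest = name[len(superset) + 1:]
    set_name = match_prefix(rest, SETS)
    if set_name is None:
        raise ValueError(f"unknown set in {name!r}")
    return {"superset": superset, "set": set_name, "element": rest[len(set_name) + 1:]}


print(split_name("Classic_Authors_Miguel_de_Cervantes_Saavedra_Don_Quixote"))

# Write the parts back as separate attributes, ready for key-based grouping.
el = ET.Element("Element", split_name("Contemporary_Authors_Stephen_King_Desperation"))
print(ET.tostring(el, encoding="unicode"))
```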
Tomalak
Thanks :) Fortunately, though, we have been able to create a SetNames.xml file as you have specified. I guess this stage of the problem is finally over :) Thanks again!
Yaneeve