I am having some problems with an XML transform and need some help.

The stylesheet should iterate through all suffix elements and place the contents of each one, without the suffix tag, next to the last text node within its first ancestor quote-block element (see desired output). It works when only a single suffix is present, but not when two are present: in that case it places both suffixes next to each other after the last text node of the first quote-block.

Any ideas? I have tried limiting the selections to ancestor::quote-block[1] in various places but that doesn't have the desired effect.

Source XML

<paragraph>
    <para>
        <quote-block>
            <list prefix-rules="specified">
                <item prefix="“B42">
                    <para id="0a84d149-91b7-4012-ac6d-9f4eb8ed6c37">In June 2000, EME and EWS
                        reached an agreement to negotiate towards a direct contract for coal haulage
                        by rail (on a DIY basis), which would replace the previous indirect E2E
                        arrangements that EME had in place with ECSL. An internal EWS e-mail noted: <quote-block>
                            <quote-para>‘We did the deal with Edison Mission yesterday morning for
                                LBT-Fiddlers @ £[…]/tonne as agreed. This rate until 16th September
                                pending a contract.</quote-para>
                            <quote-para><emphasis strength="strong">Enron are now off our hands so
                                    far as Edison are concerned. The Enron flows we have left are to
                                    British Energy’s station at Eggborough; from Immingham, Redcar
                                    and Hull</emphasis>. Also to Enron’s own power station at Wilton
                                – 250,000 tonnes/year. I think we are stuck Enron [sic] on the
                                Eggborough traffic until next April when British Energy will,
                                hopefully take over their own coal procurement. <emphasis
                                    strength="strong">But we have got them out of Fiddlers Ferry and
                                    Ferrybridge – a big step forward</emphasis>.’</quote-para>
                            <suffix>(Emphasis added.)</suffix>
                        </quote-block>
                    </para>
                </item>
                <item prefix="B43">
                    <para id="d64a5a72-0a02-476f-9a7b-7c07bbc93a8a">This e-mail is evidence of both
                        EWS’s intent and, indeed, its success in stopping ECSL from carrying out
                        indirect supplies to EME, one of the new generating companies.”</para>
                </item>
            </list>
            <suffix>(emphasis in original)</suffix>
        </quote-block>
    </para>
</paragraph>

Stylesheet

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="http://xml.sm.com/schema/cases/report"
    xmlns:sm="http://xml.sm.com/functions" xmlns:saxon="http://saxon.sf.net/"
    xpath-default-namespace="http://sm.com/schema/cases/report"
    exclude-result-prefixes="xs sm" version="2.0">

    <xsl:output method="xml" indent="no"/>

    <xsl:template match="/">
        <xsl:apply-templates/>
    </xsl:template>

    <xsl:template match="*">
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:apply-templates/>
        </xsl:copy>
    </xsl:template>

    <!-- Match quote-blocks with open or close attributes. -->
    <xsl:template match="*[*:quote-block and descendant::*:suffix]">
        <xsl:call-template name="process-quote-block"/>
    </xsl:template>

    <!-- Match inline quote with open or close attributes -->
    <xsl:template match="*[*:quote and descendant::*:suffix]">
        <xsl:call-template name="process-quote-block"/>
    </xsl:template>

    <!-- Process the quote block -->
    <xsl:template name="process-quote-block">
        <xsl:variable name="quoteBlockCopy">
            <xsl:copy-of select="."/>
        </xsl:variable>

        <xsl:apply-templates select="$quoteBlockCopy" mode="append-suffix">
            <xsl:with-param name="suffix" select="sm:get-suffix-note(.)"/>
            <xsl:with-param name="end-node" select="sm:get-last-text-node($quoteBlockCopy)"/>
        </xsl:apply-templates>
    </xsl:template>

    <!-- Match quote-blocks with open or close attributes. -->
    <xsl:template match="*[*:quote-block and descendant::*:suffix][ancestor::*:quote-block[1]]" mode="create-copy">
        <xsl:call-template name="process-quote-block"/>
    </xsl:template>

    <!-- Match inline quote with open or close attributes -->
    <xsl:template match="*[*:quote and descendant::*:suffix]" mode="create-copy">
        <xsl:call-template name="process-quote-block"/>
    </xsl:template>

    <!-- This will match all elements. Just copy and pass through the parameters. -->
    <xsl:template match="*" mode="append-suffix">
        <xsl:param name="suffix"/>
        <xsl:param name="end-node"/>
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:apply-templates mode="append-suffix">
                <xsl:with-param name="suffix" select="$suffix"/>
                <xsl:with-param name="end-node" select="$end-node"/>
            </xsl:apply-templates>
        </xsl:copy>
    </xsl:template>

    <!-- Apply the text node to the content. If the node is equal to the last node then append the descendants of suffix  -->
    <xsl:template match="text()[normalize-space() != '']" mode="append-suffix">
        <xsl:param name="suffix"/>
        <xsl:param name="end-node"/>
        <xsl:choose>
            <xsl:when test="count(. | $end-node) = 1">
                <xsl:value-of select="."/>
                <xsl:apply-templates select="$suffix"/>
            </xsl:when>
            <xsl:otherwise>
                <!-- Or maybe neither. -->
                <xsl:value-of select="."/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>

    <!--  Dont copy suffix as -->
    <xsl:template match="*:suffix" mode="append-suffix"/>

    <xsl:function name="sm:get-suffix-note">
        <xsl:param name="node"/>
        <xsl:sequence select="$node/descendant::*:suffix/node()"/>
    </xsl:function>

    <xsl:function name="sm:get-last-text-node">
        <!--  Finds last non-empty text() node, ignoring <suffix> elements that are a child of this specific quote-block. -->
        <xsl:param name="node"/>

        <xsl:sequence
            select="reverse($node//text()[not(ancestor::*:suffix) and normalize-space() != ''])[1]"/>
    </xsl:function>

</xsl:stylesheet>

Current Output XML

<paragraph>
    <para>
        <quote-block>
            <list prefix-rules="specified">
                <item prefix="“B42">
                    <para id="0a84d149-91b7-4012-ac6d-9f4eb8ed6c37">In June 2000, EME and EWS
                        reached an agreement to negotiate towards a direct contract for coal haulage
                        by rail (on a DIY basis), which would replace the previous indirect E2E
                        arrangements that EME had in place with ECSL. An internal EWS e-mail noted: <quote-block>
                            <quote-para>‘We did the deal with Edison Mission yesterday morning for
                                LBT-Fiddlers @ £[…]/tonne as agreed. This rate until 16th September
                                pending a contract.</quote-para>
                            <quote-para><emphasis strength="strong">Enron are now off our hands so
                                    far as Edison are concerned. The Enron flows we have left are to
                                    British Energy’s station at Eggborough; from Immingham, Redcar
                                    and Hull</emphasis>. Also to Enron’s own power station at Wilton
                                – 250,000 tonnes/year. I think we are stuck Enron [sic] on the
                                Eggborough traffic until next April when British Energy will,
                                hopefully take over their own coal procurement. <emphasis
                                    strength="strong">But we have got them out of Fiddlers Ferry and
                                    Ferrybridge – a big step forward</emphasis>.’</quote-para>
                        </quote-block>
                    </para>
                </item>
                <item prefix="B43">
                    <para id="d64a5a72-0a02-476f-9a7b-7c07bbc93a8a">This e-mail is evidence of both
                        EWS’s intent and, indeed, its success in stopping ECSL from carrying out
                        indirect supplies to EME, one of the new generating companies.”(Emphasis
                        added.)(emphasis in original)</para>
                </item>
            </list>

        </quote-block>
    </para>
</paragraph>

Desired Output

<paragraph>
    <para>
        <quote-block>
            <list prefix-rules="specified">
                <item prefix="“B42">
                    <para id="0a84d149-91b7-4012-ac6d-9f4eb8ed6c37">In June 2000, EME and EWS
                        reached an agreement to negotiate towards a direct contract for coal haulage
                        by rail (on a DIY basis), which would replace the previous indirect E2E
                        arrangements that EME had in place with ECSL. An internal EWS e-mail noted: <quote-block>
                            <quote-para>‘We did the deal with Edison Mission yesterday morning for
                                LBT-Fiddlers @ £[…]/tonne as agreed. This rate until 16th September
                                pending a contract.</quote-para>
                            <quote-para><emphasis strength="strong">Enron are now off our hands so
                                    far as Edison are concerned. The Enron flows we have left are to
                                    British Energy’s station at Eggborough; from Immingham, Redcar
                                    and Hull</emphasis>. Also to Enron’s own power station at Wilton
                                – 250,000 tonnes/year. I think we are stuck Enron [sic] on the
                                Eggborough traffic until next April when British Energy will,
                                hopefully take over their own coal procurement. <emphasis
                                    strength="strong">But we have got them out of Fiddlers Ferry and
                                    Ferrybridge – a big step forward</emphasis>.’(Emphasis
                                added.)</quote-para>
                        </quote-block>
                    </para>
                </item>
                <item prefix="B43">
                    <para id="d64a5a72-0a02-476f-9a7b-7c07bbc93a8a">This e-mail is evidence of both
                        EWS’s intent and, indeed, its success in stopping ECSL from carrying out
                        indirect supplies to EME, one of the new generating companies.”(emphasis in original)</para>
                </item>
            </list>

        </quote-block>
    </para>
</paragraph>
+1  A: 

Man, you've dug yourself into quite a hole here. ;-) Here is what I have come up with:

<xsl:stylesheet 
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>
  <xsl:output method="xml" encoding="utf-8" indent="no"/>

  <!-- key to identify all non-empty, non-suffix text node descendants of
       a quote-block. We'll use that to pull out the "last one" later-on -->
  <xsl:key 
    name ="kQbText" 
    match="quote-block//text()[not(normalize-space() = '' or parent::suffix)]"
    use  ="generate-id(ancestor::quote-block[1])"
  />

  <!-- identity template to copy everything that is not otherwise handled -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*" />
    </xsl:copy>
  </xsl:template>

  <!-- special handling for text nodes that are descendants of quote-blocks -->
  <xsl:template match="quote-block//text()[not(normalize-space() = '' or parent::suffix)]">
    <xsl:variable name="qb" select="ancestor::quote-block[1]" />

    <!-- the text node gets copied regardless -->
    <xsl:copy-of select="." />

    <!-- if it is the last non-empty text node, append all suffixes -->
    <xsl:if test="
      generate-id() 
      = 
      generate-id( key('kQbText', generate-id($qb))[last()] )
    ">
      <xsl:for-each select="$qb/suffix">
        <xsl:value-of select="concat(' ', .)" />
      </xsl:for-each>
    </xsl:if>
  </xsl:template>

  <!-- empty text nodes will be removed (all others are copied) -->
  <xsl:template match="text()[normalize-space() = '']" />

  <!-- suffix nodes will be deleted-->
  <xsl:template match="suffix" />

</xsl:stylesheet>

The above results in (indentation and line-breaks added with tidy to make it readable):

<paragraph>
  <para>
    <quote-block>
      <list prefix-rules="specified">
        <item prefix="“B42">
          <para id="0a84d149-91b7-4012-ac6d-9f4eb8ed6c37">In June
          2000, EME and EWS reached an agreement to negotiate
          towards a direct contract for coal haulage by rail (on a
          DIY basis), which would replace the previous indirect E2E
          arrangements that EME had in place with ECSL. An internal
          EWS e-mail noted: 
          <quote-block>
            <quote-para>‘We did the deal with Edison Mission
            yesterday morning for LBT-Fiddlers @ £[…]/tonne as
            agreed. This rate until 16th September pending a
            contract.</quote-para>
            <quote-para>
            <emphasis strength="strong">Enron are now off our hands
            so far as Edison are concerned. The Enron flows we have
            left are to British Energy’s station at Eggborough;
            from Immingham, Redcar and Hull</emphasis>. Also to
            Enron’s own power station at Wilton – 250,000
            tonnes/year. I think we are stuck Enron [sic] on the
            Eggborough traffic until next April when British Energy
            will, hopefully take over their own coal procurement. 
            <emphasis strength="strong">But we have got them out of
            Fiddlers Ferry and Ferrybridge – a big step
            forward</emphasis>.’ (Emphasis added.)</quote-para>
          </quote-block></para>
        </item>
        <item prefix="B43">
          <para id="d64a5a72-0a02-476f-9a7b-7c07bbc93a8a">This
          e-mail is evidence of both EWS’s intent and, indeed, its
          success in stopping ECSL from carrying out indirect
          supplies to EME, one of the new generating companies.”
          (emphasis in original)</para>
        </item>
      </list>
    </quote-block>
  </para>
</paragraph>

The XSLT code here is XSLT 1.0, but you can run it unaltered in a 2.0 processor.
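
One caveat, which rests on an assumption: the sample XML shown has no namespace, but the question's stylesheet sets `xpath-default-namespace`, so the real documents may well be in that namespace. In XSLT 1.0 an unprefixed name in a pattern only matches elements in no namespace, so in that case the element names would need a prefix. A minimal sketch of the two declarations that change (the prefix `r` is an arbitrary choice, and the URI is simply copied from the question's `xpath-default-namespace`; the other templates would need the same treatment):

```xml
<xsl:stylesheet
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:r="http://sm.com/schema/cases/report"
>
  <!-- same key as above, with each element name qualified by the assumed prefix -->
  <xsl:key
    name ="kQbText"
    match="r:quote-block//text()[not(normalize-space() = '' or parent::r:suffix)]"
    use  ="generate-id(ancestor::r:quote-block[1])"
  />

  <!-- suffix elements are still removed from their original position -->
  <xsl:template match="r:suffix" />
</xsl:stylesheet>
```

If the documents really are namespace-free, as in the sample, none of this is needed.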

Tomalak
Hi Tomalak, appreciate the help, certainly helps me get back out of my hole ;0)
Mike
@Mike: Glad to help. Don't hesitate to ask if anything needs a more thorough explanation.
Tomalak
+1  A: 

Here is a simple transform that addresses just the problem. As others have noticed, the problem is specified in a very messy way and does not allow a single, unambiguous interpretation.

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

 <xsl:strip-space elements="*"/>

 <xsl:key name="kLastNonSufText"
   match="*[not(self::suffix)]/text()"
   use="generate-id(ancestor::quote-block[1])"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="text()[ancestor::quote-block]">
  <xsl:copy-of select="."/>

  <xsl:variable name="vQBImmed" select="ancestor::quote-block[1]"/>

  <xsl:variable name="vLastText" select=
   "key('kLastNonSufText', generate-id($vQBImmed))
      [last()]"/>

  <xsl:if test="count(.|$vLastText) = 1">
      <xsl:copy-of select="($vQBImmed//suffix)[last()]/text()"/>
  </xsl:if>
 </xsl:template>

 <xsl:template match="suffix"/>
</xsl:stylesheet>

When this transformation is applied on the (very unreadable and poorly formatted) provided source XML document:

<paragraph>
 <para>
  <quote-block>
    <list prefix-rules="specified">
        <item prefix="“B42">
            <para id="0a84d149-91b7-4012-ac6d-9f4eb8ed6c37">In June 2000, EME and EWS
                reached an agreement to negotiate towards a direct contract for coal haulage
                by rail (on a DIY basis), which would replace the previous indirect E2E
                arrangements that EME had in place with ECSL. An internal EWS e-mail noted:
                <quote-block>
                    <quote-para>‘We did the deal with Edison Mission yesterday morning for
                        LBT-Fiddlers @ £[…]/tonne as agreed. This rate until 16th September
                        pending a contract.</quote-para>
                    <quote-para>
                        <emphasis strength="strong">Enron are now off our hands so
                            far as Edison are concerned. The Enron flows we have left are to
                            British Energy’s station at Eggborough; from Immingham, Redcar
                            and Hull</emphasis>. Also to Enron’s own power station at Wilton
                        – 250,000 tonnes/year. I think we are stuck Enron [sic] on the
                        Eggborough traffic until next April when British Energy will,
                        hopefully take over their own coal procurement.
                        <emphasis
                            strength="strong">But we have got them out of Fiddlers Ferry and
                            Ferrybridge – a big step forward</emphasis>.’
                    </quote-para>
                    <suffix>(Emphasis added.)</suffix>
                </quote-block>
            </para>
        </item>
        <item prefix="B43">
            <para id="d64a5a72-0a02-476f-9a7b-7c07bbc93a8a">This e-mail is evidence of both
                EWS’s intent and, indeed, its success in stopping ECSL from carrying out
                indirect supplies to EME, one of the new generating companies.”</para>
        </item>
    </list>
    <suffix>(emphasis in original)</suffix>
  </quote-block>
 </para>
</paragraph>

the output has the desired suffixes appended to the desired text nodes:

<?xml version="1.0" encoding="UTF-16"?><paragraph><para><quote-block><list prefix-rules="specified"><item prefix="“B42"><para id="0a84d149-91b7-4012-ac6d-9f4eb8ed6c37">In June 2000, EME and EWS
                reached an agreement to negotiate towards a direct contract for coal haulage
                by rail (on a DIY basis), which would replace the previous indirect E2E
                arrangements that EME had in place with ECSL. An internal EWS e-mail noted:
                <quote-block><quote-para>‘We did the deal with Edison Mission yesterday morning for
                        LBT-Fiddlers @ £[…]/tonne as agreed. This rate until 16th September
                        pending a contract.</quote-para><quote-para><emphasis strength="strong">Enron are now off our hands so
                            far as Edison are concerned. The Enron flows we have left are to
                            British Energy’s station at Eggborough; from Immingham, Redcar
                            and Hull</emphasis>. Also to Enron’s own power station at Wilton
                        – 250,000 tonnes/year. I think we are stuck Enron [sic] on the
                        Eggborough traffic until next April when British Energy will,
                        hopefully take over their own coal procurement.
                        <emphasis strength="strong">But we have got them out of Fiddlers Ferry and
                            Ferrybridge – a big step forward</emphasis>.’
                    (Emphasis added.)</quote-para></quote-block></para></item><item prefix="B43"><para id="d64a5a72-0a02-476f-9a7b-7c07bbc93a8a">This e-mail is evidence of both
                EWS’s intent and, indeed, its success in stopping ECSL from carrying out
                indirect supplies to EME, one of the new generating companies.”(emphasis in original)</para></item></list></quote-block></para></paragraph>
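
For readers unfamiliar with the `count(.|$vLastText) = 1` test used in the template: it is the usual XSLT 1.0 idiom for node identity, since the union of two node-sets has exactly one member only when both hold the same node. A minimal standalone sketch, assuming a made-up input of `<doc><a>x</a><a>x</a></doc>` (the element names `doc` and `a` are hypothetical and not part of the question's data):

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

 <xsl:output method="text"/>

 <!-- hypothetical input: <doc><a>x</a><a>x</a></doc> -->
 <xsl:template match="/doc">
  <xsl:variable name="vLast" select="a[last()]"/>

  <xsl:for-each select="a">
   <!-- true only for the node that is $vLast itself;
        comparing string values would be true for both <a> elements -->
   <xsl:if test="count(.|$vLast) = 1">
    <xsl:value-of select="concat('last a is at position ', position())"/>
   </xsl:if>
  </xsl:for-each>
 </xsl:template>
</xsl:stylesheet>
```

Comparing `generate-id()` values, as the other answer does, is an equivalent way to express the same identity check.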
Dimitre Novatchev
Hi Dimitre, thanks for the help. Point noted on the XML.
Mike