I have a large, poorly structured XML file in which the information for a single line item is spread across several LINE_INFO elements, and I'm trying to group that information under the parent line item (ITEM_ID). The information is sequential, so the key is the ITEM_ID node, but I can't work out the XSL needed to group everything related to one item. The XML source looks like this (updated to include a newly discovered grandchild element):

<LINE_INFO>
    <ITEM_ID>some_part_num</ITEM_ID>
    <DESC>some_part_num_description</DESC>
    <QTY>nn</QTY>
    <UNIT>uom</UNIT>
</LINE_INFO>
<LINE_INFO>
    <EXT_DESC>more_description_for_some_part_num</EXT_DESC>
</LINE_INFO>
<LINE_INFO>
    <ITEM_ID>some_other_part_num</ITEM_ID>
    <DESC>some_other_part_num_description</DESC>
    <QTY>nn</QTY>
    <UNIT>uom</UNIT>
</LINE_INFO>
<LINE_INFO>
    <EXT_DESC>more_description_for_some_other_part_num</EXT_DESC>
</LINE_INFO>
<LINE_INFO>
    <LINE_NOTE>This is a note related to some_other_part_num</LINE_NOTE>
</LINE_INFO>
<LINE_INFO>
    <ADDTL_NOTE_DETAIL>
        <NOTE>This is the grandchild note that sometimes appears in my data</NOTE>
    </ADDTL_NOTE_DETAIL>
</LINE_INFO>
<LINE_INFO>
    <ITEM_ID>yet_another_part_num</ITEM_ID>
    <DESC>yet_another_part_num_description</DESC>
    <QTY>nn</QTY>
    <UNIT>uom</UNIT>
</LINE_INFO>
  ...

Desired output:

<LINE_INFO>
    <ITEM_ID>some_part_num</ITEM_ID>
    <DESC>some_part_num_description</DESC>
    <QTY>nn</QTY>
    <UNIT>uom</UNIT>
    <EXT_DESC>more_description_for_some_part_num</EXT_DESC>
</LINE_INFO>
<LINE_INFO>
    <ITEM_ID>some_other_part_num</ITEM_ID>
    <DESC>some_other_part_num_description</DESC>
    <QTY>nn</QTY>
    <UNIT>uom</UNIT>
    <EXT_DESC>more_description_for_some_other_part_num</EXT_DESC>
    <LINE_NOTE>This is a note related to some_other_part_num</LINE_NOTE>
    <NOTE>This is the grandchild note that sometimes appears in my data</NOTE>
</LINE_INFO>
<LINE_INFO>
    <ITEM_ID>yet_another_part_num</ITEM_ID>
    <DESC>yet_another_part_num_description</DESC>
    <QTY>nn</QTY>
    <UNIT>uom</UNIT>
</LINE_INFO>
+1  A: 

This is a classic grouping problem. The best approach depends on whether you have XSLT 2.0, or have to use 1.0.

If 2.0, you'll want to use <xsl:for-each-group>:

<table>
   <xsl:for-each-group select="LINE_INFO" group-starting-with="LINE_INFO[ITEM_ID]">

The above XPath expressions for select and group-starting-with assume that the context node is the parent of the LINE_INFO elements. Alternatively you could put // on the front of both expressions, at the risk of lesser performance.

Output a row for each group, with the data placed in table cells according to your most recent comment:

      <tr>
         <td><xsl:value-of select="current-group()/ITEM_ID" /></td>
         <td>
           <!-- DESC and EXT_DESC run together on the first line -->
           <xsl:value-of select="concat(current-group()/DESC, current-group()/EXT_DESC)"/>
           <br />
           <xsl:value-of select="current-group()/LINE_NOTE" />
           <br />
           <!-- NOTE is a grandchild, so it is reached through ADDTL_NOTE_DETAIL -->
           <xsl:value-of select="current-group()/ADDTL_NOTE_DETAIL/NOTE" />
         </td>
         <td><xsl:value-of select="current-group()/QTY" /></td>
      </tr>
   </xsl:for-each-group>
</table>
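
For anyone assembling the pieces, here is a minimal sketch of a complete XSLT 2.0 stylesheet around the fragment above. It is untested against the real data. The parent element name ORDER_LINES is a placeholder (the question never names the parent of LINE_INFO), and the three-column layout is only one reading of the requested table. It uses the separator attribute of xsl:value-of instead of concat(), which also tolerates a group that happens to contain more than one EXT_DESC.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="yes"/>

  <!-- ORDER_LINES is a hypothetical name for the parent of the LINE_INFO
       elements; matching it makes the relative select below resolve -->
  <xsl:template match="ORDER_LINES">
    <table>
      <!-- A new group starts at every LINE_INFO that has an ITEM_ID child -->
      <xsl:for-each-group select="LINE_INFO"
                          group-starting-with="LINE_INFO[ITEM_ID]">
        <tr>
          <td><xsl:value-of select="current-group()/ITEM_ID"/></td>
          <td>
            <!-- DESC and EXT_DESC joined with no separator -->
            <xsl:value-of select="current-group()/(DESC | EXT_DESC)" separator=""/>
            <br/>
            <xsl:value-of select="current-group()/LINE_NOTE"/>
            <br/>
            <!-- Grandchild note, reached through its wrapper element -->
            <xsl:value-of select="current-group()/ADDTL_NOTE_DETAIL/NOTE"/>
          </td>
          <td><xsl:value-of select="current-group()/QTY"/></td>
        </tr>
      </xsl:for-each-group>
    </table>
  </xsl:template>
</xsl:stylesheet>
```

If the input begins with LINE_INFO elements that precede the first ITEM_ID, they form a leading group of their own and produce a row with an empty first cell.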

(The rest of this answer is somewhat obsolete as the OP has XSLT 2.0.)

If 1.0, your best bet is Muenchian grouping. For the identifying-the-groups step (step 1), you would use a key like

<xsl:key name="LINE_INFO-by-section" match="LINE_INFO"
    use="generate-id((. | preceding-sibling::LINE_INFO)[ITEM_ID][last()])" />

To iterate over the groups:

<xsl:for-each select="LINE_INFO[ITEM_ID]">
   <xsl:copy>

To iterate over the members of the group:

      <xsl:variable name="section-starter-id" select="generate-id(.)" />
      <xsl:for-each select="key('LINE_INFO-by-section', $section-starter-id)">
         <xsl:copy-of select="node()|@*" />
      </xsl:for-each>
   </xsl:copy>
</xsl:for-each>

(Untested.)
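
Since the comments below end up needing one HTML row per item rather than a copy, here is a sketch of how the same key might drive a table in XSLT 1.0. It is also untested. ORDER_LINES is again a made-up name for the parent of the LINE_INFO elements, and the column layout is an assumption. Keep in mind that in 1.0, xsl:value-of and concat() use only the first node of a node-set, so a second EXT_DESC or LINE_NOTE in the same group would be silently ignored.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="yes"/>

  <!-- Index every LINE_INFO under the id of the nearest LINE_INFO
       (itself or a preceding sibling) that has an ITEM_ID child -->
  <xsl:key name="LINE_INFO-by-section" match="LINE_INFO"
      use="generate-id((. | preceding-sibling::LINE_INFO)[ITEM_ID][last()])"/>

  <!-- ORDER_LINES is a hypothetical parent element name -->
  <xsl:template match="ORDER_LINES">
    <table>
      <!-- One iteration per group starter -->
      <xsl:for-each select="LINE_INFO[ITEM_ID]">
        <!-- All LINE_INFO elements belonging to this item, starter included -->
        <xsl:variable name="group"
            select="key('LINE_INFO-by-section', generate-id(.))"/>
        <tr>
          <td><xsl:value-of select="ITEM_ID"/></td>
          <td>
            <xsl:value-of select="concat(DESC, $group/EXT_DESC)"/>
            <br/>
            <xsl:value-of select="$group/LINE_NOTE"/>
            <br/>
            <!-- Grandchild note, reached through its wrapper element -->
            <xsl:value-of select="$group/ADDTL_NOTE_DETAIL/NOTE"/>
          </td>
          <td><xsl:value-of select="QTY"/></td>
        </tr>
      </xsl:for-each>
    </table>
  </xsl:template>
</xsl:stylesheet>
```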

LarsH
LarsH, I'm using XSLT 2.0, but I am not really a programmer, so I'm struggling with getting the syntax of the for-each-group right.
Rey
@Rey: I elaborated a bit on the XSLT 2.0 part of the answer. Is that enough to go on? If you have further questions, let me know.
LarsH
LarsH, I thought I'd be clever by simplifying my problem, but I found I'm not able to implement your solutions. I've created a script to transform my XML document into HTML. The complete XML contains order header, line and footer details, and my XSL worked great except for the line item detail, where a new TABLE row is created for every LINE_INFO element. What I need is to put the value of each ITEM_ID-related detail (node) into the table data cells of a single HTML table row. I'm sure this is a small tweak to your response, but I can't figure it out.
Rey
@Rey, maybe the piece you're missing is current-group(), which yields all the nodes (LINE_INFO elements) in the current group. `select="current-group()/*"` selects all the child elements of the current group of LINE_INFO elements.
LarsH
Finally, it's starting to come together using the version 1.0 XSLT above. However, I just found out that my XML source can also have a grandchild node that is related to the item, and the code above does not select it when I use xsl:value-of select="GRANDCHILD/NODE". Is there another piece of magic to select the grandchild node's value?
Rey
@Rey, I'm not clear on where the additional grandchild node appears in the tree... can you edit your XML sample to show it? And where should the grandchild node's value appear in the output -- in the table cell? Make sure the XPath in your `select` attribute is relative to the *child* of LINE_INFO, because the context node there is a child of LINE_INFO. E.g. `<xsl:value-of select="GRANDCHILD" />`
LarsH
I've updated my source. In the output table, the DESC, EXT_DESC, LINE_NOTE and NOTE would be in one cell of the table, as this information is really all related to the item's description. The format of this TD would have the DESC and EXT_DESC concatenated together, with the LINE_NOTE and NOTE following on additional lines separated by breaks (not rows). All of the information would fall in a single row, as it all relates to the ITEM_ID.
Rey
@Rey, I added the grandchild and adjusted cell formatting according to your description. I can't do more programming for you, but hopefully this gets you going in the right direction.
LarsH
I understand and appreciate all of your help. Thank you very much for your information, and also to the others that shared their input.
Rey
+2  A: 

This transformation:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:key name="kFollowing" match="LINE_INFO[not(ITEM_ID)]"
 use="generate-id(preceding-sibling::LINE_INFO[ITEM_ID][1])"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="LINE_INFO[ITEM_ID]">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>

   <xsl:apply-templates select="key('kFollowing', generate-id())/node()"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="LINE_INFO[not(ITEM_ID)]"/>
</xsl:stylesheet>

when applied on the provided XML document (wrapped in a single top element to make it well-formed):

<t>
    <LINE_INFO>
        <ITEM_ID>some_part_num</ITEM_ID>
        <DESC>some_part_num_description</DESC>
        <QTY>nn</QTY>
        <UNIT>uom</UNIT>
    </LINE_INFO>
    <LINE_INFO>
        <EXT_DESC>more_description_for_some_part_num</EXT_DESC>
    </LINE_INFO>
    <LINE_INFO>
        <ITEM_ID>some_other_part_num</ITEM_ID>
        <DESC>some_other_part_num_description</DESC>
        <QTY>nn</QTY>
        <UNIT>uom</UNIT>
    </LINE_INFO>
    <LINE_INFO>
        <EXT_DESC>more_description_for_some_other_part_num</EXT_DESC>
    </LINE_INFO>
    <LINE_INFO>
        <LINE_NOTE>This is a note related to some_other_part_num</LINE_NOTE>
    </LINE_INFO>
    <LINE_INFO>
        <ITEM_ID>yet_another_part_num</ITEM_ID>
        <DESC>yet_another_part_num_description</DESC>
        <QTY>nn</QTY>
        <UNIT>uom</UNIT>
    </LINE_INFO>
</t>

produces the wanted, correct result:

<t>
    <LINE_INFO>
        <ITEM_ID>some_part_num</ITEM_ID>
        <DESC>some_part_num_description</DESC>
        <QTY>nn</QTY>
        <UNIT>uom</UNIT>
        <EXT_DESC>more_description_for_some_part_num</EXT_DESC>
    </LINE_INFO>
    <LINE_INFO>
        <ITEM_ID>some_other_part_num</ITEM_ID>
        <DESC>some_other_part_num_description</DESC>
        <QTY>nn</QTY>
        <UNIT>uom</UNIT>
        <EXT_DESC>more_description_for_some_other_part_num</EXT_DESC>
        <LINE_NOTE>This is a note related to some_other_part_num</LINE_NOTE>
    </LINE_INFO>
    <LINE_INFO>
        <ITEM_ID>yet_another_part_num</ITEM_ID>
        <DESC>yet_another_part_num_description</DESC>
        <QTY>nn</QTY>
        <UNIT>uom</UNIT>
    </LINE_INFO>
</t>

Do note: the use of a key to identify, easily and efficiently, all LINE_INFO elements that don't have an ITEM_ID child, indexed by the nearest preceding LINE_INFO element that does have one.
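
The sample above does not include the ADDTL_NOTE_DETAIL grandchild from the updated question. As written, the transformation would copy that wrapper element into the group as-is, whereas the desired output shows NOTE directly under LINE_INFO. One possible way to handle it, sketched here and not tested against the real data, is an extra template that skips the wrapper and processes only its children; everything else is the same stylesheet.

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:key name="kFollowing" match="LINE_INFO[not(ITEM_ID)]"
  use="generate-id(preceding-sibling::LINE_INFO[ITEM_ID][1])"/>

 <!-- Identity rule: copy everything unless a more specific template matches -->
 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="LINE_INFO[ITEM_ID]">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
   <xsl:apply-templates select="key('kFollowing', generate-id())/node()"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="LINE_INFO[not(ITEM_ID)]"/>

 <!-- Added: drop the ADDTL_NOTE_DETAIL wrapper but keep its children,
      so NOTE ends up directly inside the grouped LINE_INFO -->
 <xsl:template match="ADDTL_NOTE_DETAIL">
  <xsl:apply-templates select="node()"/>
 </xsl:template>
</xsl:stylesheet>
```

Because the extra template relies only on the identity rule, the same addition should also work in an XSLT 2.0 stylesheet that pushes the group members through apply-templates.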

Dimitre Novatchev
+1 good tested, working answer that works for all XSLT processors. Also I took your cue to copy all `node()|@*` instead of `*`.
LarsH
@Dimitre: +1 Good answer.
Alejandro
+1  A: 

This XSLT 2.0 stylesheet:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output indent="yes"/>
    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="root">
        <xsl:for-each-group select="LINE_INFO"
                            group-starting-with="LINE_INFO[ITEM_ID]">
            <xsl:copy>
                <xsl:apply-templates select="current-group()/node()"/>
            </xsl:copy>
        </xsl:for-each-group>
    </xsl:template>
</xsl:stylesheet>

With this input:

<root>
    <LINE_INFO>
        <ITEM_ID>some_part_num</ITEM_ID>
        <DESC>some_part_num_description</DESC>
        <QTY>nn</QTY>
        <UNIT>uom</UNIT>
    </LINE_INFO>
    <LINE_INFO>
        <EXT_DESC>more_description_for_some_part_num</EXT_DESC>
    </LINE_INFO>
    <LINE_INFO>
        <ITEM_ID>some_other_part_num</ITEM_ID>
        <DESC>some_other_part_num_description</DESC>
        <QTY>nn</QTY>
        <UNIT>uom</UNIT>
    </LINE_INFO>
    <LINE_INFO>
        <EXT_DESC>more_description_for_some_other_part_num</EXT_DESC>
    </LINE_INFO>
    <LINE_INFO>
        <LINE_NOTE>This is a note related to some_other_part_num</LINE_NOTE>
    </LINE_INFO>
    <LINE_INFO>
        <ITEM_ID>yet_another_part_num</ITEM_ID>
        <DESC>yet_another_part_num_description</DESC>
        <QTY>nn</QTY>
        <UNIT>uom</UNIT>
    </LINE_INFO>
</root>

Output:

<LINE_INFO>
    <ITEM_ID>some_part_num</ITEM_ID>
    <DESC>some_part_num_description</DESC>
    <QTY>nn</QTY>
    <UNIT>uom</UNIT>
    <EXT_DESC>more_description_for_some_part_num</EXT_DESC>
</LINE_INFO>
<LINE_INFO>
    <ITEM_ID>some_other_part_num</ITEM_ID>
    <DESC>some_other_part_num_description</DESC>
    <QTY>nn</QTY>
    <UNIT>uom</UNIT>
    <EXT_DESC>more_description_for_some_other_part_num</EXT_DESC>
    <LINE_NOTE>This is a note related to some_other_part_num</LINE_NOTE>
</LINE_INFO>
<LINE_INFO>
    <ITEM_ID>yet_another_part_num</ITEM_ID>
    <DESC>yet_another_part_num_description</DESC>
    <QTY>nn</QTY>
    <UNIT>uom</UNIT>
</LINE_INFO>
Alejandro
+1 for complete, working XSLT 2.0 answer.
LarsH