I have a long list of values in XML with named identifiers. I need to make separate output files for each of the distinct identifiers grouped together and uniquely named.

So, for example, let's say I have:

<List>
   <Item group="::this_long_and_complicated_group_name_that_cannot_be_a_filename::">
      Hello World!
   </Item>
   <Item group="::this_other_long_and_complicated_group_name_that_cannot_be_a_filename::">
      Goodbye World!
   </Item>
   <Item group="::this_long_and_complicated_group_name_that_cannot_be_a_filename::">
      This example text should be in the first file
   </Item>
   <Item group="::this_other_long_and_complicated_group_name_that_cannot_be_a_filename::">
      This example text should be in the second file
   </Item>
   <Item group="::this_long_and_complicated_group_name_that_cannot_be_a_filename::">
      Hello World!
   </Item>
</List>

How can I write an XSLT 2.0 transformation that outputs these items grouped into generated filenames, with unique values? For example, mapping the first @group to file1.xml and the second @group to file2.xml.

+3  A: 

Here is a solution that uses some of the good new features in XSLT 2.0:

This transformation:

<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xsl:output omit-xml-declaration="yes" indent="yes"/>
      <!--                                                  --> 
    <xsl:template match="/*">
      <xsl:variable name="vTop" select="."/>
      <!--                                                  --> 
        <xsl:for-each-group select="Item" group-by="@group">
          <xsl:result-document href="file:///C:/Temp/file{position()}.xml">
            <xsl:element name="{name($vTop)}">
              <xsl:copy-of select="current-group()"/>
            </xsl:element>
          </xsl:result-document>
        </xsl:for-each-group>
    </xsl:template>
</xsl:stylesheet>

when applied to the OP-provided XML document (corrected to be well-formed!):

<List>
    <Item group="::this_long_and_complicated_group_name_that_cannot_be_a_filename::">
         Hello World!
    </Item>
    <Item group="::this_other_long_and_complicated_group_name_that_cannot_be_a_filename::">
          Goodbye World!
  </Item>
    <Item group="::this_long_and_complicated_group_name_that_cannot_be_a_filename::">
          This example text should be in the first file
 </Item>
    <Item group="::this_other_long_and_complicated_group_name_that_cannot_be_a_filename::">
          This example text should be in the second file
 </Item>
    <Item group="::this_long_and_complicated_group_name_that_cannot_be_a_filename::">
          Hello World!
  </Item>
</List>

produces the two wanted files: file1.xml and file2.xml.
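
If "uniquely valued" also means that repeated text within a group (such as the second "Hello World!" item) should appear only once per file, one possible variation is to nest a second xsl:for-each-group keyed on normalize-space(.) and copy only the first item of each inner group. This is an untested sketch under that assumption, not a definitive implementation. The relative href (without the C:/Temp prefix) is also an assumption; it resolves against the processor's base output URI.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output omit-xml-declaration="yes" indent="yes"/>

    <xsl:template match="/*">
      <xsl:variable name="vTop" select="."/>
      <!-- Outer grouping: one result document per distinct @group value -->
      <xsl:for-each-group select="Item" group-by="@group">
        <!-- Assumption: relative href, resolved against the base output URI -->
        <xsl:result-document href="file{position()}.xml">
          <xsl:element name="{name($vTop)}">
            <!-- Inner grouping: items with the same normalized text form one group -->
            <xsl:for-each-group select="current-group()"
                                group-by="normalize-space(.)">
              <!-- Keep only the first item of each inner group -->
              <xsl:copy-of select="current-group()[1]"/>
            </xsl:for-each-group>
          </xsl:element>
        </xsl:result-document>
      </xsl:for-each-group>
    </xsl:template>
</xsl:stylesheet>
```

With the sample document, the first file would then hold two Item elements instead of three, because the two "Hello World!" items share the same normalized text.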

Dimitre Novatchev
This is a great solution, Dimitre. I always see great work from you. However, we're looking to place the grouped text together in two files. Is there some extra syntax we can add to make this happen?
Jweede
@Jweede You haven't specified the criteria upon which to perform the grouping: Which nodes should go to which of the two files -- according to what rule?
Dimitre Novatchev
@Dimitre: I had a hard time articulating this problem, I'm sorry if it wasn't clear. There are two distinct group attributes, we want to put them together. i.e.: place all of the matching group attributes into a file together.
Jweede
@Jweede Then just replace group-by="normalize-space(.)" with group-by="normalize-space(@group)". I will edit my answer shortly.
Dimitre Novatchev
@Dimitre: +1 -- I was about to post a similar solution, but given the fact that my knowledge of XSLT 2.0 is somewhat limited, and had no tool to test my idea, I rather waited. ;-) Can you recommend a free (and ideally lightweight) XSLT 2.0 processor that runs on Windows?
Tomalak
@Tomalak The ultimate #1 XSLT 2.0 processor is Saxon (9.x). Its Basic version (not SA, the schema-aware one) is open source, and I have used Saxon for more than 5 years. It is the fastest and most optimized. The developer is Dr. Michael Kay himself, the editor of the W3C XSLT 2.0 specification. Simply and indisputably the best.
Dimitre Novatchev
@Tomalak The comment's length limit didn't let me say "thanks" in the last comment :)
Dimitre Novatchev
@Dimitre: Never mind. Trying Saxon-B 9.1 for .NET right now. It's the obvious choice that I somehow didn't think of. Thanks for the tip.
Tomalak
@Tomalak Maybe you need to know that Saxon 9.x for Java is about 3 times faster than Saxon.NET -- this is because Saxon.NET interprets the Java bytecode of Saxon.Java
Dimitre Novatchev
@Dimitre: This is good info, I wasn't aware of that. But it's somewhat logical that they are not maintaining two different codebases.
Tomalak
On Windows, there's a program called Kernow that makes XSLT 2.0 testing really simple.
Jweede
@jweede I have been using the XSelerator for 8 years and it is still the best XSLT IDE. I have no reason to switch to something else
Dimitre Novatchev