Hi, I'm trying to merge two files that have the same structure and some data in common. If a node has the same name in both files, a new node should be created containing the children of both original nodes. The original files are the following:

file1.xml
<?xml version='1.0' encoding='UTF-8'?>
<BROADRIDGE>
    <SECURITY CUSIP='CUSIP1' DESCRIPT='CUSIP1'>
     <CUSTOMER ID='M1'/>
     <CUSTOMER ID='M2'/>
     <CUSTOMER ID='M3'/>
    </SECURITY>
    <SECURITY CUSIP='CUSIP3' DESCRIPT='CUSIP3'>
     <CUSTOMER ID='M4'/>
     <CUSTOMER ID='M5'/>
     <CUSTOMER ID='M6'/>
    </SECURITY>
</BROADRIDGE>

file2.xml
<?xml version='1.0' encoding='UTF-8'?>
<BROADRIDGE>
    <SECURITY CUSIP='CUSIP1' DESCRIPT='CUSIP1'>
     <CUSTOMER ID='B1'/>
     <CUSTOMER ID='B2'/>
     <CUSTOMER ID='B3'/>
    </SECURITY>
    <SECURITY CUSIP='CUSIP2' DESCRIPT='CUSIP2'>
     <CUSTOMER ID='B4'/>
     <CUSTOMER ID='B5'/>
     <CUSTOMER ID='B6'/>
    </SECURITY>
</BROADRIDGE>

The idea is to create a new XML file with the same structure that contains the information from both files, merging those SECURITY nodes that have the same CUSIP attribute. In this case the result should be the following:

<?xml version="1.0" encoding="UTF-8"?>
<BROADRIDGE>
    <SECURITY CUSIP="CUSIP1">
     <CUSTOMER ID="M1"/>
     <CUSTOMER ID="M2"/>
     <CUSTOMER ID="M3"/>
     <CUSTOMER ID='B1'/>
     <CUSTOMER ID='B2'/>
     <CUSTOMER ID='B3'/>
    </SECURITY>
    <SECURITY CUSIP="CUSIP3">
     <CUSTOMER ID="M4"/>
     <CUSTOMER ID="M5"/>
     <CUSTOMER ID="M6"/>
    </SECURITY>
    <SECURITY CUSIP="CUSIP2">
     <CUSTOMER ID="B4"/>
     <CUSTOMER ID="B5"/>
     <CUSTOMER ID="B6"/>
    </SECURITY>
</BROADRIDGE>

I've defined the following XML to join them:

<?xml version="1.0"?>                                  
<MASTERFILE>
   <FILE>\file1.xml</FILE>
   <FILE>\file2.xml</FILE>
</MASTERFILE>

And the following XSL to do the merge:

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/MASTERFILE">
     <BROADRIDGE>
      <xsl:variable name="securities" select="document(FILE)/BROADRIDGE/SECURITY"/>
      <xsl:for-each select="$securities">
       <xsl:if test="generate-id(.) = generate-id($securities[@CUSIP=current()/@CUSIP])">
        <SECURITY>
         <xsl:attribute name="CUSIP" ><xsl:value-of select="@CUSIP"/></xsl:attribute>
         <xsl:for-each select="CUSTOMER">
          <CUSTOMER>
           <xsl:attribute name="ID" ><xsl:value-of select="@ID"/></xsl:attribute>
          </CUSTOMER>
         </xsl:for-each>
        </SECURITY>
       </xsl:if>
      </xsl:for-each>
     </BROADRIDGE>
    </xsl:template>
</xsl:stylesheet>

But I'm getting the following:

<?xml version="1.0" encoding="UTF-8"?>
<BROADRIDGE>
    <SECURITY CUSIP="CUSIP1">
     <CUSTOMER ID="M1"/>
     <CUSTOMER ID="M2"/>
     <CUSTOMER ID="M3"/>
    </SECURITY>
    <SECURITY CUSIP="CUSIP3">
     <CUSTOMER ID="M4"/>
     <CUSTOMER ID="M5"/>
     <CUSTOMER ID="M6"/>
    </SECURITY>
    <SECURITY CUSIP="CUSIP2">
     <CUSTOMER ID="B4"/>
     <CUSTOMER ID="B5"/>
     <CUSTOMER ID="B6"/>
    </SECURITY>
</BROADRIDGE>

Any idea why it's not merging the CUSTOMER nodes from both files for the SECURITY with CUSIP = CUSIP1?

+1  A: 

The generate-id() function is guaranteed to return a different value for every node that participates in a given transformation. As you're calling it on nodes from different documents, the values will never be the same.

You should compare the string values of the CUSIPs in the documents rather than their IDs.
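For XSLT 1.0, one way to read this advice is sketched below. It is only a sketch, assuming the MASTERFILE/FILE layout from the question: generate-id() is still used to pick one SECURITY per CUSIP, but the children are gathered by comparing the CUSIP string across all loaded documents. The variable name `cusip` is my own, and the relative order of nodes coming from different documents depends on the processor.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:template match="/MASTERFILE">
    <!-- all SECURITY nodes from every file listed in the master document -->
    <xsl:variable name="securities" select="document(FILE)/BROADRIDGE/SECURITY"/>
    <BROADRIDGE>
      <xsl:for-each select="$securities">
        <xsl:variable name="cusip" select="@CUSIP"/>
        <!-- keep only the first SECURITY (in the processor's node order) for each CUSIP -->
        <xsl:if test="generate-id(.) = generate-id($securities[@CUSIP = $cusip][1])">
          <SECURITY CUSIP="{$cusip}">
            <!-- match on the CUSIP string, so children from every file are copied -->
            <xsl:copy-of select="$securities[@CUSIP = $cusip]/CUSTOMER"/>
          </SECURITY>
        </xsl:if>
      </xsl:for-each>
    </BROADRIDGE>
  </xsl:template>
</xsl:stylesheet>
```

The only real change from the stylesheet in the question is the inner selection: it copies CUSTOMER from every SECURITY with the same CUSIP instead of only from the current node.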

If you can use XSLT 2.0 (which is a lot better than 1.0), this will work:

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:output indent="yes"/>
        <xsl:template match="/MASTERFILE">
                <BROADRIDGE>
                        <xsl:variable name="securities" select="document(FILE)/BROADRIDGE/SECURITY"/>
                        <xsl:for-each select="distinct-values($securities/@CUSIP)">
                                <SECURITY>
                                        <xsl:attribute name="CUSIP">
                                                <xsl:value-of select="."/>
                                        </xsl:attribute>

                                        <xsl:for-each select="distinct-values($securities[@CUSIP = current()]/CUSTOMER/@ID)">
                                                <CUSTOMER>
                                                  <xsl:attribute name="ID">
                                                  <xsl:value-of select="."/>
                                                  </xsl:attribute>
                                                </CUSTOMER>
                                        </xsl:for-each>
                                </SECURITY>
                        </xsl:for-each>
                </BROADRIDGE>
        </xsl:template>
</xsl:stylesheet>
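
If the CUSTOMER nodes carry children that must survive the merge, an XSLT 2.0 grouping construct may be a better fit than distinct-values(), which only keeps the ID strings. A minimal sketch, again assuming the MASTERFILE/FILE input from the question and an XSLT 2.0 processor:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:template match="/MASTERFILE">
    <BROADRIDGE>
      <!-- one group per distinct CUSIP across all listed files -->
      <xsl:for-each-group select="document(FILE)/BROADRIDGE/SECURITY" group-by="@CUSIP">
        <SECURITY CUSIP="{current-grouping-key()}">
          <!-- deep-copy every CUSTOMER of every SECURITY in this group -->
          <xsl:copy-of select="current-group()/CUSTOMER"/>
        </SECURITY>
      </xsl:for-each-group>
    </BROADRIDGE>
  </xsl:template>
</xsl:stylesheet>
```

Unlike the distinct-values() version, this does not remove CUSTOMER elements that happen to share an ID; whether that matters depends on the data.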
Robert Christie
Unfortunately I'm not allowed to change the version of XSL, since this change is for a production system. But the solution that Roland gave me, although not very efficient, works; so I'll use that. Thank you for your suggestion.
Jose L Martinez-Avial
+1  A: 

(See my comment on the "one-way-merge" on the OP.) Here's my (very inefficient) solution to the merge problem:

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:variable name="set1" select="document('file1.xml')/BROADRIDGE/SECURITY"/>
    <xsl:variable name="set2" select="document('file2.xml')/BROADRIDGE/SECURITY"/>

    <xsl:template match="/">
        <BROADRIDGE>
            <!-- walk over all relevant nodes -->
            <xsl:for-each select="$set1 | $set2">
                <xsl:variable name="position" select="position()"/>
                <xsl:variable name="cusip" select="@CUSIP"/>
                <!-- if we see this CUSIP for the first time, --> 
                <xsl:if test="count($nodes[position() &lt; $position][@CUSIP = $cusip])=0">
                    <SECURITY>                            
                        <xsl:attribute name="CUSIP"><xsl:value-of select="$cusip"/></xsl:attribute>
                        <!-- copy nodes from both sets with matching attribute -->
                        <xsl:copy-of select="$set1[@CUSIP = $cusip]/*"/>
                        <xsl:copy-of select="$set2[@CUSIP = $cusip]/*"/>
                    </SECURITY>
                </xsl:if>
            </xsl:for-each>
        </BROADRIDGE>
    </xsl:template>
</xsl:stylesheet>

Note that the stylesheet does not assume any particular input document; it simply loads the two files into variables. One can improve the XSLT design by parameterizing the URLs of the XML documents to be loaded.
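
A rough sketch of that parameterization, assuming two top-level parameters (the names `url1` and `url2` are made up for illustration, and the defaults mirror the two sample files):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- url1 and url2 are hypothetical parameter names; override them when invoking the processor -->
  <xsl:param name="url1" select="'file1.xml'"/>
  <xsl:param name="url2" select="'file2.xml'"/>

  <!-- union of the SECURITY nodes from both documents -->
  <xsl:variable name="nodes"
      select="document($url1)/BROADRIDGE/SECURITY | document($url2)/BROADRIDGE/SECURITY"/>

  <xsl:template match="/">
    <BROADRIDGE>
      <xsl:for-each select="$nodes">
        <xsl:variable name="position" select="position()"/>
        <xsl:variable name="cusip" select="@CUSIP"/>
        <!-- emit a SECURITY only the first time its CUSIP is seen -->
        <xsl:if test="not($nodes[position() &lt; $position][@CUSIP = $cusip])">
          <SECURITY CUSIP="{$cusip}">
            <xsl:copy-of select="$nodes[@CUSIP = $cusip]/*"/>
          </SECURITY>
        </xsl:if>
      </xsl:for-each>
    </BROADRIDGE>
  </xsl:template>
</xsl:stylesheet>
```

With a processor that accepts stylesheet parameters on the command line (xsltproc's --stringparam, for instance), the file names can then be changed without editing the stylesheet. Relative names passed this way are resolved against the stylesheet's location.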

To apply the merge to multiple documents, you can create a file, say master.xml, that lists all the files to process, like this:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="merge.xslt"?>
<files>
  <file>file1.xml</file>
  <file>file2.xml</file>
  ...
  <file>fileN.xml</file>    
</files>

In file1.xml, I have this:

<?xml version='1.0' encoding='UTF-8'?>
<BROADRIDGE>
  <SECURITY CUSIP='CUSIP1' DESCRIPT='CUSIP1'>
    <CUSTOMER ID='M1'/>
    <CUSTOMER ID='M2'/>
    <CUSTOMER ID='M3'/>
  </SECURITY>
  <SECURITY CUSIP='CUSIP3' DESCRIPT='CUSIP3'>
    <CUSTOMER ID='M4'/>
    <CUSTOMER ID='M5'/>
    <CUSTOMER ID='M6'/>
  </SECURITY>
</BROADRIDGE>

In file2.xml, I have this:

<?xml version='1.0' encoding='UTF-8'?>
<BROADRIDGE>
  <SECURITY CUSIP='CUSIP1' DESCRIPT='CUSIP1'>
    <CUSTOMER ID='B1'/>
    <CUSTOMER ID='B2'/>
    <CUSTOMER ID='B3'/>
  </SECURITY>
  <SECURITY CUSIP='CUSIP2' DESCRIPT='CUSIP2'>
    <CUSTOMER ID='B4'/>
    <CUSTOMER ID='B5'/>
    <CUSTOMER ID='B6'/>
  </SECURITY>
</BROADRIDGE>

The merge.xslt is a modified version of the earlier stylesheet, which can now process a variable number of files (those listed in master.xml):

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="/">
  <xsl:call-template name="merge-files"/>
</xsl:template>

<!-- loop through file names, load documents -->
<xsl:template name="merge-files">
  <xsl:param name="files" select="/files/file/text()"/>
  <xsl:param name="num-files" select="count($files)"/>
  <xsl:param name="curr-file" select="0"/>
  <xsl:param name="set" select="/*[0]"/>
  <xsl:choose> <!-- if we still have files, concat them to $set -->
    <xsl:when test="$curr-file &lt; $num-files">
      <xsl:call-template name="merge-files">
        <xsl:with-param name="files" select="$files"/>
        <xsl:with-param name="num-files" select="$num-files"/>
        <xsl:with-param name="curr-file" select="$curr-file + 1"/>
        <xsl:with-param name="set" select="$set | document($files[$curr-file+1])/BROADRIDGE/SECURITY"/>
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise> <!-- no more files, start merging. -->
      <xsl:call-template name="merge">
        <xsl:with-param name="nodes" select="$set"/>
      </xsl:call-template>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

<!-- perform the actual merge -->
<xsl:template name="merge">
  <xsl:param name="nodes"/>
  <BROADRIDGE>
    <xsl:for-each select="$nodes"> <!-- look at all possible nodes to merge -->
      <xsl:variable name="position" select="position()"/>
      <xsl:variable name="cusip" select="@CUSIP"/>

      <!-- when we encounter this id for the 1st time -->
      <xsl:if test="count($nodes[position() &lt; $position][@CUSIP = $cusip])=0"> 
        <SECURITY>
          <xsl:attribute name="CUSIP"><xsl:value-of select="$cusip"/></xsl:attribute>
          <!-- copy all node data related to this cusip here -->
          <xsl:for-each select="$nodes[@CUSIP = $cusip]">
            <xsl:copy-of select="*"/>
          </xsl:for-each>
        </SECURITY>
      </xsl:if>
    </xsl:for-each>
  </BROADRIDGE>
</xsl:template>

</xsl:stylesheet>

Running this gives me this output:

<BROADRIDGE>
  <SECURITY CUSIP="CUSIP1">
    <CUSTOMER ID="M1"/>
    <CUSTOMER ID="M2"/>
    <CUSTOMER ID="M3"/>
    <CUSTOMER ID="B1"/>
    <CUSTOMER ID="B2"/>
    <CUSTOMER ID="B3"/>
  </SECURITY>
  <SECURITY CUSIP="CUSIP3">
    <CUSTOMER ID="M4"/>
    <CUSTOMER ID="M5"/>
    <CUSTOMER ID="M6"/>
  </SECURITY>
  <SECURITY CUSIP="CUSIP2">
    <CUSTOMER ID="B4"/>
    <CUSTOMER ID="B5"/>
    <CUSTOMER ID="B6"/>
  </SECURITY>
</BROADRIDGE>
Roland Bouman
I think that you are missing the definition of the variable nodes. I assume it is defined as:<xsl:variable name="nodes" select="set1|set2"/>With that change, what I'm getting is<?xml version="1.0" encoding="UTF-8"?><BROADRIDGE> <SECURITY CUSIP="CUSIP1"> <CUSTOMER ID="M1"/>[...]<CUSTOMER ID="B3"/> </SECURITY> <SECURITY CUSIP="CUSIP3"> <CUSTOMER ID="M4"/>[...]<CUSTOMER ID="M6"/> </SECURITY> <SECURITY CUSIP="CUSIP1"> <CUSTOMER ID="M1"/>[...]<CUSTOMER ID="B3"/> </SECURITY> <SECURITY CUSIP="CUSIP2"> <CUSTOMER ID="B4"/>[...]<CUSTOMER ID="B6"/> </SECURITY></BROADRIDGE>
Jose L Martinez-Avial
So it is duplicating the securities that appear in both sets, with all their children. So I'm still missing something. In any case, thanks for your help!
Jose L Martinez-Avial
I forgot the dollar sign when creating the variable nodes... my fault<xsl:variable name="nodes" select="$set1|$set2"/>With that change, it works. Thank you
Jose L Martinez-Avial
Is there any way to load the documents in the variables nodes using a for-each or something like that? With that I can use that template for any file. I tried <xsl:variable name="nodes"> <xsl:value-of select="document('file1.xml')/BROADRIDGE/SECURITY"/> <xsl:value-of select="document('file2.xml')/BROADRIDGE/SECURITY"/> </xsl:variable>But it doesn't work. How can I load multiple documents in a variable?
Jose L Martinez-Avial
Jose, you can use variables in the call to document() function, so you can use that to make doc loading flexible. Is that what you mean?
Roland Bouman
I just want to load a variable number of files. Something like the following: <xsl:variable name="nodes"> <xsl:for-each select='/MASTERFILE/FILE'> <xsl:value-of select="document(.)/BROADRIDGE/SECURITY"/> </xsl:for-each> </xsl:variable>and then apply the template to the following xml master file<?xml version="1.0"?> <MASTERFILE> <FILE>\file1.xml</FILE> <FILE>\file2.xml</FILE></MASTERFILE>
Jose L Martinez-Avial
Jose, the original solution was updated. Perhaps you can accept it now?
Roland Bouman
I just posted the final solution I adopted, based on your code. Thank you very much for your help!!
Jose L Martinez-Avial
+1  A: 

Either you're making this much too complicated, or there are other aspects of this problem that you haven't mentioned:

<xsl:variable name="file1" select="document(/MASTERFILE/FILE[1])"/>
<xsl:variable name="file2" select="document(/MASTERFILE/FILE[2])"/>

<xsl:template match="/">
   <BROADRIDGE>
      <xsl:apply-templates select="$file1/BROADRIDGE/SECURITY"/>
      <xsl:copy-of select="$file2/BROADRIDGE/SECURITY[not(@CUSIP=$file1/BROADRIDGE/SECURITY/@CUSIP)]"/>
   </BROADRIDGE>
</xsl:template>

<xsl:template match="SECURITY">
   <SECURITY CUSIP="{@CUSIP}">
      <xsl:copy-of select="*"/>
      <xsl:copy-of select="$file2/BROADRIDGE/SECURITY[@CUSIP=current()/@CUSIP]/*"/>
   </SECURITY>
</xsl:template>
Robert Rossney
Well, I'd like to make this as flexible as possible, because right now I only have to merge two files, but in the future I will have n files. As you said, there is something else I didn't mention. The structure of the file is more complex, because the CUSTOMER nodes have children. But since I need to copy the entire nodes, with their children and attributes, it doesn't really matter for this issue, which is to generate a SECURITY node for each distinct security across all files and then add all the children for that security from all files. So it is really a join.
Jose L Martinez-Avial
A: 

Roland, thanks for your examples. Based on the first code you sent, I developed the following template:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:variable name="nodes" select="document(/MASTERFILE/FILE)/BROADRIDGE/SECURITY"/>
    <xsl:template match="/">
        <BROADRIDGE>
            <!-- walk over all relevant nodes -->
            <xsl:for-each select="$nodes">
                <xsl:variable name="position" select="position()"/>
                <xsl:variable name="cusip" select="@CUSIP"/>
                <!-- if we see this CUSIP for the first time, --> 
                <xsl:if test="count($nodes[position() &lt; $position][@CUSIP = $cusip])=0">
                    <SECURITY>                            
                        <xsl:attribute name="CUSIP"><xsl:value-of select="$cusip"/></xsl:attribute>
                        <xsl:attribute name="DESCRIPT"><xsl:value-of select="@DESCRIPT"/></xsl:attribute>
                        <!-- copy nodes from both sets with matching attribute -->
                        <xsl:copy-of select="$nodes[@CUSIP = $cusip]/*"/>
                    </SECURITY>
                </xsl:if>
            </xsl:for-each>
        </BROADRIDGE>
    </xsl:template>
</xsl:stylesheet>

I simply pass the whole list of FILE nodes to the document() function, so the select builds a node set with all the SECURITY nodes from all the files. When I apply it to the following XML:

<?xml version="1.0"?>
<MASTERFILE>
   <FILE>\file1.xml</FILE>
   <FILE>\file2.xml</FILE>
   <FILE>\file3.xml</FILE>
</MASTERFILE>

It works perfectly. Thank you for your samples
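
To check what document() actually loads when it is given a node set, a small diagnostic stylesheet along these lines may help. It is only a sketch, assuming the same MASTERFILE/FILE master document; it prints one line per listed file plus a total, and the wording of the output lines is made up.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/MASTERFILE">
    <!-- one document() call per FILE element -->
    <xsl:for-each select="FILE">
      <xsl:value-of select="."/>
      <xsl:text>: </xsl:text>
      <xsl:value-of select="count(document(.)/BROADRIDGE/SECURITY)"/>
      <xsl:text> SECURITY nodes&#10;</xsl:text>
    </xsl:for-each>
    <!-- a single document() call with the whole FILE node set -->
    <xsl:text>total: </xsl:text>
    <xsl:value-of select="count(document(FILE)/BROADRIDGE/SECURITY)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

If every FILE names a different document, the total should equal the sum of the per-file counts, since the node-set form of document() returns the union of the individual results.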

Jose L Martinez-Avial
Jose, can you please formally accept my first answer then? You should be able to "check" it, so that a green checkmark appears there. If you do, I get reputation points, and in addition, the question will appear as "solved". TIA, roland.
Roland Bouman