views:

275

answers:

3

CURRENT XML:

<?xml version="1.0"?>

<form1>
   <page1>
      <first_name></first_name>
      <last_name></last_name>
      .
      .
   </page1>
   <page2>
      <address></address>
      <phone_number></phone_number>
      .
      .
   </page2>
   <page3>
      <company_name></company_name>
      <job_title></job_title>
      .
      .
   </page3>
</form1>

DESIRED XML - i want to merge all child elements and rename the parent:

<?xml version="1.0"?>

<form>
   <page>
      <first_name></first_name>
      <last_name></last_name>
      .
      .
      <address></address>
      <phone_number></phone_number>
      .
      .
      <company_name></company_name>
      <job_title></job_title>
      .
      .
   </page>
</form>

then, since i have thousands of XML files with some unknown elements, i want to find all of them before bulk importing the XML into Access database, because any new elements in subsequent files will be dropped if they are not defined in the schema.

not all child elements are known. not all file names are known.

so, how can i check all files for all elements, fill the Access table with them all, then bulk import all the XML records to fit into the desired schema as shown above?

EDIT:

ok, i see - there are no attributes. what i meant was all child elements. thanks for pointing that out Oded, I updated the question with corrections.

this is the VBA code I am using in Access for bulk importing the files:

 Private Sub cmdImport_Click()
 Dim strFile As String 'Filename
 Dim strFileList() As String 'File Array
 Dim intFile As Integer 'File Number
 Dim strPath As String ' Path to file folder

 strPath = "C:\Users\Main\Desktop\XML-files"
 strFile = Dir(strPath & "*.XML")

 While strFile <> ""
      'add files to the list
     intFile = intFile + 1
     ReDim Preserve strFileList(1 To intFile)
     strFileList(intFile) = strFile
     strFile = Dir()
 Wend
 'see if any files were found
 If intFile = 0 Then
     MsgBox "No files found"
     Exit Sub
 End If

 'cycle through the list of files
 For intFile = 1 To UBound(strFileList)
     Application.ImportXML strPath & strFileList(intFile), acAppendData

 Next intFile
MsgBox "Import Completed"

End Sub

i can use the stylesheet to transform the XML as such:

  For intFile = 1 To UBound(strFileList)
     Application.TransformXML strPath & strFileList(intFile), _
     "C:\Users\Main\Desktop\stylesheet2.xslt", _
     "C:\Users\Main\Desktop\temp.xml", True
     Application.ImportXML "C:\Users\Main\Desktop\temp.xml", acAppendData
   Next intFile

 MsgBox "Import Completed"
End Sub

however, it does not merge all the file elements into one table. am i missing something? do i need to save a variable list? or create some attribute ids?

EDIT: From comments

my file names are 1.xml, 2.xml, 3.xml, 4.xml, etc. But like i said have thousands

A: 

This stylesheet produces the output that you described.

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"&gt;
<xsl:output indent="yes" />

<xsl:template match="/">
<!--generate standard document element and it's child element-->
<form>
    <page>
            <!--Apply templates to children of document element's, child element's, children-->
        <xsl:apply-templates select="/*/*/node()" />
    </page>
</form>
</xsl:template>

<!--Identity template copies all content forward-->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

</xsl:stylesheet>

If you just want to copy the elements under the page elements, rather than any node() (element, text, comment, or processing instruction), then you could change the XPATH from: /*/*/node() to: /*/*/*

Mads Hansen
Mads, the first template in your stylesheet works very nice, however it seems as if the second template has no effect. when transforming the XML, i had it create a temp file - which i could see the transformation of the first template; however, the second template had no effect as such the temp file only saved the last imported XML file. any suggestions - maybe i need to save a variable list of all the page elements?
jujie
BOTH templates need to be included in the stylesheet for it to work. The first template sets up the main structure and then applies templates for the child elements. The second template is an identity template, which simply copies the attribute or node, and then applies templates for any of it's attributes or child nodes. This identity template is what copies each element into the output. If you run a stylesheet that only has the identity template it will simply make a copy of your XML file.
Mads Hansen
i understand, but i always had both templates included in the stylesheet; still, any newer child nodes are not being created.
jujie
newer child nodes at what level? The stylesheet should copy forward all content starting with elements at the 3rd level(i.e. /form/page/*)
Mads Hansen
I just read through some of the other comments. So, this stylesheet works for a single document. If you have a collection of documents that you want to run through a single transform and merge, then you will need a different solution more in line with what @alejandro suggests
Mads Hansen
A: 

If you really want to just copy all the contents of the page{N} elements, then this transformation is probably the shortest:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"&gt;
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="/">
   <form>
    <page>
      <xsl:copy-of select="/*/*/node()"/>
    </page>
   </form>
 </xsl:template>
</xsl:stylesheet>
Dimitre Novatchev
this works too, for changing the stylesheet, nice! the problem i'm having is when performing the bulk import into Access, i have a few options:
jujie
1. import structure. 2. import structure and data. 3. import only data. if i include structure, then with a sample 3 files i will get 3 tables; one with each files record: page, page1 and page2. if i include only data, then only the first file structure will be imported, and the following files will append their data to only the first record's child elements.
jujie
i am trying to get all files be a record in one table - the original problem was and still is that if any new files be imported, if they have new child elements which don't exist in the first, then that data is dropped.
jujie
see above for VBA code i am using
jujie
A: 

Suppose this input documents:

1.xml

<form1>
   <page1>
        <first_name>D</first_name>
        <last_name>E</last_name>
   </page1>
   <page2>
        <address>F</address>
        <phone_number>1</phone_number>
   </page2>
   <page3>
        <company_name>G</company_name>
   </page3>
</form1>

2.xml

<form2>
   <page1>
        <first_name>A</first_name>
   </page1>
   <page2>
        <address>B</address>
   </page2>
   <page3>
        <company_name>C</company_name>
        <job_title>H</job_title>
   </page3>
</form2>

This stylesheet:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"&gt;
    <xsl:key name="kElementByName" match="/*/*/*" use="name()"/>
    <xsl:param name="pMaxFileNumber" select="2"/>
    <xsl:template match="/">
        <xsl:variable name="vFieldsNames">
            <xsl:call-template name="names">
                <xsl:with-param name="pFrom" select="1"/>
                <xsl:with-param name="pTo" select="$pMaxFileNumber"/>
                <xsl:with-param name="pFieldsNames" select="'|'"/>
            </xsl:call-template>
        </xsl:variable>
        <form>
            <xsl:call-template name="merge">
                <xsl:with-param name="pFrom" select="1"/>
                <xsl:with-param name="pTo" select="$pMaxFileNumber"/>
                <xsl:with-param name="pFieldsNames" select="$vFieldsNames"/>
            </xsl:call-template>
        </form>
    </xsl:template>
    <xsl:template name="names">
        <xsl:param name="pFrom"/>
        <xsl:param name="pTo"/>
        <xsl:param name="pFieldsNames"/>
        <xsl:choose>
            <xsl:when test="$pFrom = $pTo">
                <xsl:value-of select="$pFieldsNames"/>
                <xsl:apply-templates
                     select="document(concat($pFrom,'.xml'),/)/*/*/*
                                       [count(.|key('kElementByName',
                                                    name())[1])=1]
                                       [not(contains($pFieldsNames,
                                                     concat('|',name(),'|')))]"
                     mode="names"/>
            </xsl:when>
            <xsl:otherwise>
                <xsl:variable name="vNewTop"
                              select="floor(($pTo - $pFrom) div 2) + $pFrom"/>
                <xsl:variable name="vNewFieldsNames">
                    <xsl:call-template name="names">
                        <xsl:with-param name="pFrom" select="$pFrom"/>
                        <xsl:with-param name="pTo" select="$vNewTop"/>
                        <xsl:with-param name="pFieldsNames"
                                        select="$pFieldsNames"/>
                    </xsl:call-template>
                </xsl:variable>
                <xsl:call-template name="names">
                    <xsl:with-param name="pFrom" select="$vNewTop + 1"/>
                    <xsl:with-param name="pTo" select="$pTo"/>
                    <xsl:with-param name="pFieldsNames"
                                    select="$vNewFieldsNames"/>
                </xsl:call-template>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>
    <xsl:template name="merge">
        <xsl:param name="pFrom"/>
        <xsl:param name="pTo"/>
        <xsl:param name="pFieldsNames"/>
        <xsl:choose>
            <xsl:when test="$pFrom = $pTo">
                <page>
                    <xsl:apply-templates
                     select="document(concat($pFrom,'.xml'),/)/*/*[1]/*[1]">
                        <xsl:with-param name="pFieldsNames"
                                        select="$pFieldsNames"/>
                    </xsl:apply-templates>
                </page>
            </xsl:when>
            <xsl:otherwise>
                <xsl:variable name="vNewTop"
                              select="floor(($pTo - $pFrom) div 2) + $pFrom"/>
                <xsl:call-template name="merge">
                    <xsl:with-param name="pFrom" select="$pFrom"/>
                    <xsl:with-param name="pTo" select="$vNewTop"/>
                    <xsl:with-param name="pFieldsNames" select="$pFieldsNames"/>
                </xsl:call-template>
                <xsl:call-template name="merge">
                    <xsl:with-param name="pFrom" select="$vNewTop + 1"/>
                    <xsl:with-param name="pTo" select="$pTo"/>
                    <xsl:with-param name="pFieldsNames" select="$pFieldsNames"/>
                </xsl:call-template>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>
    <xsl:template match="/*/*">
        <xsl:param name="pFieldsNames"/>
        <xsl:apply-templates select="*[1]">
            <xsl:with-param name="pFieldsNames" select="$pFieldsNames"/>
        </xsl:apply-templates>
    </xsl:template>
    <xsl:template match="/*/*/*" name="copy">
        <xsl:param name="pFieldsNames"/>
        <xsl:copy>
            <xsl:value-of select="."/>
        </xsl:copy>
        <xsl:variable name="vName" select="concat('|',name(),'|')"/>
        <xsl:apply-templates select="following::*[1]">
            <xsl:with-param name="pFieldsNames"
                            select="concat(substring-before($pFieldsNames,
                                                            $vName),
                                           '|',
                                           substring-after($pFieldsNames,
                                                           $vName))"/>
        </xsl:apply-templates>
    </xsl:template>
    <xsl:template match="/*/*[last()]/*[last()]">
        <xsl:param name="pFieldsNames"/>
        <xsl:call-template name="copy"/>
        <xsl:variable name="vName" select="concat('|',name(),'|')"/>
        <xsl:call-template name="empty">
            <xsl:with-param name="pFieldsNames"
                            select="substring(
                                      concat(substring-before($pFieldsNames,
                                                              $vName),
                                             '|',
                                             substring-after($pFieldsNames,
                                                             $vName)),
                                      2)"/>
        </xsl:call-template>
    </xsl:template>
    <xsl:template match="/*/*/*" mode="names">
        <xsl:value-of select="concat(name(),'|')"/>
    </xsl:template>
    <xsl:template name="empty">
        <xsl:param name="pFieldsNames"/>
        <xsl:if test="$pFieldsNames!=''">
            <xsl:element name="{substring-before($pFieldsNames,'|')}"/>
            <xsl:call-template name="empty">
                <xsl:with-param name="pFieldsNames"
                           select="substring-after($pFieldsNames,'|')"/>
            </xsl:call-template>
        </xsl:if>
    </xsl:template>
</xsl:stylesheet>

Output:

<form>
    <page>
        <first_name>D</first_name>
        <last_name>E</last_name>
        <address>F</address>
        <phone_number>1</phone_number>
        <company_name>G</company_name>
        <job_title />
    </page>
    <page>
        <first_name>A</first_name>
        <address>B</address>
        <company_name>C</company_name>
        <job_title>H</job_title>
        <last_name />
        <phone_number />
    </page>
</form>

Note: If this blows up your memory, then you need to split this in two stylesheets: first, output the names; second, merge. If you can't pass param with Application.TransformXML, then the max number of files is fixed. Also, there must not be any hole: if max number of files is 3, 2.xml can't be missed (this is because fn:document throws an error)

EDIT: For a two pass transformation.

This stylesheet with any input (not used):

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"&gt;
    <xsl:param name="pMaxFileNumber" select="2"/>
    <xsl:template match="/">
        <form>
            <xsl:call-template name="copy">
                <xsl:with-param name="pFrom" select="1"/>
                <xsl:with-param name="pTo" select="$pMaxFileNumber"/>
            </xsl:call-template>
        </form>
    </xsl:template>
    <xsl:template name="copy">
        <xsl:param name="pFrom"/>
        <xsl:param name="pTo"/>
        <xsl:choose>
            <xsl:when test="$pFrom = $pTo">
                <page>
                    <xsl:copy-of 
                     select="document(concat($pFrom,'.xml'),/)/*/*/*"/>
                </page>
            </xsl:when>
            <xsl:otherwise>
                <xsl:variable name="vMiddle"
                      select="floor(($pTo - $pFrom) div 2) + $pFrom"/>
                <xsl:call-template name="copy">
                    <xsl:with-param name="pFrom" select="$pFrom"/>
                    <xsl:with-param name="pTo" select="$vMiddle"/>
                </xsl:call-template>
                <xsl:call-template name="copy">
                    <xsl:with-param name="pFrom" select="$vMiddle + 1"/>
                    <xsl:with-param name="pTo" select="$pTo"/>
                </xsl:call-template>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>
</xsl:stylesheet>

Output:

<form>
    <page>
        <first_name>D</first_name>
        <last_name>E</last_name>
        <address>F</address>
        <phone_number>1</phone_number>
        <company_name>G</company_name>
    </page>
    <page>
        <first_name>A</first_name>
        <address>B</address>
        <company_name>C</company_name>
        <job_title>H</job_title>
    </page>
</form>

And this stylesheet:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"&gt;
    <xsl:key name="kElementByName" match="/*/*/*" use="name()"/>
    <xsl:variable name="vElements"
                  select="/*/*/*[count(.|key('kElementByName',name())[1])=1]"/>
    <xsl:template match="form">
        <xsl:copy>
            <xsl:apply-templates/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="page">
        <xsl:copy>
            <xsl:apply-templates select="$vElements">
                <xsl:with-param name="pContext" select="."/>
            </xsl:apply-templates>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="/*/*/*">
        <xsl:param name="pContext"/>
        <xsl:element name="{name()}">
            <xsl:value-of select="$pContext/*[name()=name(current())]"/>
        </xsl:element>
    </xsl:template>
</xsl:stylesheet>

With previus output as input, result:

<form>
    <page>
        <first_name>D</first_name>
        <last_name>E</last_name>
        <address>F</address>
        <phone_number>1</phone_number>
        <company_name>G</company_name>
        <job_title></job_title>
    </page>
    <page>
        <first_name>A</first_name>
        <last_name></last_name>
        <address>B</address>
        <phone_number></phone_number>
        <company_name>C</company_name>
        <job_title>H</job_title>
    </page>
</form>
Alejandro
this indeed does combine the XML files exactly as you say, onto one XML file. However, say the following fields were in file2 rather than file1: <phone_number></phone_number><job_title></job_title>...............then when importing to Access those fields will be dropped - even if its on the imported XML file. huh?
jujie
definitely getting closer to a solution though.
jujie
@jujie: Check my edit. Now, all rows in output have all fields, even empty ones.
Alejandro
Alejandro, this works for 2 files. I tried to add a full folder using directory location in order that the thousands of files can be transformed, or using "*.xml", or "" - haven't figured it out yet. so far, if i don't specify the exact files, then its not creating any output.
jujie
@jujie: As the name of the parameter, you have to pass a comma separate list of file names. A minor refactor would be to pass an XML document or URI to retrive all the names. I think there is a XML serialization method for directory class...
Alejandro
ok, i will open up another thread for this specific task - and then post final solution here, hopefully. thanks
jujie
http://stackoverflow.com/questions/3735816/how-do-i-pass-a-full-directory-of-xml-files-into-xslt
jujie
@jujie: In a quick check for `Application.TransformXML`, I couldn't find a method to pass param to stylesheets. Now, I see **this** is your problem to run this stylesheet. You need to pass those documents URI as the input source. When you come back with the input format, I will refactor this stylesheet.
Alejandro
btw, my file names are 1.xml, 2.xml, 3.xml, 4.xml, etc. But like i said have thousands. not sure if this helps, but i will try to find out the input format in mean time...
jujie
@jujie: That is a lot of diferences. Check my edit.
Alejandro
Alejandro - This works! at least for now :P
jujie
what i did was run the first stylesheet given above by Mads and Dimitre - loop the files through the XSLT and and at the same time numbering them sequentially in a folder. then i take your stylesheet producing the output.xml and import it without losing any newer fields.
jujie
its a process, but it works as needed for now. thank you everyone for your input.
jujie
@alejandro - "The XML Guru" - i am trying to vote your answer, but unfortunately need rep for that.
jujie
btw, with 200 files right now memory is fine - takes about 30 secs.
jujie
@jujie: I'm glad it helped you! I post another two stylesheets for two pass transformation. Take a look and see if we can boost those 30 sec. Maybe you don't even need a DVC pattern for 200 files...
Alejandro
hmm, seems to be the same run time. thats true, 200 in this batch, but lots more in others.
jujie