I have a directory of very large XML files with a structure as this:

file1.xml:

<root>
 <EmployeeInfo attr="one" />
 <EmployeeInfo attr="two" />
 <EmployeeInfo attr="three" />
</root>

file2.xml:

<root>
 <EmployeeInfo attr="four" />
 <EmployeeInfo attr="five" />
 <EmployeeInfo attr="six" />
</root>

Now I am looking for a simple way to merge these files (*.xml) into one output file:

<root>
 <EmployeeInfo attr="one" />
 <EmployeeInfo attr="two" />
 <EmployeeInfo attr="three" />
 <EmployeeInfo attr="four" />
 <EmployeeInfo attr="five" />
 <EmployeeInfo attr="six" />
</root>

I was thinking about using pure XSLT such as this one:

<xsl:transform version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <Container>
      <xsl:copy-of select="document('file1.xml')"/>
      <xsl:copy-of select="document('file2.xml')"/>
    </Container>
  </xsl:template>
</xsl:transform>

This works but isn't as flexible as I want. Being a novice with PowerShell (version 2) and eager to learn best practices for working with XML in PowerShell, I am wondering: what is the simplest, purest PowerShell way of merging the structure of several XML documents into one?

Cheers, Joakim

+2  A: 

Personally I would not use PowerShell for such a task.

Typically you use PowerShell to access config files like this:

$config = [xml](gc web.config)

Then you can work with the XML as with objects. Pretty cool. But if you need to process large XML structures, using [xml] (which is equivalent to XmlDocument) is quite memory-expensive.

However, that is nearly all the XML support PowerShell has built in (get-command *xml* -CommandType cmdlet lists all the XML-related cmdlets).
It is of course possible to use .NET classes for XML operations, but that code won't be as pretty as a true PowerShell approach. So for your task you would need to use readers/writers, which is IMHO not worth doing.
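
For reference, a rough sketch of what the reader/writer route could look like, using System.Xml.XmlReader and System.Xml.XmlWriter from .NET. The function name Merge-XmlChildren and its parameters are made up for illustration, and it assumes every input file has a single root element whose children should be copied as-is. Only one child element is held in memory at a time, so it should cope with large files better than [xml].

```powershell
# Hypothetical helper (not from the thread): streams the children of each
# input file's root element into a single output document.
function Merge-XmlChildren {
    param(
        [string[]]$Path,          # full paths of the input files
        [string]$Destination,     # full path of the merged output file
        [string]$RootName = 'root'
    )
    $settings = New-Object System.Xml.XmlWriterSettings
    $settings.Indent = $true
    $writer = [System.Xml.XmlWriter]::Create($Destination, $settings)
    try {
        $writer.WriteStartElement($RootName)
        foreach ($p in $Path) {
            $reader = [System.Xml.XmlReader]::Create($p)
            try {
                [void]$reader.MoveToContent()   # position on the root element
                [void]$reader.Read()            # step to its first child node
                while (-not $reader.EOF -and $reader.Depth -gt 0) {
                    if ($reader.NodeType -eq [System.Xml.XmlNodeType]::Element) {
                        # WriteNode copies the whole element and advances the reader past it
                        $writer.WriteNode($reader, $false)
                    } else {
                        [void]$reader.Read()    # skip whitespace, comments, text
                    }
                }
            } finally {
                $reader.Close()
            }
        }
        $writer.WriteEndElement()
    } finally {
        $writer.Close()
    }
}

# Possible usage (full paths, because .NET does not follow PowerShell's current location):
# $out = Join-Path $pwd 'merged.xml'
# $in  = Get-ChildItem -Filter *.xml | Where-Object { $_.FullName -ne $out } | ForEach-Object { $_.FullName }
# Merge-XmlChildren -Path $in -Destination $out
```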

That's why I think XSLT is the better approach ;) If you need to be flexible, you can generate the XSLT template during script execution or just replace the file names; that's no problem.
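
A possible sketch of generating the template at run time and applying it with System.Xml.Xsl.XslCompiledTransform. The function name Merge-XmlWithXslt and its parameters are hypothetical. It assumes full file paths, and it selects /*/* so that only the children of each root are copied. Note that document() still loads each file completely, so this does not solve the memory problem.

```powershell
# Hypothetical helper (not from the thread): builds the stylesheet text from a
# list of files and runs it with XslCompiledTransform.
function Merge-XmlWithXslt {
    param([string[]]$Path, [string]$Destination, [string]$RootName = 'root')

    # One <xsl:copy-of> per input file; /*/* selects the children of each root.
    $copies = foreach ($p in $Path) {
        $uri = (New-Object System.Uri $p).AbsoluteUri.Replace('&', '&amp;').Replace("'", '%27')
        '      <xsl:copy-of select="document(''{0}'')/*/*"/>' -f $uri
    }
    $stylesheet = @"
<xsl:transform version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <$RootName>
$($copies -join "`n")
    </$RootName>
  </xsl:template>
</xsl:transform>
"@

    $xslt = New-Object System.Xml.Xsl.XslCompiledTransform
    # document() is disabled by default, so enable it explicitly (scripts stay off).
    $xsltSettings = New-Object System.Xml.Xsl.XsltSettings($true, $false)
    $resolver = New-Object System.Xml.XmlUrlResolver
    $styleText = [System.IO.TextReader](New-Object System.IO.StringReader $stylesheet)
    $xslt.Load([System.Xml.XmlReader]::Create($styleText), $xsltSettings, $resolver)

    # The template never reads its source document, so a dummy one is enough.
    $dummy = [System.IO.TextReader](New-Object System.IO.StringReader '<dummy/>')
    $source = [System.Xml.XmlReader]::Create($dummy)
    $writer = [System.Xml.XmlWriter]::Create($Destination, $xslt.OutputSettings)
    try {
        $xslt.Transform($source, (New-Object System.Xml.Xsl.XsltArgumentList), $writer, $resolver)
    } finally {
        $writer.Close()
    }
}
```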

stej
+1  A: 

While the XSLT way to do this is pretty short, so is the PowerShell way:

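# Collect the input files (skip the output file if it is left over from an earlier run)
$files = Get-ChildItem *.xml | Where-Object { $_.Name -ne 'final.xml' }
$finalXml = "<root>"
foreach ($file in $files) {
    [xml]$xml = Get-Content $file.FullName
    # Take only the children of each document's root element, so that
    # the roots (and any XML declarations) are not nested in the output
    $finalXml += $xml.DocumentElement.InnerXml
}
$finalXml += "</root>"
([xml]$finalXml).Save("$pwd\final.xml")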

Hope this helps,

Start-Automating
If the *very large XML files* really are large, this will consume a large amount of memory and could end with an OutOfMemoryException.
stej
Thanks, I'll try this as a quick fix!
Yooakim