Hey all, I have highly repetitive data, 5 levels deep (including the root), that needs to be broken apart. (A quick sample is below.) What I'm looking to do is split a ~5 MB XML file into smaller sub-files, one per 3rd-level node. But after that, it gets more complicated.
The task's requirements are these:
- Each sub-file must keep the ancestors of the extracted 3rd-level node, including their attributes.
- Each sub-file must retain all of that node's attributes and child nodes.
- If XSLT cannot handle the job, attempt it in Ruby. If you aren't good at XSLT but can tell me how to do it in Ruby or even Python, please feel free to contribute an answer in those languages. (Otherwise, try to stick with XSLT or pseudo-code.)
DOM Hierarchy:
<xml attr="whatever">
  <major-group name="whatever">
    <minor-group name="whatever">
      <another-group name="whatever">
        <last-node name="whatever"></last-node>
      </another-group>
    </minor-group>
  </major-group>
</xml>
I need to split this on the minor-group element, keeping both its children and its direct ancestors, and write the result for each minor-group to its own external file. I have several files to split in this manner.
And... having never before parsed XML in Ruby, and having just begun using XSLT, I cannot yet write a script to accomplish my task with either.
I'm curious to see if XSLT is up to the task. :>
Edit:
Here's my resulting code, which also writes an xml-stylesheet processing instruction at the top of each output file.
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output method="xml"/>
  <!-- Write one file per minor-group, named after its name attribute. -->
  <xsl:template match="minor-group">
    <xsl:variable name="filename" select="concat(@name, '.xml')"/>
    <xsl:result-document href="{$filename}">
      <!-- Emit the stylesheet link as a real processing instruction. -->
      <xsl:processing-instruction name="xml-stylesheet">type="text/xsl" href="../web.xslt"</xsl:processing-instruction>
      <xml>
        <!-- Copy all attributes of the root element. -->
        <xsl:copy-of select="../../@*"/>
        <major-group>
          <!-- Copy all attributes of the parent major-group. -->
          <xsl:copy-of select="../@*"/>
          <xsl:copy-of select="."/>
        </major-group>
      </xml>
    </xsl:result-document>
  </xsl:template>
</xsl:stylesheet>
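For comparison, since Python answers were welcome too, here is a rough sketch of the same split using only the standard library's xml.etree.ElementTree. It is not a tested replacement for the stylesheet. The function name split_minor_groups and the stylesheet_href parameter are made up for illustration, and it assumes every minor-group has a unique name attribute that is safe to use as a file name.

```python
import xml.etree.ElementTree as ET


def split_minor_groups(xml_text, stylesheet_href=None):
    """Return {filename: xml_string}, one entry per minor-group element.

    Hypothetical helper: assumes each minor-group has a unique,
    filesystem-safe "name" attribute (same assumption as the XSLT).
    """
    root = ET.fromstring(xml_text)
    outputs = {}
    for major in root.findall("major-group"):
        for minor in major.findall("minor-group"):
            # Rebuild the two ancestors, keeping all of their attributes.
            new_root = ET.Element(root.tag, dict(root.attrib))
            new_major = ET.SubElement(new_root, major.tag, dict(major.attrib))
            # Appending the parsed element keeps its attributes and children.
            new_major.append(minor)
            header = '<?xml version="1.0"?>\n'
            if stylesheet_href:
                # Optional stylesheet link, written before the root element.
                header += ('<?xml-stylesheet type="text/xsl" href="%s"?>\n'
                           % stylesheet_href)
            body = ET.tostring(new_root, encoding="unicode")
            outputs[minor.get("name") + ".xml"] = header + body
    return outputs
```

Each value in the returned dictionary can then be written out with open(filename, "w"). Building the strings first keeps the splitting logic separate from the file system, which makes it easier to check the result before writing several files.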