Hi, I have a huge number (2k+) of xml files that I need to extract data from and dump that data into a table of some sort (Excel or simply one or multiple XML files would be fine). The fun part is that the xml files have wildly differing amount of nodes (with occasionally the same names in the subnodes) and also the depth of the hierarchy is not known.
Anyway, here is an example:
<?xml version="1.0" encoding="UTF-8"?>
<SomeName>
<identifier>
<UID> 1234 </UID>
</identifier>
<MainNode1>
<SubNode1>
<Subnode1a>DATA1a0</Subnode1a>
</SubNode1>
<SubNode1>
<Subnode1a>DATA1a1</Subnode1a>
</SubNode1>
<SubNode1>
<Subnode1a>DATA1a2</Subnode1a>
</SubNode1>
</MainNode1>
<MainNode2>
<SubNode2>
<Subnode2a>DATA2a0</Subnode2a>
</SubNode2>
<SubNode3>
<Subnode3a>DATA3a0</Subnode3a>
</SubNode3>
<SubNode4>
<Subnode4a>DATA4a0</Subnode4a>
</SubNode4>
</MainNode2>
<MainNodeIDONTCARE>
<SubnodeWhatever>
</SubnodeWhatever>
<MainNodeIDONTCARE>
</SomeName>
And here is my table I want it to flatten to. Basically it should look like this:
<?xml version="1.0" encoding="UTF-8"?>
<SomeName>
<UID>1234</UID>
<MainNode1_SubNode1_SubNode1aA>DATA1a0</MainNode1_SubNode1_SubNode1aA>
<MainNode1_SubNode1_SubNode1aB>DATA1a1</MainNode1_SubNode1_SubNode1aB>
<MainNode1_SubNode1_SubNode1aC>DATA1a2</MainNode1_SubNode1_SubNode1aC>
<MainNode2_SubNode2_SubNode2a>Data2a0</MainNode2_SubNode2_SubNode2a>
<MainNode2_SubNode3_SubNode3a>Data3a0</MainNode2_SubNode3_SubNode3a>
<MainNode2_SubNode4_SubNode4a>Data4a0</MainNode2_SubNode4_SubNode4a>
</SomeName>
As you can see the <MainNodeIDONTCARE>
is not in the table, because I simply want to extract specific data sets from the XML files (I want to include MainNodes that I need and just ignore all others). In this case anything that is in MainNode1 through MainNode4.
What would be the best way to get this done? I don't care about performance, because the files aren't too big and the conversion can take as long as it wants to.
Is there some clever XSLT or anything else that can help me? Thank you.