I need to pull all of the "NodeGroup" elements out of an XML file:

<Database>
  <Get>
    <Data>
      <NodeGroups>
        <NodeGroup>
          <AssociateNode ConnID="6748763_2" />
          <AssociateNode ConnID="6748763_1" />
          <Data DataType="Capacity">2</Data>
          <Name>Alpha</Name>
        </NodeGroup>
        <NodeGroup>
          <AssociateNode ConnID="6748763_23" />
          <AssociateNode ConnID="6748763_7" />
          <Data DataType="Capacity">2</Data>
          <Name>Charlie</Name>
        </NodeGroup>
        <NodeGroup>
          <AssociateNode ConnID="6748763_98" />
          <AssociateNode ConnID="6748763_12" />
          <Data DataType="Capacity">2</Data>
          <Name>Papa</Name>
        </NodeGroup>
        <NodeGroup>
          <AssociateNode ConnID="6748763_8" />
          <AssociateNode ConnID="6748763_45" />
          <Data DataType="Capacity">2</Data>
          <Name>Yankee</Name>
        </NodeGroup>
      </NodeGroups>
      <System>
        ...
      </System>
    </Data>
  </Get>
</Database>

If I could use Python and BeautifulSoup, I would parse the XML and call something like:

node_group_array = soup.findAll("nodegroups")

But I am using Perl and Perl's XML modules, so I used XML::Simple's XMLin and recursively walked through each hash key, checking whether the value was a hash, whether it was the "NodeGroup" hash, and so on.

I would think that there's something like soup.findAll() in one of Perl's XML modules, but I can't find it. How do I do "soup.findAll('nodegroups')" in Perl?

+1  A: 

There is no "XML" module in Perl. There are many modules in the XML:: namespace. My favorite is XML::LibXML, but for something this simple, you could even use HTML::Parser with its xml_mode option turned on.

Randal Schwartz
Thanks for the heads-up; I fixed my wording in the question.
aaronstacy
+3  A: 

To clarify Randal's answer a bit, I think you want the XML::LibXML::XPathContext API provided by the XML::LibXML distribution:

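use XML::LibXML;
my $document = XML::LibXML->load_xml(location => $ARGV[0]);  # path to the XML file
my $xpath = XML::LibXML::XPathContext->new($document);
for my $node ( $xpath->findnodes('//NodeGroup') ) { ... }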
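
For illustration, here is a fuller sketch of the same XPath approach. It assumes XML::LibXML is installed. It inlines a trimmed-down copy of the question's document so that it runs on its own, and the variable names ($group, @names, @conn_ids) are only illustrative.

```perl
use strict;
use warnings;
use XML::LibXML;

# Trimmed-down copy of the question's document, inlined so the sketch is self-contained.
my $xml = <<'XML';
<Database>
  <Get>
    <Data>
      <NodeGroups>
        <NodeGroup>
          <AssociateNode ConnID="6748763_2" />
          <Data DataType="Capacity">2</Data>
          <Name>Alpha</Name>
        </NodeGroup>
        <NodeGroup>
          <AssociateNode ConnID="6748763_23" />
          <Data DataType="Capacity">2</Data>
          <Name>Charlie</Name>
        </NodeGroup>
      </NodeGroups>
    </Data>
  </Get>
</Database>
XML

my $document = XML::LibXML->load_xml(string => $xml);
my $xpath    = XML::LibXML::XPathContext->new($document);

my @names;
# '//NodeGroup' matches every NodeGroup element at any depth,
# which is roughly what soup.findAll does for a tag name.
for my $group ( $xpath->findnodes('//NodeGroup') ) {
    # Relative XPath expressions are evaluated against the current NodeGroup.
    my $name     = $group->findvalue('./Name');
    my @conn_ids = map { $_->getAttribute('ConnID') }
                   $group->findnodes('./AssociateNode');
    push @names, $name;
    print "$name: @conn_ids\n";
}
```

Each $group is an ordinary element node, so child elements and attributes can be read from it directly instead of walking nested hashes.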
Dave Rolsky
+1  A: 

XML::DOM has getElementsByTagName (so do XML::LibXML::DOM and XML::GDOME) which works like the DOM function of the same name.
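
A minimal sketch of the getElementsByTagName route, using XML::LibXML's DOM interface (assumed to be installed). The tiny inline document is an assumption standing in for the real file.

```perl
use strict;
use warnings;
use XML::LibXML;

# Small stand-in document; the real file would be loaded with location => $path.
my $document = XML::LibXML->load_xml(string => <<'XML');
<NodeGroups>
  <NodeGroup><Name>Alpha</Name></NodeGroup>
  <NodeGroup><Name>Charlie</Name></NodeGroup>
</NodeGroups>
XML

# getElementsByTagName matches descendant elements by tag name, at any depth.
my @groups = $document->getElementsByTagName('NodeGroup');

# Take the first Name child of each group and read its text.
my @names = map { ( $_->getElementsByTagName('Name') )[0]->textContent } @groups;

print scalar(@groups), " groups: @names\n";
```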

MkV
A: 

Using XML::Simple with the data file shown:

#!/usr/bin/perl

use strict; use warnings;

use XML::Simple;

my $db = XMLin($ARGV[0]);
my $nodegroups = $db->{Get}{Data}{NodeGroups}{NodeGroup};

use Data::Dumper;
print Dumper $nodegroups;

You might want to use the ForceArray => 1 option to guarantee consistency in case you have some files with multiple <NodeGroups>...</NodeGroups> sections and others with a single such section.
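
To see why ForceArray matters, here is a small sketch that parses one document with a single NodeGroup and one with two. It assumes XML::Simple is installed. It uses the list form ForceArray => ['NodeGroup'] plus KeyAttr => [] rather than ForceArray => 1, which is a variation on the suggestion above, and the inline XML strings are made up for the demonstration.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# Two stand-in documents: one NodeGroup versus two.
my $one = '<NodeGroups><NodeGroup><Name>Alpha</Name></NodeGroup></NodeGroups>';
my $two = '<NodeGroups><NodeGroup><Name>Alpha</Name></NodeGroup>'
        . '<NodeGroup><Name>Charlie</Name></NodeGroup></NodeGroups>';

my @shapes;
for my $xml ( $one, $two ) {
    # Default options: a lone NodeGroup comes back as a hash ref,
    # repeated NodeGroups come back as an array ref.
    my $default = XMLin($xml);

    # Force NodeGroup to always be an array ref and switch off key folding.
    my $forced = XMLin( $xml, ForceArray => ['NodeGroup'], KeyAttr => [] );

    push @shapes, [ ref $default->{NodeGroup}, ref $forced->{NodeGroup} ];
    print "default: $shapes[-1][0], ForceArray: $shapes[-1][1]\n";
}
```

With the default options the shape of the result depends on how many NodeGroup elements the file happens to contain. With ForceArray the calling code can always iterate over @{ $data->{NodeGroup} }.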

If the files are not too big, using XML::Simple should be fine. See also the caveats section in the documentation.

Sinan Ünür
argh, no this is not a job for XML::Simple.
singingfish
The OP was trying to use `XML::Simple`, so I showed how it can be used. In the code above, `$nodegroups` is a reference to an array of `NodeGroup`s.
Sinan Ünür