I'm using Perl's XML::Simple to parse deeply nested XML and would like to extract a small list of elements about 4 levels down:

A
  B
    C 
      D1
      D2
      D3

Ideally I want to do this on the input step, if possible. Like this:

my @list = XMLin($xml, { SomeAttribute => 'ButWhat?' });

ending up with the same thing as if I did:

@list = ('D1', 'D2', 'D3')

Is it possible? Or just not that 'simple'?

A: 

Assuming your data in memory looks like:

my $parsed = {
    A => {
        B => {
            C => [ qw/here is your list/ ],
        },
    },
};

Then you can get your list with my @list = @{ $parsed->{A}{B}{C} }.

Is this what you are trying to do?

Edit: taking into account some of the comments, perhaps you want Data::Visitor::Callback. You can then extract all the arrays like:

use Data::Visitor::Callback;

my @arrays;
my $v = Data::Visitor::Callback->new(
    # called once for every array reference found; $_ is that reference
    array => sub { push @arrays, $_ },
);
$v->visit( $parsed );

After that runs, @arrays will hold a reference to every array in the structure, however deeply nested.

Finally, if you just have an attribute name and want to search for matching XML nodes, you really want XPath:

use XML::LibXML;
my $parser = XML::LibXML->new;
my $doc = $parser->parse_string( $xml_string );

# yeah, I am naming the variable data.  so there.
my @data = map { $_->textContent } $doc->findnodes('//p[@id="foo"]');
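
Applied to the layout in the question, the same approach might look like the sketch below. The sample document is an assumption: it supposes the D entries are `<D>` elements whose text is the value you want, which the question does not spell out.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample document shaped like the A/B/C/D layout in the question.
my $xml_string = <<'XML';
<A>
  <B>
    <C>
      <D>D1</D>
      <D>D2</D>
      <D>D3</D>
    </C>
  </B>
</A>
XML

my $doc = XML::LibXML->new->parse_string($xml_string);

# Select every element directly under /A/B/C and keep its text content.
my @list = map { $_->textContent } $doc->findnodes('/A/B/C/*');

print join( ', ', @list ), "\n";    # prints "D1, D2, D3"
```

If any level of the path is absent, findnodes simply returns an empty list, so there is no need to test each level by hand.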

Anyway, TMTOWTDI. If you are working with XML, and want to do something complicated, XML::Simple is rarely the right answer. I use XML::LibXML for everything, since it's nearly always easier.

One more thing, you may want Data::DPath. It lets you "XPath" an in-memory perl data structure:
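
A minimal sketch of such a query, assuming the `dpath` function that Data::DPath exports and its `match` method; the sample structure is made up to mirror the question.

```perl
use strict;
use warnings;
use Data::DPath 'dpath';

# Hypothetical structure, shaped like what XML::Simple might return.
my $parsed = { A => { B => { C => [ qw/D1 D2 D3/ ] } } };

# match() returns a list of everything the path matches; here that should
# be a single array reference, or nothing at all if the path is absent.
my @matches = dpath('/A/B/C')->match($parsed);
my @list    = ( @matches && ref $matches[0] eq 'ARRAY' ) ? @{ $matches[0] } : ();

print "@list\n";
```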

jrockway
Thanks for the answer. Yes I know I can do this - but I was hoping to not have to test for the existence of all the levels in the hash to access the list.
git-noob
The trick there is that you have to know how many levels deep you're going to go before you start.
brian d foy
A: 

The fact that you're using XML::Simple is irrelevant; you're trying to search a structure of hash refs and array refs. Do you know what it is you're searching for? Will it always be in the same place? If so, then something like what jrockway wrote will do the trick easily. If not, then you'll need to walk each piece of the structure until you find what you're looking for.

One thing I often do is to dump the structure that XML::Simple returns using Data::Dumper, to see what it looks like (if it will always "look" the same; if not, you can dynamically determine how to walk it by testing whether something is a ref and what kind of ref it is). The real question is: what are you looking for?
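
Walking the structure by testing ref types can be sketched roughly as follows. `find_key` is a hypothetical helper written for this illustration, not part of any module, and it assumes you know the name of the key that holds the list.

```perl
use strict;
use warnings;

# find_key is a hypothetical helper: it returns every value stored under
# $wanted, at any depth, in a structure of hash refs and array refs.
sub find_key {
    my ( $node, $wanted ) = @_;
    my @found;

    if ( ref $node eq 'HASH' ) {
        for my $key ( sort keys %$node ) {
            push @found, $node->{$key} if $key eq $wanted;
            push @found, find_key( $node->{$key}, $wanted );
        }
    }
    elsif ( ref $node eq 'ARRAY' ) {
        push @found, find_key( $_, $wanted ) for @$node;
    }

    return @found;    # plain scalars fall through and contribute nothing
}

my $parsed = { A => { B => { C => [ qw/here is your list/ ] } } };

# Take the first match and unpack it if it is an array reference.
my ($list_ref) = find_key( $parsed, 'C' );
my @list = ref $list_ref eq 'ARRAY' ? @$list_ref : ();

print "@list\n";    # prints "here is your list"
```

Because the walker only reads keys that already exist, it does not autovivify missing levels along the way.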

Joe Casadonte
A: 

Data::Diver provides a nice interface for digging in deep structures.
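
A short sketch, assuming Data::Diver's exported `Dive` function, which takes a reference followed by the list of keys to descend through; the sample structure is made up.

```perl
use strict;
use warnings;
use Data::Diver qw( Dive );

# Hypothetical structure, shaped like what XML::Simple might return.
my $parsed = { A => { B => { C => [ qw/here is your list/ ] } } };

# Follow the key path A -> B -> C. If a level is missing, the scalar
# assignment leaves $list_ref undefined instead of dying.
my $list_ref = Dive( $parsed, qw( A B C ) );
my @list = ref $list_ref eq 'ARRAY' ? @$list_ref : ();

print "@list\n";
```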

daotoad
A: 

Building on Jon's answer, here's the basic code I use when I need to do this sort of thing. If I need anything fancier, I usually reach for a module if I'm allowed to do that.

The trick in get_values is to start with the top-level reference, get the next lower level, and put that back in the same variable. It keeps going until it reaches the level I want. Most of the code is just assertions to ensure that things work out right. In most cases I find it's the data that's messed up, not the traversal (but I do lots of data clean-up work). Adjust the error checking for your situation.

use Carp qw(croak);

my $parsed = {
  A => {
    B => {
      C => [ qw/here is your list/ ],
      D => {
        E =>  [ qw/this is a deeper list/ ],
        },
    },
  },
};

# Path to the deeper list. C holds an array ref, so no key can be looked up beneath it.
my @keys = qw( A B D E );

my @values = eval { get_values( $parsed, @keys ) } or die $@;

$" = " ][ ";
print "Values are [ @values ]\n";

sub get_values
    {
    my( $hash, @keys ) = @_;

    my $v = $hash; # starting reference

    foreach my $key ( @keys )
     {
     croak "Value is not a hash ref [at $key!]\n" unless ref $v eq ref {};
     croak "Key $key does not exist!\n" unless exists $v->{$key};
     $v = $v->{$key}; # replace with ref down one level
     }

    croak "Value is not an array ref!" unless ref $v eq ref [];
    @$v;
    }
brian d foy
A: 

Thanks for all the suggestions.

In the end I ducked the problem of traversing the data structure by using an eval block.

use feature 'say';  # needed for say() below

my $xml_tree;
my @list;

eval {

   # just go for it
   @list = @{ $xml_tree->{A}->{B}->{C} };  # assign to the outer @list; a second "my" here would shadow it

};

if ($@) {
   say "oops - xml is not in expected format - and that happens sometimes";
}
git-noob
I don't think you need that many ->'s - $xml_tree->{A}{B}{C} should work fine.
Chris Lutz