I am simply trying to retrieve an attribute from XML into my Perl program. However, I am having problems retrieving attributes.

I am using XML::Simple.

I can recover information fine when XML is like this:

<IdList>
<Id>17175540</Id>
</IdList>

by using this code

 $data->{'DocSum'}->{'Id'};

However, when the XML is like this:

<Item Name="Title" Type="String">
Some Title
</Item>

I am not getting any data back when using the following code

$data->{'DocSum'}->{'Title'};

BTW, this is the link I am getting the XML from: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&id=19288470

+2  A: 

I guess you are using XML::Simple to parse XML. I would suggest that you dump your data structure using Data::Dumper. You should be able to find it pretty easily then.

use Data::Dumper;

print Dumper($data);
weismat
I am using Dumper to print the data on screen, and it shows { 'Type' => 'String', 'content' => 'Some title', 'Name' => 'Title' }
Title is the value of the Name attribute, not the ID/key. Are you looking for "Some title" or for "Title"?
weismat
I am looking for "Some Title", which is under 'content', but even if I use $dataSummary->{'DocSum'}->{'Title'}->{'content'}; I do not get 'Some Title'.
Then dump $dataSummary instead of $data...
weismat
$data and $dataSummary are the same in this context. When writing the original question I typed $data rather than $dataSummary.
+4  A: 

run:

$ perl -MXML::Simple -M'Data::Dump qw/pp/' 
my $ref = XMLin('<Item Name="Title" Type="String">Some Title</Item>');
pp $ref;

output:

{ Name => "Title", Type => "String", content => "Some Title" }

So, it appears you should be looking under 'content' to find it.

derobert
I tried $dataSummary->{'DocSum'}->{'Title'}->{'content'}; but even that doesn't work
You didn't post the full XML, but try something like $data->{'DocSum'}{'content'}. No clue where it is in your structure exactly, unless you post enough XML for me to recreate your structure, or alternatively post the Data::Dump[er] of your structure.
derobert
+1  A: 

It looks like XML::Simple is guessing wrong about how to transform the data. Have you tried munging the KeyAttr option of XMLin()?
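As a minimal sketch of what adjusting KeyAttr could look like: XML::Simple's default KeyAttr list is lowercase ('name', 'key', 'id'), so the capitalised Name attribute is not folded unless you ask for it. The sample XML below is a shortened, made-up stand-in for the real response, and the eSummaryResult root element and the Source item are assumptions for illustration.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# Shortened, hypothetical sample; the real response has many more Item elements.
my $xml = <<'XML';
<eSummaryResult>
<DocSum>
<Id>1</Id>
<Item Name="Title" Type="String">Some Title</Item>
<Item Name="Source" Type="String">Some Journal</Item>
</DocSum>
</eSummaryResult>
XML

# Fold the Item elements into a hash keyed by their Name attribute.
# ForceArray keeps the structure the same even if only one Item is present.
my $data = XMLin(
    $xml,
    KeyAttr    => { Item => 'Name' },
    ForceArray => ['Item'],
);

my $title = $data->{DocSum}{Item}{Title}{content};
print "$title\n";
```

One caveat with this approach: if several Item elements share the same Name, folding keeps only one of them, so it suits documents where the names are unique at each level.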

daotoad
+3  A: 

I took the XML from the page you provided, passed the entire thing as a string to XMLin, and had success with

print $data->{DocSum}->{Item}->[5]->{content};

giving the output

Bromoxynil degradation in a Mississippi silt loam soil.

This is pretty much the same thing derobert was saying.

Edit:

Rather than assuming the 6th Item element is the one you are after, you can print the content of the node whose Name attribute is 'Title' (and then break out of the loop, since you have found what you want):

foreach my $item_node (@{$data->{DocSum}->{Item}})
{
    if($item_node->{Name} eq 'Title')
    {
        print $item_node->{content};
        last;
    }
}

Of course, this is still only looking at the Item nodes immediately under DocSum, so if you were looking for PubType instead of Title, it wouldn't be found due to that being a child of the PubTypeList Item node.
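If nested Item nodes matter, one possible approach is a small recursive search. This is only a sketch under some assumptions: find_item is a hypothetical helper name, the sample XML is a made-up fragment rather than the real response, and ForceArray => ['Item'] is set so that a single nested Item is still an array reference.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# Made-up fragment with one nested Item, for illustration only.
my $xml = <<'XML';
<eSummaryResult>
<DocSum>
<Id>1</Id>
<Item Name="Title" Type="String">Some Title</Item>
<Item Name="PubTypeList" Type="List">
<Item Name="PubType" Type="String">Journal Article</Item>
</Item>
</DocSum>
</eSummaryResult>
XML

# KeyAttr => [] turns off array folding; ForceArray makes every Item a list.
my $data = XMLin($xml, ForceArray => ['Item'], KeyAttr => []);

# Depth-first search for the first Item with the given Name attribute.
sub find_item {
    my ($node, $name) = @_;
    for my $item (@{ $node->{Item} || [] }) {
        return $item if $item->{Name} eq $name;
        my $found = find_item($item, $name);
        return $found if $found;
    }
    return;
}

my $pub_type = find_item($data->{DocSum}, 'PubType');
print $pub_type->{content}, "\n" if $pub_type;
```

The function returns nothing when no match exists, so the caller should check the result before dereferencing it.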

Cebjyre
XML is not positional like that. That works, but is very fragile. It'd be better to actually search for the name=Title bit. I'll write up an example later today if you haven't beat me to it :-D
derobert
Fair point, my original answer was mainly about getting at the content in general, but I've added some smarter code that does the searching for the specific node.
Cebjyre
Looks good to me.
derobert
+2  A: 

But of course, 'Title' is not a key but an attribute value, and thus a hash value. What you need is XPath, where you can specify /DocSum/Item[@Name='Title']

The equivalent in XML::Simple (or Perl), is

my ( $item ) = grep { $_->{Name} eq 'Title' } @{$data->{DocSum}{Item}};

or even

use List::Util qw<first>;
...
( first { $_->{Name} eq 'Title' } @{$data->{DocSum}{Item}} )->{content};

I have to disagree with daotoad. It's not transforming the data wrong as far as I can see. You're just not working with what it produces correctly. It's a Simple module, it's not robust, and not too DWIM.
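For the XPath route mentioned above, a minimal sketch using XML::LibXML (a different module from XML::Simple) might look like the following. The sample XML is made up, and the eSummaryResult root element is an assumption; a full XPath has to include the root, which XML::Simple drops by default.

```perl
use strict;
use warnings;
use XML::LibXML;

# Made-up sample; the root element name is an assumption.
my $xml = <<'XML';
<eSummaryResult>
  <DocSum>
    <Id>1</Id>
    <Item Name="Title" Type="String">Some Title</Item>
  </DocSum>
</eSummaryResult>
XML

my $doc = XML::LibXML->load_xml(string => $xml);

# findvalue returns the string value of the matched node.
my $title = $doc->findvalue(q{/eSummaryResult/DocSum/Item[@Name='Title']});
print "$title\n";
```

Using //Item[@Name='Title'] instead would match at any depth, at the cost of being less specific about where the node lives.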

Axeman