I asked a question yesterday (http://stackoverflow.com/questions/657058/perl-xml-simple-retrieveing-attribute). These are the links I am using to get the XML:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&id=19273512 (1)

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&id=19291509 (2)

I made good progress and wrote the following code, which loops through the tags and searches for the ones I need. I am looking for the 'doi' tag under 'ArticleIds'.

    foreach $item_node (@{$dataSummary->{DocSum}->{Item}})
    {
        if ($item_node->{Name} eq 'ArticleIds')
        {
            foreach $item_node1 (@{$item_node->{Item}})
            {
                if ($item_node1->{Name} eq 'doi')
                {
                    $doi = $item_node1->{content};
                    last;
                }
            }
            last;
        }
    }

This code searches for the ArticleIds tag and then searches the subtags under it to find the 'doi' tag.

The problem is that when ArticleIds has multiple subtags under it (as in (2)), everything works fine. However, when ArticleIds has only ONE subtag under it (as in (1)), there are errors and the program just stops.

I am using the XML::Simple parser, and using Dumper I got two different results. Here is part of the dump for link (1):

    {
      'Type' => 'List',
      'Item' => {
                  'Type' => 'String',
                  'content' => '19273512',
                  'Name' => 'pubmed'
                },
      'Name' => 'ArticleIds'
    }

And for link (2):

    {
      'Type' => 'List',
      'Item' => [
                  {
                    'Type' => 'String',
                    'content' => '909564644',
                    'Name' => 'pii'
                  },
                  {
                    'Type' => 'String',
                    'content' => '10.1080/13506120802676914',
                    'Name' => 'doi'
                  },
                  {
                    'Type' => 'String',
                    'content' => '19291509',
                    'Name' => 'pubmed'
                  }
                ],
      'Name' => 'ArticleIds'
    }

As you can see, when there are multiple tags under ArticleIds, Item is treated as an array, hence the square brackets. When there is only one, Item is a plain hash.

What would someone suggest in a case like this?

+5  A: 

If the file has only one Item element, the item will show up as a hash. If there are multiple Item elements, they will show up as an array. You can force certain tags to always contain a list using the ForceArray option. Pass it a regular expression matching all the element names that you wish to force into an array, and it'll take care of the rest.

XMLin( 'file.xml', 
       ForceArray => qr{Item}x );

Oh, also check which version of XML::Simple you're using. I think in earlier versions you could only give ForceArray an array ref of names, or it didn't work at all. If it only works with an arrayref, you can specify it with:

XMLin( 'file.xml', 
       ForceArray => [ 'Item' ] );

Check out the XML::Simple documentation on CPAN to see more options that may help you.
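To make the effect of ForceArray concrete, here is a minimal sketch. The XML string is a hypothetical, trimmed-down stand-in for the response from link (1), not the real eutils output, and KeyAttr => [] is an extra setting I am assuming you want so that XML::Simple never folds the list into a hash keyed on an attribute.

```perl
use strict;
use warnings;
use XML::Simple;

# Hypothetical, trimmed-down response modelled on link (1):
# ArticleIds has exactly one Item child.
my $xml = <<'XML';
<eSummaryResult>
  <DocSum>
    <Id>19273512</Id>
    <Item Name="ArticleIds" Type="List">
      <Item Name="pubmed" Type="String">19273512</Item>
    </Item>
  </DocSum>
</eSummaryResult>
XML

# ForceArray => ['Item'] makes every Item an array ref, even a lone one.
# KeyAttr => [] (an assumption, see above) switches off array folding.
my $data = XMLin($xml, ForceArray => ['Item'], KeyAttr => []);

# The nested loops can now dereference Item as an array at both levels.
my @names;
for my $list (@{ $data->{DocSum}{Item} }) {
    next unless $list->{Name} eq 'ArticleIds';
    push @names, $_->{Name} for @{ $list->{Item} };
}
print "ArticleIds children: @names\n";
```

With the arrayref form the original nested foreach loops should run unchanged for both (1) and (2), since Item is always an array reference.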

As for the version, if you're using the XML::Simple that came with, say, an ActiveState distribution, it's most likely out of date. Try grabbing a newer one.

You can also check to see what type it is, either using

$item =~ /HASH/  # hash
$item =~ /ARRAY/ # array

or the ref function (as you discovered)

ref($item) eq 'HASH' 
ref($item) eq 'ARRAY'
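If you go the ref route, one way to keep the extra code small is to normalise the value into a list before looping. This is only a sketch: as_list and find_article_id are hypothetical helper names of my own, not part of XML::Simple, and the two test structures are copied from the dumps in the question.

```perl
use strict;
use warnings;

# Return a node's children as a list, whether the parser stored them as
# a single hash ref, an array ref, or nothing at all.
# (as_list is a hypothetical helper, not an XML::Simple function.)
sub as_list {
    my ($value) = @_;
    return ()        unless defined $value;
    return @{$value} if ref($value) eq 'ARRAY';
    return ($value);
}

# Return the content of the ArticleIds child whose Name is $wanted,
# or nothing if there is no such child.
sub find_article_id {
    my ($doc_sum, $wanted) = @_;
    for my $list (as_list($doc_sum->{Item})) {
        next unless ref($list) eq 'HASH' && ($list->{Name} // '') eq 'ArticleIds';
        for my $id (as_list($list->{Item})) {
            return $id->{content} if ($id->{Name} // '') eq $wanted;
        }
    }
    return;
}

# Structures copied from the two dumps in the question.
my $single = { Type => 'List', Name => 'ArticleIds',
               Item => { Type => 'String', content => '19273512', Name => 'pubmed' } };
my $multi  = { Type => 'List', Name => 'ArticleIds',
               Item => [
                   { Type => 'String', content => '909564644',                 Name => 'pii' },
                   { Type => 'String', content => '10.1080/13506120802676914', Name => 'doi' },
                   { Type => 'String', content => '19291509',                  Name => 'pubmed' },
               ] };

for my $doc_sum ({ Item => $single }, { Item => [$multi] }) {
    my $doi = find_article_id($doc_sum, 'doi');
    print defined $doi ? "doi: $doi\n" : "no doi found\n";
}
```

The first structure has no doi, so the loop prints "no doi found" for it and the doi for the second. Neither case dies on a hash that is not an array reference.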
Robert P
I did the following, but if there is only one Item it still comes back as a hash:

    $contents = get($getstring);
    $data = $xml->XMLin($contents, ForceArray => qr{Item}x);

Do I have to do something different since the Item tag I want to force into an array is under the ArticleIds tag?
This is frustrating... :( Is there a quick way to find out which XML::Simple version I am using?
perl -wle'use XML::Simple; print XML::Simple->VERSION'
ysth
+4  A: 

I think one of your problems is that you are caught in between: XML::Simple doesn't give you enough knobs and dials, but the problem isn't complicated enough to justify writing something more elaborate yourself.

In this case, I'd reach for something like XML::Twig. It is more event driven, so it can walk across your XML and hand you control when you want it. Once you get the element you want, you can do whatever you like with it.

Besides Twig, tools such as XPath can be useful in the same way. They are built to look deep into XML and pull out parts of it, unlike XML::Simple, which just gives you a data structure.
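As a rough sketch of the Twig approach, the handler below fires for any Item element whose Name attribute is "doi". The XML string is a hypothetical, trimmed-down stand-in for the response from link (2), and the handler matches such an Item anywhere in the document rather than only under ArticleIds, which I am assuming is acceptable for this data.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical, trimmed-down response modelled on link (2).
my $xml = <<'XML';
<eSummaryResult>
  <DocSum>
    <Id>19291509</Id>
    <Item Name="ArticleIds" Type="List">
      <Item Name="pii" Type="String">909564644</Item>
      <Item Name="doi" Type="String">10.1080/13506120802676914</Item>
      <Item Name="pubmed" Type="String">19291509</Item>
    </Item>
  </DocSum>
</eSummaryResult>
XML

my $doi;
my $twig = XML::Twig->new(
    twig_handlers => {
        # Called once for each Item element with Name="doi".
        'Item[@Name="doi"]' => sub {
            my ($t, $elt) = @_;
            $doi = $elt->text;
        },
    },
);
$twig->parse($xml);

print defined $doi ? "doi: $doi\n" : "no doi found\n";
```

Because the handler works on elements rather than on a folded data structure, it does not matter whether ArticleIds has one child or several.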

brian d foy
+1  A: 

I had an older version of XML::Simple, so I decided to use the ref() function and write some extra lines of code.

Thanks for the help.

That's the way I usually handle it.
Joe Casadonte