I want to pull data from a World Bank XML DB file using the XML::XPath parser. The problem is that I'm not seeing any results in the output, so I must be missing something in the code. Ideally, I would like to extract just the death rate statistics (year and value) from each country's XML DB file. I'm using this as part of my input:

http://data.worldbank.org/sites/default/files/countries/en/afghanistan_en.xml

use strict;
use LWP 5.64;
use HTML::ContentExtractor;
use XML::XPath;

my $agent1 = LWP::UserAgent->new;
my $extractor = HTML::ContentExtractor->new();

#Retrieve main Worldbank country site
my $mainlink = "http://data.worldbank.org/country/";
my $page = $agent1->get("$mainlink");
my $fulltext = $page->decoded_content();

#Match to just all available countries in Worldbank
my $country = "";
my @countryList;
if (@countryList = $fulltext =~ m/(http:\/\/data\.worldbank\.org\/country\/.*?")/gi){
    foreach $country(@countryList){
        #Remove " at the end of link
        $country=~s/\"//gi;
        print "\n" . $country;

        #Retrieve each country profile's XML DB file
        my $page = $agent1->get("$country");
        my $fulltext = $page->decoded_content();
        my $XML_DB = "";
        my @countryXMLDBList;

        if (@countryXMLDBList = $fulltext =~ m/(http:\/\/data\.worldbank\.org\/sites\/default\/files\/countries\/en\/.*?\.xml)/gi){
            foreach $XML_DB(@countryXMLDBList){

                my $page = $agent1->get("$XML_DB");
                my $fulltext = $page->decoded_content();
                #print $fulltext; 
                #Use XML XPath parser to find elements related to death rate
                my $xp = XML::XPath->new($fulltext); #my $xp = XML::XPath->new("afghanistan_en.xml"); 
                my $nodeSet = $xp->find("//*");
                if (!$nodeSet->isa('XML::XPath::NodeSet') || $nodeSet->size() == 0) {
                    #No match found
                    print "\nMatch not found!";
                    exit;
                } else {
                    foreach my $node ($nodeSet->get_nodelist){
                        print "\n" . $node->find('country')->string_value;
                        print "\n" . $node->find('indicator')->string_value;
                        print "\n" . $node->find('year')->string_value;
                        print "\n" . $node->find('value')->string_value;
                        exit;
                    }
                }
            }
            #Build line graph based on death rate statistics and output some image file format
        }
    }
}

I am also looking into using the xpath expression "following-sibling", but not sure how to use it correctly. For example, I have the following set of XML data where I am only interested in pulling siblings directly after the indicator for just death rate data.

<data>
<country id="AFG">Afghanistan</country>
<indicator id="SP.DYN.CDRT.IN">Death rate, crude (per 1,000 people)</indicator>
<year>2006</year>
<value>20.3410000</value>
</data>
<data>
<country id="AFG">Afghanistan</country>
<indicator id="SP.DYN.CDRT.IN">Death rate, crude (per 1,000 people)</indicator>
<year>2007</year>
<value>19.9480000</value>
</data>
<data>
<country id="AFG">Afghanistan</country>
<indicator id="SP.DYN.CDRT.IN">Death rate, crude (per 1,000 people)</indicator>
<year>2008</year>
<value>19.5720000</value>
</data>
<data>
<country id="AFG">Afghanistan</country>
<indicator id="IC.EXP.DOCS">Documents to export (number)</indicator>
<year>2005</year>
<value>7.0000000</value>
</data>
<data>
<country id="AFG">Afghanistan</country>
<indicator id="IC.EXP.DOCS">Documents to export (number)</indicator>
<year>2006</year>
<value>12.0000000</value>
</data>
<data>
<country id="AFG">Afghanistan</country>
<indicator id="IC.EXP.DOCS">Documents to export (number)</indicator>
<year>2007</year>
<value>12.0000000</value>
</data>

Any help would be much appreciated!!!

A: 

I don't understand the first part of the question. It says:

I'm not seeing any results in the output. I must be missing something in the code.

However, this is not a question at all, especially when no input data is provided and "results" is not defined.

For the second part:

I am also looking into using the xpath expression "following-sibling", but not sure how to use it correctly. For example, I have the following set of XML data where I am only interested in pulling siblings directly after the indicator for just death rate data.

Use the following XPath expression (assuming that the data elements are children of the top element of the XML document):

/*/data/indicator[@id = 'SP.DYN.CDRT.IN']/following-sibling::*
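
As a minimal sketch of how the expression above might be evaluated with XML::XPath, here is a small standalone script. The <root> wrapper element and the inline sample are assumptions made for illustration (the real file's top element may be named differently), and the @pairs array is a hypothetical helper that is not part of the original code.

```perl
use strict;
use warnings;
use XML::XPath;

# Inline sample shaped like the data in the question.
# ASSUMPTION: the <root> wrapper is hypothetical; the real top element may differ.
my $xml = <<'END_XML';
<root>
  <data>
    <country id="AFG">Afghanistan</country>
    <indicator id="SP.DYN.CDRT.IN">Death rate, crude (per 1,000 people)</indicator>
    <year>2006</year>
    <value>20.3410000</value>
  </data>
  <data>
    <country id="AFG">Afghanistan</country>
    <indicator id="IC.EXP.DOCS">Documents to export (number)</indicator>
    <year>2005</year>
    <value>7.0000000</value>
  </data>
</root>
END_XML

my $xp = XML::XPath->new(xml => $xml);

# Element siblings that follow a death-rate indicator: here, <year> and <value>.
my $path = q{/*/data/indicator[@id = 'SP.DYN.CDRT.IN']/following-sibling::*};

# Collect "name=text" pairs (hypothetical helper array) and print them.
my @pairs;
foreach my $node ($xp->findnodes($path)) {
    push @pairs, $node->getName . '=' . $node->string_value;
}
print "$_\n" for @pairs;
```

The siblings come back as one flat list in document order, so with several death-rate records the year and value elements alternate. If the pairs need to stay grouped per record, selecting /*/data[indicator/@id = 'SP.DYN.CDRT.IN'] and reading year and value relative to each data node may be easier to work with.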

Dimitre Novatchev
Thanks! I'll try this out! By the way, I've updated the first part of my question for you.