I am trying to parse the XML found on the page ...

http://www.rapleaf.com/apidoc/person

Name: Test Dummy
Age: 42
gender: Male
Address: San Francisco, CA, US
Occupation:
University: Berkeley
first seen: 2006-02-23
last seen: 2008-09-25
Friends: 42
Name:
Age:
gender:
Address:
Occupation:
University:
first seen:
last seen:
Friends: 

1) I had to remove the records where "&" was found. I could process the page only after that.

2) I could not parse the "membership site" nor could I parse "occupation"

3) I am getting 2 records when I am expecting only one.

4) How do I insert these records in the Database?

<?php

// displays all the file nodes
if(!$xml=simplexml_load_file('rapleaf.xml')){
    trigger_error('Error reading XML file',E_USER_ERROR);
}

foreach($xml as $user){
    echo 'Name: '.$user->name. '
<br /> Age: '.$user->age.'
<br /> gender: '.$user->gender.'
<br /> Address: '.$user->location.'
<br /> Occupation: '.$user->occupations->occupation->company.'
<br /> University: '.$user->universities->university.'
<br /> first seen: '.$user->earliest_known_activity.'
<br /> last seen: '.$user->latest_known_activity.'
<br /> Friends: '.$user->num_friends.'
<br />';
}

?>
A: 

This is no XML...

How could ANYONE upvote this? No one bothered to check the link? It is an XML...
x3ro
A: 

1 . Ampersands are part of the XML syntax (they introduce entity references such as &lt;). Therefore, a bare ampersand cannot appear in an XML document. It has to be encoded as &amp; or enclosed in a CDATA block: http://www.w3schools.com/xmL/xml_cdata.asp.

2 . You cannot read the company like that ($user->occupations->occupation->company), because company is an attribute of the occupation element, not a child element. You will have to do something like:

$a = $user->occupations->children();
$b = $a->occupation->attributes();
$c = (string)$b->company;

Check out http://php.net/manual/de/book.simplexml.php for more information.

3 . You are getting two records because an XML document always has a single root element which encloses all other elements. Therefore, when you iterate with foreach over $xml, you get one SimpleXMLElement object for each direct child of the root element, not one per person.

4 . This really is another question, and it depends on which database you want to use. Google will help you with that. You'll probably want to use MySQL, because you are working with PHP. So check out http://www.google.de/search?sourceid=chrome&ie=UTF-8&q=php+mysql+tutorial :)
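As a starting point, inserting one parsed record could look roughly like the sketch below. It uses PDO with a prepared statement; the table name persons, its columns, and the in-memory SQLite database are assumptions chosen only to keep the example self-contained.

```php
<?php
// Assumed record; in practice these values would come from the parsed XML.
$person = ['name' => 'Test Dummy', 'age' => 42, 'gender' => 'Male'];

// In-memory SQLite keeps the sketch self-contained (needs the pdo_sqlite extension).
// For MySQL, a DSN such as 'mysql:host=localhost;dbname=test' plus a user name
// and password would be passed to the PDO constructor instead.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical table layout.
$pdo->exec('CREATE TABLE persons (name TEXT, age INTEGER, gender TEXT)');

// A prepared statement keeps the values separate from the SQL text.
$stmt = $pdo->prepare(
    'INSERT INTO persons (name, age, gender) VALUES (:name, :age, :gender)'
);
$stmt->execute([
    ':name'   => $person['name'],
    ':age'    => $person['age'],
    ':gender' => $person['gender'],
]);

// Read the row count back to confirm the insert.
$count = (int)$pdo->query('SELECT COUNT(*) FROM persons')->fetchColumn();
echo "Rows stored: $count\n";
```

The prepared statement matters here because values taken from an external API should never be concatenated into the SQL string directly.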

x3ro
+1  A: 

To be able to parse that document (which is not well-formed), I would recommend doing the following:

$xmlString = file_get_contents('rapleaf.xml');
$xmlString = str_replace('&', '&amp;', $xmlString);

if(!$xml=simplexml_load_string($xmlString)){
    trigger_error('Error reading XML file',E_USER_ERROR);
}

First read the file into a string, then replace the ampersand characters (within the link) with their entity. Then you can use the simplexml_load_string() function to create the XML object.
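One caveat with a plain str_replace(): if the document already contains correctly encoded entities, they get encoded a second time. A minimal sketch that escapes only bare ampersands follows; the helper name escapeBareAmpersands and the sample XML are made up for illustration, and the pattern would also touch ampersands inside CDATA blocks.

```php
<?php
// Hypothetical helper (not part of SimpleXML): escape only ampersands
// that do not already start an entity such as &amp; or &#38;.
function escapeBareAmpersands(string $xml): string
{
    return preg_replace(
        '/&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)/',
        '&amp;',
        $xml
    );
}

// Made-up sample; the real input would come from file_get_contents('rapleaf.xml').
$raw = '<person><url>http://example.com/?a=1&b=2</url>'
     . '<name>Tom &amp; Jerry</name></person>';

$xml = simplexml_load_string(escapeBareAmpersands($raw));
if ($xml === false) {
    throw new RuntimeException('Error reading XML string');
}

// Both values come back with a plain ampersand.
echo (string)$xml->url, "\n";
echo (string)$xml->name, "\n";
```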

Now you are able to parse the document. As far as I can see, there is only one person in each file, so you don't need a foreach loop over the document. You can parse all fields, you just have to know how. Here is a more complex example parsing different things with different methods:

echo '    Name: '.(string)$xml->basics->name. '
        <br /> Age: '.(string)$xml->basics->age.'
        <br /> gender: '.(string)$xml->basics->gender.'
        <br /> Address: '.(string)$xml->basics->location;
// There might be more than one occupation
foreach($xml->occupations->occupation as $occupation){
    echo '<br /> Occupation: '.$occupation->attributes()->title;
    if(isset($occupation->attributes()->company)){
        echo '; at company: '.$occupation->attributes()->company;
    }
}
// There might be more than one university
foreach($xml->universities->university as $university){
    echo '<br /> University: '.$university;
}
echo    '<br /> first seen: '.(string)$xml->basics->earliest_known_activity.'
        <br /> last seen: '.(string)$xml->basics->latest_known_activity.'
        <br /> Friends: '.(string)$xml->basics->num_friends;
// getting all the primary membership pages
foreach($xml->memberships->primary->membership as $membership){
    if($membership->attributes()->exists == "true"){
        echo '<br />'.$membership->attributes()->site;
        if(isset($membership->attributes()->profile_url)){
            echo ' | '.$membership->attributes()->profile_url;
        }
        if(isset($membership->attributes()->num_friends)){
            echo ' | '.$membership->attributes()->num_friends;
        }
    }
}

For text that is contained in a tag, you have to cast it to string:

echo 'Name: '.(string)$xml->basics->name;

To get the value of an attribute of a tag, use the attributes() function. echo converts the result to a string automatically, but cast it explicitly when you compare or store the value:

echo 'Occupation: '.$xml->occupations->occupation[0]->attributes()->title;

As you can see, you can also get a specific child node, as all the child nodes are stored in an array. Just use the index. If you just want one child node, you don't have to use a loop for that.
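To make the difference between element text, attributes, and index access concrete, here is a small self-contained sketch. The element and attribute names follow the code above, but the XML itself and its values are invented for illustration.

```php
<?php
// Made-up XML; only the element and attribute names follow the answer's code.
$xml = simplexml_load_string(
    '<person>
        <basics><name>Test Dummy</name></basics>
        <occupations>
            <occupation title="Engineer" company="Example Corp"/>
            <occupation title="Consultant"/>
        </occupations>
    </person>'
);

// Element text: cast to string.
$name = (string)$xml->basics->name;

// Attribute via attributes(), picking a child by its index.
$firstTitle = (string)$xml->occupations->occupation[0]->attributes()->title;

// Array-style access is a shorter way to read an attribute.
$secondTitle = (string)$xml->occupations->occupation[1]['title'];

echo "$name: $firstTitle, $secondTitle\n";
```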

But you always have to make sure that the element you call attributes() on actually exists, as otherwise an error will be thrown. So you may want to test that via isset() to be sure.
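One way to wrap that check is sketched below. The helper attributeOr is hypothetical (it is not a SimpleXML function), and the XML is a made-up sample with a single occupation that has no company attribute.

```php
<?php
// Hypothetical helper: read an attribute, or return a default if the
// element or the attribute does not exist.
function attributeOr(?SimpleXMLElement $element, string $name, string $default): string
{
    if ($element === null || !isset($element[$name])) {
        return $default;
    }
    return (string)$element[$name];
}

// Made-up XML with one occupation and no company attribute.
$xml = simplexml_load_string(
    '<person><occupations><occupation title="Consultant"/></occupations></person>'
);

$first  = $xml->occupations->occupation[0];
$second = $xml->occupations->occupation[1]; // no second occupation in this sample

echo attributeOr($first, 'title', 'n/a'), "\n";
echo attributeOr($first, 'company', 'n/a'), "\n";
echo attributeOr($second, 'title', 'n/a'), "\n";
```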

I hope you now have an idea of how to parse some XML using SimpleXML. If you have any additional questions, just ask again here or in a new question.

Kau-Boy
I tried the code with live web data through the API and I noticed that the site name is displayed even when exists=false. Can you please explain why this happens?
shantanuo
I have now quoted the "true" value. Test this, it might be the problem.
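The probable reason: an attribute comes back as a SimpleXMLElement object rather than a string, so a loose comparison against the boolean true does not look at the text "false" at all. A minimal sketch of a stricter check, using made-up membership data with the same attribute names:

```php
<?php
// Made-up XML mirroring the membership attributes used above.
$xml = simplexml_load_string(
    '<person><memberships><primary>
        <membership site="example.org" exists="true"/>
        <membership site="example.net" exists="false"/>
    </primary></memberships></person>'
);

$existing = [];
foreach ($xml->memberships->primary->membership as $membership) {
    // Cast to string and compare strictly against the text "true".
    if ((string)$membership['exists'] === 'true') {
        $existing[] = (string)$membership['site'];
    }
}

echo implode(', ', $existing), "\n";
```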
Kau-Boy