I have a project I'm working on, and I'd like to add a really small list of nearby places using Facebook Places in an iframe, taken from touch.facebook.com. I could easily just use touch.facebook.com/#/places_friends.php, but that loads the header and the other navigation bars (messages, events, etc.), and I just want the content.

From looking at the touch.facebook.com/#/places_friends.php source, I'm pretty sure all I need to load is the div "content". Anyway, I'm extremely new to PHP, and I'm pretty sure what I'm trying to do is called web scraping.

For the sake of figuring things out on Stack Overflow without worrying about authentication yet, I want to load the login page to see if I can at least get the scraper to work. Once I have working scraping code, I'm pretty sure I can handle the rest. It has to load everything inside the div. I've seen this done before, so I know it is possible. It should look exactly like what you see when you try to log in at touch.facebook.com, but without the blue Facebook logo at the top, and that's what I'm trying to accomplish here.

So here's the login page: I'm trying to load the div that contains the login text boxes and the actual login button. If it's done correctly, we should see just those, with no blue Facebook header bar above them.

I've tried

<?php
$page = file_get_contents('http://touch.facebook.com/login.php');
$doc = new DOMDocument();
$doc->loadHTML($page);
$divs = $doc->getElementsByTagName('div');
foreach ($divs as $div) {
    if ($div->getAttribute('id') === 'login_form') {
        echo $div->nodeValue;
    }
}
?>

All that does is load a blank page.

I've also tried using http://simplehtmldom.sourceforge.net/

and I modified its basic selector example to:

<?php
include('../simple_html_dom.php');

$html = file_get_html('http://touch.facebook.com/login.php');

foreach($html->find('div#login_form') as $e)
    echo $e->nodeValue;

?>

I've also tried

<?php
$stream = "http://touch.facebook.com/login.php";
$cnt = simplexml_load_file($stream);

$result = $cnt->xpath("/html/body/div[@id=login_form]");

for($i = 0; $i < $i < count($result); $i++){
    echo $result[$i];
}
?>

That did not work either.

A: 

Scraping isn't always the best way to capture data from elsewhere. I would suggest using Facebook's API to retrieve the values you need. Scraping will break any time Facebook decides to change its markup.

http://developers.facebook.com/docs/api

http://github.com/facebook/php-sdk/

B00MER
A: 

I'm assuming that you can't use the Facebook API. If you can, I strongly suggest you use it, because it will save you from the whole scraping business.

To scrape text, the best technique is XPath. If the HTML returned by touch.facebook.com is XHTML Transitional, which it should be, then you can use XPath; a sample would look like this:

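$stream = "http://touch.facebook.com";
// simplexml_load_file() returns false unless the page is well-formed XML
$cnt = simplexml_load_file($stream);
if ($cnt === false) {
    die('The page could not be parsed as XML.');
}

// Quote the attribute value: @id=content would compare against a child element named "content".
// With an XHTML namespace, call registerXPathNamespace() and prefix the element names.
$result = $cnt->xpath('//div[@id="content"]');

foreach ($result ?: [] as $node) {
    echo $node->asXML(); // asXML() keeps the tags; echo $node alone prints only the text
}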
David Conde
What you're saying makes sense. I fixed a few typos and spaces, but I get this error: Parse error: syntax error, unexpected '<' on line 7. I'm not exactly sure what's going on in that part, so I can't really make any adjustments.
brybam
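The `unexpected '<'` parse error comes from the loop condition `$i < $i < count($result)`, which also appears in the question's third attempt: PHP's comparison operators are non-associative, so they cannot be chained, and the condition should read `$i < count($result)`. A `foreach` avoids the counter altogether. Even with the loop fixed, `simplexml_load_file()` fails on any page that is not well-formed XML. A minimal sketch, assuming the markup is already in a string: parse it leniently with `DOMDocument::loadHTML()`, hand the tree to SimpleXML with `simplexml_import_dom()`, and return each match via `asXML()`. `selectHtml` is a hypothetical helper name and the sample page is made up.

```php
<?php
// Hypothetical helper: run an XPath query against (possibly invalid) HTML
// and return the outer markup of every matching element.
function selectHtml(string $html, string $query): array
{
    $doc = new DOMDocument();
    // loadHTML() tolerates markup that simplexml_load_file() rejects;
    // collect libxml's parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Expose the DOM tree through the SimpleXML API.
    $xml = simplexml_import_dom($doc);

    $matches = [];
    // foreach instead of a hand-written counter; xpath() can return
    // null or false on failure, hence the ?: [] fallback.
    foreach ($xml->xpath($query) ?: [] as $node) {
        // asXML() keeps the tags; echoing $node alone prints only its text.
        $matches[] = $node->asXML();
    }
    return $matches;
}

// Made-up sample page; attribute values in XPath must be quoted.
$page = '<html><body><div id="topbar">Logo</div>'
    . '<div id="content"><p>Places</p></div></body></html>';
foreach (selectHtml($page, '//div[@id="content"]') as $fragment) {
    echo $fragment, "\n";
}
```

As far as I know, `loadHTML()` does not put elements into the XHTML namespace, so plain names like `//div` work here without `registerXPathNamespace()`.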
A: 

The problem is that the URL http://touch.facebook.com doesn't have a div with id="content". There is only <div id="topbar">...</div>.

Use the code below to see the content you are getting back from Facebook:

$page = file_get_contents('http://touch.facebook.com');
echo "<pre>".htmlspecialchars($page)."</pre>";

It seems that page gets redirected to login.php. If you want to scrape login.php you have to adjust the URL in your script.
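To see which elements the returned page actually contains, a small debugging helper can list every element that has an `id`, together with its tag name. This is a sketch: `listIds` is a hypothetical name, and it takes an HTML string, so you can pass it whatever `file_get_contents()` returned (the commented-out line) or, as here, a made-up sample.

```php
<?php
// Hypothetical debugging helper: map each id in the page to the tag that
// carries it, e.g. ['topbar' => 'div', 'login_form' => 'form'].
function listIds(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence tag-soup warnings
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $ids = [];
    $xpath = new DOMXPath($doc);
    // //*[@id] selects every element, of any tag, that has an id attribute.
    foreach ($xpath->query('//*[@id]') as $element) {
        $ids[$element->getAttribute('id')] = $element->nodeName;
    }
    return $ids;
}

// $page = file_get_contents('http://touch.facebook.com/login.php');
$page = '<html><body><div id="topbar">Logo</div>'
    . '<form id="login_form"><input name="email"></form></body></html>';
print_r(listIds($page));
```

Run against the real page, a list like this shows whether `login_form` is a `<div>` or some other tag before you write the selector.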

captaintokyo
Sorry, that was my mistake, because I was already logged in. Anyway, I logged out and switched the URLs and other details to match what is actually on the login page, but I'm still running into the same issues. Thanks for catching that, though.
brybam
If you use `http://touch.facebook.com/login.php`, it's still the same problem. Do you mean `http://www.facebook.com/login.php?m2w`?
captaintokyo
When I paste http://touch.facebook.com/login.php exactly like that, my browser doesn't add any extra characters. On other pages it does, but I'm sure http://touch.facebook.com/login.php is right.
brybam
OK, if that is the page you want to scrape, the mistake is that you are looking for a `<div>` with `id="content"`. It does not exist! There is only a `<form>` tag with `id="login_form"`. Hope this helps...
captaintokyo
There is a `<div>` with `class="login_form"` however...
captaintokyo
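Those two observations translate directly into XPath. A minimal sketch, assuming a `<form id="login_form">` wrapped in a `<div class="login_form">` as described in the comments (the sample string is invented, not Facebook's real markup): `//*[@id="login_form"]` matches the element by `id` whatever its tag, and the `contains(concat(...))` idiom matches one name inside a possibly multi-valued `class` attribute. It returns the element with `saveHTML($node)`, because `nodeValue`, as used in the question's first attempt, gives only the element's text. `outerHtml` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: outer HTML of the first node matching $query, or null.
function outerHtml(string $html, string $query): ?string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence tag-soup warnings
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $nodes = (new DOMXPath($doc))->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    // saveHTML($node) serializes the element with its tags and children;
    // $node->nodeValue would return only the concatenated text.
    return $doc->saveHTML($nodes->item(0));
}

// Made-up sample shaped like the markup described in the comments.
$page = '<html><body><div id="header">Facebook</div>'
    . '<div class="login_form"><form id="login_form" action="/login.php">'
    . '<input name="email"><input type="password" name="pass">'
    . '<input type="submit" value="Log In"></form></div></body></html>';

// Select the form by id, whatever its tag name...
echo outerHtml($page, '//*[@id="login_form"]'), "\n";
// ...or the wrapping div by class (the class attribute may hold several names).
echo outerHtml($page, '//div[contains(concat(" ", normalize-space(@class), " "), " login_form ")]'), "\n";
```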
A: 

You need to review your comparison operators.

`===` compares strictly (value and type); you should be using `==`:

if ($div->getAttribute('id') == 'login_form')
{
    // handle the matching element here
}
RobertPitt
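For reference, here is how the two operators behave in this situation. `DOMElement::getAttribute()` always returns a string (an empty string when the attribute is missing), so comparing its result with `'login_form'` gives the same answer with `==` and `===`; the operators only diverge when loose comparison converts types, for example with numeric strings. The one-element document is a made-up example.

```php
<?php
libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML('<html><body><div id="login_form"></div></body></html>');
$div = $doc->getElementsByTagName('div')->item(0);

$id = $div->getAttribute('id');   // getAttribute() always returns a string
var_dump($id == 'login_form');    // bool(true)
var_dump($id === 'login_form');   // bool(true)

// The operators differ once loose comparison converts types:
// both strings are numeric, so == compares them as numbers (10 == 10).
var_dump('1e1' == '10');          // bool(true)
var_dump('1e1' === '10');         // bool(false)
```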