ansaurus

Question

Data Scraping Problem

Answer 1

+2 A:

The button/link probably starts an XMLHttpRequest, so look in your browser with Firebug, the developer console or whatever you use to see which URL it requests and with which HTTP headers. Then just make the same request with cURL and you've got it.
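A minimal PHP sketch of replaying such a request with the cURL extension. The function names build_xhr_options and replay_xhr are hypothetical, and the headers are only examples: copy the real URL, headers and cookies from what the developer console shows. X-Requested-With: XMLHttpRequest is the header many JavaScript libraries add to Ajax calls, and some servers check for it.

```php
<?php
// Hypothetical helper: build cURL options that mimic the Ajax request
// seen in the browser's network panel.
function build_xhr_options(string $url, array $headers): array
{
    $lines = [];
    foreach ($headers as $name => $value) {
        $lines[] = $name . ': ' . $value;   // cURL expects "Name: value" strings
    }
    return [
        CURLOPT_URL            => $url,
        CURLOPT_HTTPHEADER     => $lines,
        CURLOPT_RETURNTRANSFER => true,     // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,
    ];
}

// Hypothetical helper: perform the request and return the response body.
// The cURL handle is released when $ch goes out of scope.
function replay_xhr(string $url, array $headers): string
{
    $ch = curl_init();
    curl_setopt_array($ch, build_xhr_options($url, $headers));
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('Request failed: ' . curl_error($ch));
    }
    return $body;
}
```

Call it as replay_xhr($url, ['X-Requested-With' => 'XMLHttpRequest', 'Cookie' => '...']) with the values copied from the browser. Keeping the option building separate makes it easy to compare the options against what the browser actually sent.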

CharlesLeaf 2010-08-28 20:57:50

Answer 2

+4 A:

You should use the Graph API. The data you are scraping is available in JSON format at

http://graph.facebook.com/GMHTheBook/feed

and includes links to the previous/next pages in its paging property.

Example:

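$url = 'http://graph.facebook.com/GMHTheBook/feed';

// Decode the JSON feed into nested objects
$data = json_decode(file_get_contents($url));

// Print "author: message" for every post on this page
foreach ($data->data as $post) {
    echo $post->from->name, ': ',
         $post->message ?? '',   // message may be missing on some post types
         PHP_EOL;
}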

The above will output the posts on the current page of the wall. For paging, do

echo $data->paging->previous;
echo $data->paging->next;

This will output two URLs. All you have to do is load them the same way to get the previous or next page.
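Following the paging->next links can be wrapped in a loop. A minimal sketch, assuming each page has the data and paging fields described above and that the last page either has no next link or returns an empty data array. The function name fetch_all_posts and the injectable $load callback are hypothetical; the callback lets you test the loop without network access, and $maxPages is a cap so a bad link cannot keep the loop running forever.

```php
<?php
// Hypothetical helper: collect posts from a paged Graph API feed by
// following paging->next until there are no more pages.
function fetch_all_posts(string $url, ?callable $load = null, int $maxPages = 50): array
{
    // Default loader: fetch the URL and decode the JSON into objects.
    $load = $load ?? function (string $u) {
        return json_decode(file_get_contents($u));
    };

    $posts = [];
    for ($page = 0; $url !== null && $page < $maxPages; $page++) {
        $data = $load($url);
        if (empty($data->data)) {
            break;                      // empty or undecodable page: stop
        }
        foreach ($data->data as $post) {
            $posts[] = $post;
        }
        // Move on to the next page, if the response links to one.
        $url = $data->paging->next ?? null;
    }
    return $posts;
}
```

Usage: $posts = fetch_all_posts('http://graph.facebook.com/GMHTheBook/feed'); then loop over $posts exactly like the single-page example.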

Gordon 2010-08-28 21:44:37

@Gordon: Great, I did not know about modifying the URL that way for the Graph API. Thanks

Sarfraz 2010-08-29 04:52:11

@Gordon: Please see my update :)

Sarfraz 2010-08-29 11:19:17

@Sarfraz: that should probably be a follow-up question rather than an update

Gordon 2010-08-29 18:06:34

Answer 3

A:

http://www.facebook.com/ajax/stream/profile.php?__a=1&profile_id=139878432710216&viewer_id=(your facebook id)&filter=1&max_time=1283023194&_log_clicktype=Filter%20Stories%20or%20Pagination&ajax_log=1

It is loaded via Ajax. You also need to figure out these parameters. max_time probably marks the point in time from which posts are shown.
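If max_time is a Unix timestamp, which is only a guess, converting it shows which moment it refers to. A minimal check using the value from the URL above:

```php
<?php
// max_time value copied from the URL above; assumed to be a Unix timestamp.
$maxTime = 1283023194;

// Format it as a UTC date/time string.
echo gmdate('Y-m-d H:i:s', $maxTime), PHP_EOL;   // 2010-08-28 19:19:54
```

That lands on the same day as the first answer here, which supports the timestamp reading.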

OK, the link above can be shortened (same output):

http://www.facebook.com/ajax/stream/profile.php?__a=1&profile_id=139878432710216&max_time=1283023194
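To try different max_time values without hand-editing the string, the short URL can be rebuilt with PHP's http_build_query. A minimal sketch; the function name profile_stream_url is hypothetical, and the parameters are just the ones from the shortened link.

```php
<?php
// Hypothetical helper: rebuild the shortened Ajax URL from its parameters.
// profile_id and max_time come from the URL above; __a=1 is kept as-is.
function profile_stream_url(string $profileId, int $maxTime): string
{
    $query = http_build_query([
        '__a'        => 1,
        'profile_id' => $profileId,
        'max_time'   => $maxTime,
    ], '', '&');   // use a plain "&" separator, not "&amp;"
    return 'http://www.facebook.com/ajax/stream/profile.php?' . $query;
}

echo profile_stream_url('139878432710216', 1283023194), PHP_EOL;
```

The profile ID is passed as a string so it is never subject to integer overflow on 32-bit builds.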

Webarto 2010-08-29 01:22:40
