I'm using a Facebook application that has a rich set of information that I'd like to get at offline. To do this, I essentially need to read the information from the web pages into my own database. Obviously, I'd prefer not to save pages manually; I'd rather let my application read the pages and pull the relevant details from them. Unfortunately, I am road-blocked by the requirement to authenticate to Facebook first. So when I run this code:

// Requires: using System; using System.IO; using System.Net;
// baseUri is assumed to be a static Uri field defined elsewhere in the class.
private static string getPage(string pageAddress)
{
    HttpWebRequest req = (HttpWebRequest)WebRequest.Create(new Uri(baseUri, pageAddress));
    // The using blocks dispose the response and the reader even if ReadToEnd throws.
    using (HttpWebResponse response = (HttpWebResponse)req.GetResponse())
    using (StreamReader readStream = new StreamReader(response.GetResponseStream()))
    {
        return readStream.ReadToEnd();
    }
}

I get this response:

<script type="text/javascript">
if (parent != self) 
top.location.href = "http://www.facebook.com/login.php?api_key=<obscured>&canvas&v=1.0";
else self.location.href = "http://www.facebook.com/login.php?api_key=<obscured>&canvas&v=1.0";
</script>

Any ideas how to tell the app that I really am the authentic me?

+1  A: 

You need to use the Facebook API to get data from Facebook. They block screen scraping.

Ivo
I don't want data out of Facebook. Facebook doesn't have the data I want/need. They're just the gatekeeper to authenticate at the app I want to access.
Jacob Proffitt
+2  A: 

Facebook exposes a REST API, so you can request the data from the server; the data can also be accessed through the client-side JavaScript API. You can check the wiki for more information; it uses a rest_server.php?method= endpoint to call the appropriate methods.

Check out http://www.facebook.com/developers for more information about these objects and methods, and about creating an application so you can query Facebook data.

Brian
I'm not sure what you mean, Brian. I don't need data from Facebook itself; I want the data from the Facebook app. The link you gave didn't have data about objects, methods, or creating an application.
Jacob Proffitt
This link: http://wiki.developers.facebook.com/index.php/Main_Page is linked from the previous link I sent. The data from Facebook is used within the Facebook app, so in my line of thinking they are one and the same, which is why I sent you this. The API will help you get the data you are looking for, as screen scraping is blocked.
Brian
+1  A: 

You will first have to write a script to programmatically log in to Facebook. Then you will have to save the cookies you get.

I have done something similar with curl and PHP. (curl has built-in cookie handling.)

Drew LeSueur
+1  A: 

I think they're using cookies to pass authentication, so first you'll need your app to log in to Facebook and keep the cookies in a CookieContainer. Then assign that container to req.CookieContainer, and only then call req.GetResponse();
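
A minimal sketch of that idea, under stated assumptions: the login URL and the form field names (email, pass) below are hypothetical placeholders and must be taken from the real login form, which may also require hidden fields. It only shows how one CookieContainer is shared between the login POST and later page requests. Since the response in the question redirects via JavaScript, plain HTTP requests may not be enough on their own.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

static class CookieSessionSketch
{
    // Hypothetical login endpoint; replace with the action URL of the real login form.
    static readonly Uri LoginUri = new Uri("https://www.example.com/login.php");

    // One container shared by every request, so cookies set at login are sent later.
    static readonly CookieContainer Cookies = new CookieContainer();

    static HttpWebRequest CreateRequest(Uri uri)
    {
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(uri);
        req.CookieContainer = Cookies; // must be assigned before GetResponse()
        return req;
    }

    static string ReadBody(HttpWebRequest req)
    {
        using (HttpWebResponse response = (HttpWebResponse)req.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }

    // Posts the credentials; any cookies in the response are stored in Cookies.
    public static void LogIn(string email, string password)
    {
        HttpWebRequest req = CreateRequest(LoginUri);
        req.Method = "POST";
        req.ContentType = "application/x-www-form-urlencoded";

        // Field names "email" and "pass" are assumptions for illustration.
        byte[] body = Encoding.UTF8.GetBytes(
            "email=" + Uri.EscapeDataString(email) +
            "&pass=" + Uri.EscapeDataString(password));
        req.ContentLength = body.Length;

        using (Stream stream = req.GetRequestStream())
        {
            stream.Write(body, 0, body.Length);
        }
        ReadBody(req);
    }

    // Fetches a page using the cookies collected during LogIn.
    public static string GetPage(Uri pageUri)
    {
        return ReadBody(CreateRequest(pageUri));
    }
}
```

Call CookieSessionSketch.LogIn(...) once, then CookieSessionSketch.GetPage(...) for each page you want to read.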

Ofri Raviv
+7  A: 

As far as I understand, you just need to log in to a Facebook application, right? Use any web scraping/crawling framework for it (they support JS, cookies, etc.). They just emulate usual web browsing. For example, try these:

http://scrapy.org/

http://wwwsearch.sourceforge.net/mechanize/

http://watin.sourceforge.net/

Also see

http://stackoverflow.com/questions/1852725/net-screen-scraping-and-session
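
For the .NET option, a rough WatiN sketch might look like the following. It drives a real Internet Explorer window, so the site's JavaScript and cookies behave as in normal browsing. The form field names (email, pass, login), the credentials, and the app URL are all hypothetical placeholders, not values confirmed anywhere in this thread; inspect the actual login page to find the right ones.

```csharp
using System;
using WatiN.Core; // requires a reference to the WatiN assembly

static class WatinLoginSketch
{
    [STAThread] // WatiN automates IE through COM and needs a single-threaded apartment
    static void Main()
    {
        using (var browser = new IE("http://www.facebook.com/login.php"))
        {
            // Element names below are assumptions for illustration.
            browser.TextField(Find.ByName("email")).TypeText("me@example.com");
            browser.TextField(Find.ByName("pass")).TypeText("my-password");
            browser.Button(Find.ByName("login")).Click();

            // Hypothetical app URL; the browser session is now authenticated.
            browser.GoTo("http://apps.facebook.com/some-app/");

            // Html holds the markup of the loaded page, ready for parsing.
            string html = browser.Html;
            Console.WriteLine(html.Length);
        }
    }
}
```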

Alexey Kalmykov
Do you know of any such libraries for use in .NET?
Jacob Proffitt
Added one .NET library that can be used.
Alexey Kalmykov
WatiN for the win. It's a tad awkward, but it allows me to do exactly what I want to. Good work.
Jacob Proffitt
^ any chance you can post the code you used?
Devtron