I want to build a screen scraper based on exactly the same idea as this one: http://www.vimeo.com/1626505. What I want to know is how to do it.

- When the user clicks the bookmarklet, it sends the URL to my server. My server then responds to the client page with the scraping JavaScript files, which load along with an iframe. The JavaScript scrapes the data from the current page and puts it in the iframe.

OR - The bookmarklet sends my server just the URL. My server opens the URL using .NET code, scrapes it, extracts the needed data, and then sends the client an iframe filled with that data.

Which one is right? Or is there another way? And why is it better than the other one?

+2  A: 

I don't know if it is just like the one on Vimeo, but here is an easy server-side implementation in C#:

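 // Requires: using System.IO; using System.Net;
 WebRequest wrContent = WebRequest.Create("http://www.site.com/yourtargetpage.html");
 // Dispose the response, stream and reader once the page has been read.
 using (WebResponse response = wrContent.GetResponse())
 using (Stream objStream = response.GetResponseStream())
 using (StreamReader objStreamReader = new StreamReader(objStream))
 {
     Content = objStreamReader.ReadToEnd();
 }
 DataBind(); // evaluates the <%# Content %> binding expression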

"Content" is a string variable that is declared at the head of your page class:

 protected string Content;

Place this in your ASPX page where you want the content to appear:

 <%# Content %>

It is very easy.

Mark Brittingham
+2  A: 

Client-Side

Pros

The page is already loaded in the user's browser
It's relatively easy to parse through the DOM with JS

Cons

Is JS turned on?
How fast is the user's machine?
What happens if the user browses away mid-stream?

Server-Side

Pros

More control (I assume) than JS
Page sniffing should be pretty quick

Cons

The page gets hit twice (increased network traffic)
Your server actually has to parse/process the page (how does it scale?)

Can't really think of any more (I'm sure I'm missing a lot). Bottom line is, it's a measure of your client-side/server-side skills against the bandwidth and server load considerations.

Michael Todd
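To make the client-side option a bit more concrete, here is a rough sketch of what "parsing through the DOM with JS" could look like. The function name scrapeLinks and the choice to collect links are only assumptions for illustration; a real scraper would look for whatever data it actually needs.

```javascript
// Hypothetical helper: collects the text and target of every link on a page.
// It only relies on querySelectorAll, textContent and getAttribute, so it can
// be given the browser's `document` or any object that offers the same calls.
function scrapeLinks(doc) {
  const results = [];
  for (const a of doc.querySelectorAll("a[href]")) {
    const text = (a.textContent || "").trim();
    // Skip links without visible text (for example, image-only links).
    if (text) {
      results.push({ text: text, href: a.getAttribute("href") });
    }
  }
  return results;
}
```

In the browser this would be called as scrapeLinks(document), and the resulting array could then be handed to whatever script fills the iframe.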
So are there no security problems with inserting JavaScript files into the client page and having these scripts scrape the page and put the data in the iframe?
Amr ElGarhy
A browser (or plug-in) might prevent your code from doing that. That's another thing I didn't think of. Overall, it might be safest/best to just send the URL to your server and process it since that prevents the security issue as well as the user browsing away from the site too fast.
Michael Todd
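As a sketch of the "just send the URL to your server" approach discussed above: the bookmarklet only needs to open an iframe that points at the server and passes along the address of the current page. Everything named here is an assumption for illustration: the endpoint https://example.com/scrape.aspx, the url query parameter, and the helper names buildBookmarklet and scrapeRequestUrl.

```javascript
// Builds the javascript: URL to save as a bookmark. `endpoint` is a
// hypothetical server page that reads the page address from a "url"
// query parameter and returns the scraped data as HTML.
function buildBookmarklet(endpoint) {
  const body =
    "(function(){" +
    "var f=document.createElement('iframe');" +
    "f.src=" + JSON.stringify(endpoint) + "+'?url='+encodeURIComponent(location.href);" +
    "f.style.cssText='position:fixed;top:0;right:0;width:400px;height:300px;z-index:99999';" +
    "document.body.appendChild(f);" +
    "})();";
  return "javascript:" + body;
}

// The address the iframe would request for a given page, using the same
// encoding as the bookmarklet. Handy for testing the server side by hand.
function scrapeRequestUrl(endpoint, pageUrl) {
  return endpoint + "?url=" + encodeURIComponent(pageUrl);
}
```

For example, scrapeRequestUrl("https://example.com/scrape.aspx", "http://www.vimeo.com/1626505") gives the request the server would receive. The server can then fetch and parse that page itself, so no scraping script has to run inside the visited page.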