views:

403

answers:

5

I am creating a bookmarklet button that, when the user clicks it in their browser, will scrape the current page and extract some values from it, such as the price, item name, and item image.

These fields will vary, meaning the logic for getting these values will be different for each domain ("amazon" and "ebay", for example).

My questions are:

  • Should I use JavaScript to scrape this data and then send it to the server?
  • Or should I just send the URL to my server and use .NET code to scrape the values there?
  • Which is the better approach, and why? What are the advantages and disadvantages of each?

Watch this video and you will understand exactly what I want to do: http://www.vimeo.com/1626505

+2  A: 

If you want to pull information from another site for use in your own site (written in ASP.NET, for example), then you'll typically do this on the server side so that you have a rich language for processing the results (e.g. C#). In .NET you'd do this via a WebRequest object.
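A minimal sketch of that approach (the URL and the regex pattern here are placeholders, not selectors for any real site):

```csharp
// Fetch a page server-side with .NET's WebRequest and pull one value
// out with a regex. Each domain (Amazon, eBay, ...) would need its own pattern.
using System;
using System.IO;
using System.Net;
using System.Text.RegularExpressions;

class Scraper
{
    static void Main()
    {
        WebRequest request = WebRequest.Create("http://www.example.com/item/123");
        using (WebResponse response = request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            string html = reader.ReadToEnd();
            // Hypothetical markup; real sites need sturdier parsing.
            Match m = Regex.Match(html, @"<span id=""price"">([^<]+)</span>");
            if (m.Success)
                Console.WriteLine("Price: " + m.Groups[1].Value);
        }
    }
}
```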

The primary use of client-side processing is to use JavaScript to pull information to display on your site. An example would be the scripts provided by the Weather Channel to show a little weather box on your site, or very simple actions such as adding a page to favorites.

UPDATE: Amr writes that he is attempting to recreate the functionality of some popular screen-scraping software, which would require quite sophisticated processing. Amr, I'd consider creating an application that uses the IE browser object to display web pages - it is quite simple. You could then just pull the InnerHTML (I think - it has been a few years since I implemented an IE-object-based program) to retrieve the contents of the page and do your magic. You could, of course, use a WebRequest object (just handing it the URL used in the browser object), but that wouldn't be very efficient, as it would download the page a second time.
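A rough sketch of that browser-object idea, using the WinForms WebBrowser control (which wraps IE); the URL is a placeholder:

```csharp
// Load a page in the IE engine, then read the rendered HTML from the
// document body - no second download needed.
using System;
using System.Windows.Forms;

class BrowserScraper
{
    [STAThread]
    static void Main()
    {
        var browser = new WebBrowser { ScriptErrorsSuppressed = true };
        browser.DocumentCompleted += (s, e) =>
        {
            // InnerHtml of the body is the page content to parse.
            string html = browser.Document.Body.InnerHtml;
            Console.WriteLine("Got " + html.Length + " characters of HTML");
            Application.ExitThread();
        };
        browser.Navigate("http://www.example.com/item/123");
        Application.Run(); // message loop so the control can finish loading
    }
}
```

(Note that DocumentCompleted fires once per frame on framed pages, so a real program would check e.Url against the target URL first.)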

Is this what you are after?

Mark Brittingham
I think it's not phishing: http://en.wikipedia.org/wiki/Web-scraping_software_comparison
Amr ElGarhy
It's a bookmarklet; this can be easily done, though it can be dangerous in the wrong hands. But check out Magnolia for a great bookmarklet app.
Robert Gould
Thanks Robert. I'm not familiar with bookmarklets or Magnolia. I'll check it out.
Mark Brittingham
What's Magnolia? What's its URL?
Amr ElGarhy
The exact spelling is Ma.gnolia; anyway, google it and it's right there.
Robert Gould
A: 

I would scrape it on the server side, because (I'm a Java guy) I like static languages more than dynamic scripting languages, so maintaining the logic on the backend would be more comfortable for me. On the other hand, it depends on how many items you want to scrape and how complex the logic would be. Perhaps the values are parseable with a single id selector in JavaScript (e.g. document.getElementById('price')), in which case server-side processing could be overkill.
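If the logic does grow beyond a single selector, a backend sketch along these lines keeps it in static-language land (this uses HtmlAgilityPack as one possible .NET parsing library; the id and URL are invented):

```csharp
// Hypothetical server-side parse with HtmlAgilityPack.
// "item-price" is a made-up id; each domain needs its own selector logic.
using HtmlAgilityPack;

class PriceScraper
{
    static void Main()
    {
        HtmlDocument doc = new HtmlWeb().Load("http://www.example.com/item/123");
        HtmlNode price = doc.GetElementbyId("item-price");
        if (price != null)
            System.Console.WriteLine(price.InnerText.Trim());
    }
}
```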

Mork0075
A: 

Bookmarklets are client-side by definition, but you could have the client depend on a server. Your example doesn't provide enough information, though: what do you want to do with the scraped info?

Robert Gould
+1  A: 

If you want to use only JavaScript to do this, you are liable to have a fairly large bookmarklet unless you know the exact layout of every site it will be used on (and even then it will be big).

A common way I have seen this done is to have a web service on your own server that your bookmarklet (which uses JavaScript) redirects to, passing along some parameters such as the URL of the page you are viewing. Your server then scrapes that page and does the work of parsing the HTML for the things you are interested in.
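As a sketch of that setup (example.com and the handler name are invented): the bookmarklet itself can be a one-liner like `javascript:location.href='http://example.com/Scrape.ashx?url='+encodeURIComponent(location.href)`, and the server side can be a simple handler:

```csharp
// Hypothetical ASP.NET handler that receives the page URL from the
// bookmarklet, downloads the page, and parses it server-side.
using System.Net;
using System.Web;

public class ScrapeHandler : IHttpHandler
{
    public void ProcessRequest(HttpContext context)
    {
        string url = context.Request.QueryString["url"];
        using (var client = new WebClient())
        {
            string html = client.DownloadString(url);
            // ...parse html for price, item name, image, etc. ...
            context.Response.Write("Scraped " + html.Length + " characters from " + url);
        }
    }

    public bool IsReusable { get { return true; } }
}
```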

A good example is the "Import to Mendeley" bookmarklet, which passes the URL of the page you are visiting to its server where it then extracts information about scientific papers listed on the page and imports them into your collection.

ealdent
A: 

If you include the scraping code in the bookmarklet, your users will have to update their bookmark whenever you add new functionality or bug fixes. Do it server-side and all your users get the new stuff instantly :)

Adam Pope