I only have a link to a product page on Amazon. How do I get all the information (photo, price, etc.) in my Ruby program using just this link?

A: 

In your program, fetch the page, parse the HTML, and filter out the information you need. There may be Ruby libraries for parsing HTML that I am unaware of.

Hpricot seems to do what you want.

Alan Haggai Alavi
Isn't there any API (from amazon or otherwise) to do the same thing?
+1  A: 

If you want to do this, the Nokogiri and Hpricot libraries both allow HTML parsing and searching. However, this kind of screen scraping is notoriously unreliable (it may break any time Amazon decides to reorganize their HTML), so if you're planning to do this sort of thing for any length of time I'd recommend using the Amazon Product Advertising API instead.

Greg Campbell
+1  A: 

I found the amazon-ecs library (I'm using Rails) and I'm experimenting with it. Still, I'd need some kind of ID (a product ID?) to get the details of a particular product. For example, consider this link to the Kindle:

http://www.amazon.com/Kindle-Amazons-Wireless-Reading-Generation/dp/B00154JDAI/ref=amb_link_84372271_1?pf_rd_m=ATVPDKIKX0DER&pf_rd_s=center-1&pf_rd_r=06JJGQP9J3BHKPE38SXP&pf_rd_t=101&pf_rd_p=478184871&pf_rd_i=507846

In that link, I noticed the ASIN, which is B00154JDAI.

It looks like I can use this ID to get product information (using amazon-ecs). I just need to parse the URL to get the ASIN.
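As a rough illustration of the URL-parsing step, here is a minimal sketch using only Ruby's standard `uri` library. The helper name `asin_from_dp_url` is made up for this example, and it assumes the ASIN is the path segment right after `dp`, as in the Kindle link above (shortened here to its first query parameter); it is not part of amazon-ecs.

```ruby
require "uri"

# Hypothetical helper: returns the path segment that follows "dp",
# or nil when the URL has no /dp/ segment.
# Assumption: the ASIN sits directly after /dp/ in the path.
def asin_from_dp_url(url)
  segments = URI.parse(url).path.split("/")
  index = segments.index("dp")
  index && segments[index + 1]
end

url = "http://www.amazon.com/Kindle-Amazons-Wireless-Reading-Generation/dp/B00154JDAI/ref=amb_link_84372271_1?pf_rd_m=ATVPDKIKX0DER"
puts asin_from_dp_url(url)
```

The value returned could then be passed to whatever item lookup call the chosen library provides.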

Is there any other way to do it?

No, I am not going to do screen scraping; that is never a good idea.

Is there a reason you want another way to do it? Amazon's URLs are reasonably uniform, so extracting the ASIN should not generally be a problem, and amazon-ecs does provide a pretty simple abstraction. If you have some motivation for needing another way, though...
Peter Cooper
I randomly checked some URLs and found that they contain something called an ASIN (Amazon Standard Identification Number). It appears somewhere in the URL, but not always in the same format: sometimes it is /dp/ASIN, sometimes /gp/ASIN, and sometimes just the ASIN. There might be other combinations; I am not sure. Is there any API in amazon-ecs that can get me the ASIN if I pass it the URL?
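For URLs that vary in shape like this, one possible approach is to look for an ASIN-shaped path segment after a known marker and fall back to any segment that looks like an ASIN. The sketch below rests on assumptions that are not confirmed anywhere in this thread: that an ASIN is ten uppercase letters or digits, and that `dp`, `gp` and `product` are the markers worth checking. The name `extract_asin` is hypothetical and is not an amazon-ecs method.

```ruby
require "uri"

# Hypothetical helper, not part of amazon-ecs.
def extract_asin(url)
  # Assumption: an ASIN is exactly 10 uppercase letters or digits.
  asin_pattern = /\A[A-Z0-9]{10}\z/
  # Assumption: these path segments may directly precede the ASIN.
  markers = %w[dp gp product]

  segments = URI.parse(url).path.split("/").reject(&:empty?)

  # Prefer a candidate that directly follows a known marker.
  segments.each_cons(2) do |marker, candidate|
    return candidate if markers.include?(marker) && candidate.match?(asin_pattern)
  end

  # Otherwise accept any segment shaped like an ASIN (may give false positives).
  segments.find { |segment| segment.match?(asin_pattern) }
rescue URI::InvalidURIError
  nil
end

puts extract_asin("http://www.amazon.com/gp/product/B00154JDAI/ref=example")
```

The fallback is deliberately loose, so a caller may want to confirm the result with a real product lookup before trusting it.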
A: 

You should use the Ruby/AWS library (search for it; my karma is not high enough to allow external links...). It was written for exactly that.

You might need to use the built-in Search to find the item you're looking for. After that, the API gives access to pictures, links and all usable information.

Oct