Looking to scrape a website

views: 498

+1 Q:

I am looking to scrape a website like yelp.com to get a listing of all the bars they have there. Are there any tools or scripts out there that would help me do this?

+5 A:

From a Python perspective:

httplib2 to automate the page downloads.
Beautiful Soup for parsing the HTML source to get the info you want.

Read An Introduction to Compassionate Screen Scraping for a good tutorial that uses both tools and will get you started.
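
As a rough illustration of how the two libraries fit together, here is a minimal sketch. The `biz-name` class, the sample markup, and the helper names `fetch` and `bar_names` are assumptions made up for this example; a real site will use different markup that you have to inspect yourself.

```python
from bs4 import BeautifulSoup


def fetch(url):
    """Download a page with httplib2 and return its body as text."""
    import httplib2  # imported here so the parsing half can be tried offline

    http = httplib2.Http()
    response, content = http.request(url, "GET")
    if response.status != 200:
        raise RuntimeError(f"unexpected HTTP status {response.status}")
    return content.decode("utf-8", errors="replace")


def bar_names(html):
    """Return the text of every <a class="biz-name"> link (hypothetical markup)."""
    soup = BeautifulSoup(html, "html.parser")
    return [a.get_text(strip=True) for a in soup.find_all("a", class_="biz-name")]


# Made-up listing fragment standing in for a downloaded page.
SAMPLE = """
<ul>
  <li><a class="biz-name" href="/biz/1">The Rusty Anchor</a></li>
  <li><a class="biz-name" href="/biz/2">Blue Door Tavern</a></li>
</ul>
"""

if __name__ == "__main__":
    print(bar_names(SAMPLE))
```

In practice you would pass the result of `fetch(url)` to `bar_names` instead of the sample string.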

Tristan 2009-08-28 23:58:47

+1 for helping me discover httplib2.

Ölbaum 2009-08-29 00:02:50

Has anyone here used Heritrix?

sharadov 2009-08-31 20:42:56

HTTrack. It's cross-platform, and I've been using it for years.

mozami 2009-08-28 23:59:02

+2 A:

If you know Python, there's the pyQuery module, which I find handy. Like jQuery, it lets you use enhanced CSS selectors to select DOM objects. I find it far easier than using a parser.

Ölbaum 2009-08-29 00:01:21

I wrote a scraper back in the dot-com era to pull info from a few e-commerce websites. I used Perl, and for each site I had two tiers of code. The "discover" tier would parse pages and find lists of items, and the "process" tier would read product pages, separate out the fields of data, and feed them into a database.

From the looks of what you want to do, I think rolling your own solution is probably the best approach, as it's not really complicated. Use Perl or a similar interpreted language with good string processing and regex support.

Separating the pages is really easy. Forget about parse trees (I went that way and gave up on it). It's much easier and more straightforward to manually identify the chunks of template text bordering each piece of info you want and put them in a regex to extract the data.

Put the results in a list, hash, or whatever, and then do what you want with them.
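
The two-tier, regex-based approach described above might be sketched like this. It is written in Python rather than Perl, and the patterns, markup, and function names are assumptions for illustration; each real site needs patterns built from its own template text.

```python
import re

# Hypothetical template fragments bordering the data we want.
ITEM_LINK = re.compile(r'<a class="biz-name" href="([^"]+)">')
NAME = re.compile(r"<h1>(.*?)</h1>", re.S)
PHONE = re.compile(r'<span class="phone">(.*?)</span>', re.S)


def discover(listing_html):
    """Discover tier: return the detail-page paths found on a listing page."""
    return ITEM_LINK.findall(listing_html)


def process(detail_html):
    """Process tier: pull individual fields out of one detail page."""
    name = NAME.search(detail_html)
    phone = PHONE.search(detail_html)
    return {
        "name": name.group(1).strip() if name else None,
        "phone": phone.group(1).strip() if phone else None,
    }


# Made-up pages standing in for downloaded HTML.
LISTING = (
    '<li><a class="biz-name" href="/biz/1">The Rusty Anchor</a></li>'
    '<li><a class="biz-name" href="/biz/2">Blue Door Tavern</a></li>'
)
DETAIL = '<h1> The Rusty Anchor </h1><span class="phone">555-0100</span>'

if __name__ == "__main__":
    print(discover(LISTING))
    print(process(DETAIL))
```

Keeping the tiers separate means a change to the listing template only touches `discover`, while a change to the detail pages only touches `process`.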

Kristoffon 2009-08-29 00:07:31

+1 A:

I've done work like this on Superpages and citySearch using screen-scraper. From there you can write your results to a CSV, database, or whatever.
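
For the CSV step, a minimal sketch in plain Python (not screen-scraper's own scripting) could look like the following. The `name` and `phone` fields and the `write_csv` helper are assumptions chosen for the example.

```python
import csv
import io


def write_csv(rows, out):
    """Write a list of dicts with 'name' and 'phone' keys as CSV to a file-like object."""
    writer = csv.DictWriter(out, fieldnames=["name", "phone"])
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Made-up scraped records; an in-memory buffer stands in for a real file.
    buf = io.StringIO()
    write_csv([{"name": "The Rusty Anchor", "phone": "555-0100"}], buf)
    print(buf.getvalue())
```

When writing to a real file, open it with `newline=""` so the csv module controls the line endings.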

Jason Bellows 2009-08-31 18:31:26

Thanks, I downloaded a trial version of the software. The tutorial is very detailed too.

sharadov 2009-09-01 21:29:35

The last time I was looking for such a tool, a friend suggested Automation Anywhere. I think it's a nice tool; the best part is its point-and-click extraction. You can look for more info on this web scraping tool and use the free trial to get a better idea. I learned about it on this screen scraping page. Have a look.

Alberto Ricon 2010-07-01 10:19:20

Hmm, I'm interested in this as well. How can I use screen-scraper to get a portion of the local businesses from Yelp?

rogerhp 2010-09-01 06:56:08
