Hello all, I need to scrape web sites (with approval). Before I start writing my own scraper: what is the best tool or approach for scraping web sites that is both fast (multithreaded) and easy to learn?

A: 

Take a look at this recent blog post by Lee Holmes. He wrote a pretty cool screen scraper using PowerShell and the HTML Agility Pack.

Nick
I'm more into the Linux C++/Java world.
@user63898 - Then you should include that information in your question if you have particular technology requirements. We're not mind readers.
Nick
A: 

Consider using TestPlan. It has a display-less browser mode for fast scraping, and the basics of its scripting language are simple and quick to learn.

edA-qa mort-ora-y
A: 

TagSoup, a SAX-compliant parser written in Java, parses HTML as it is found in the wild: poor, nasty and brutish, though quite often far from short.

Details here: http://mercury.ccil.org/~cowan/XML/tagsoup/
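
Because TagSoup is SAX-compliant, scraping code can be written against the standard SAX interfaces and stay independent of the parser. Below is a minimal sketch of that idea, not a definitive implementation: the class name LinkCollector and the method extractLinks are made up for illustration, and the demo drives the handler with the JDK's built-in SAX parser on well-formed markup so that it runs without extra libraries. For real-world HTML the assumption is that you would pass a TagSoup reader (org.ccil.cowan.tagsoup.Parser, with the TagSoup jar on the classpath) instead.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: collects the href of every <a> element seen in a SAX event stream.
class LinkCollector extends DefaultHandler {
    private final List<String> links = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Namespace-aware readers fill in localName; others only provide qName.
        String name = (localName == null || localName.isEmpty()) ? qName : localName;
        if ("a".equalsIgnoreCase(name)) {
            String href = atts.getValue("href");
            if (href != null) {
                links.add(href);
            }
        }
    }

    public List<String> getLinks() {
        return links;
    }

    // Works with any SAX XMLReader. Assumption: for messy HTML, pass
    // new org.ccil.cowan.tagsoup.Parser() here instead of the JDK reader.
    public static List<String> extractLinks(XMLReader reader, String markup) throws Exception {
        LinkCollector handler = new LinkCollector();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(markup)));
        return handler.getLinks();
    }

    public static void main(String[] args) throws Exception {
        // The JDK parser only accepts well-formed markup, so the sample page is well-formed.
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        String page = "<html><body><a href=\"/one\">1</a><p><a href=\"/two\">2</a></p></body></html>";
        System.out.println(extractLinks(reader, page)); // prints [/one, /two]
    }
}
```

Since each call creates its own handler, one way to get the multithreaded behaviour asked about would be to run extractLinks on several pages from a thread pool, giving each task its own reader.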

amit-agrawal