Hello all, I need to scrape web sites (with approval). Before I start writing my own scraper: what is the best tool or approach for scraping web sites that is both fast (multithreaded) and easy to learn?
A:
Take a look at this recent blog post by Lee Holmes. He wrote a pretty cool screen scraper using PowerShell and the HTML Agility Pack.
Nick
2010-03-08 20:21:53
I'm more into the Linux C++/Java world.
2010-03-09 07:14:34
@user63898 - Then you should include that information in your question if you have particular technology requirements. We're not mind readers.
Nick
2010-03-10 14:24:02
A:
Consider using TestPlan. It has a display-less browser mode for fast scraping. The scripting language is very simple, and the basics are quick to learn.
edA-qa mort-ora-y
2010-03-10 13:53:42
A:
TagSoup, a SAX-compliant parser written in Java, parses HTML as it is found in the wild: poor, nasty and brutish, though quite often far from short.
Details here: http://mercury.ccil.org/~cowan/XML/tagsoup/
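Since TagSoup exposes the standard SAX interface, a scraper built on it is mostly an ordinary SAX ContentHandler. Here is a minimal sketch of a link extractor along those lines. The class name LinkCollector and the method collectLinks are made up for illustration, and the code only depends on the JDK's SAX interfaces; it assumes you pass in whichever XMLReader you want to use.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper (not part of TagSoup): collects the href values
// of <a> elements as the SAX parser reports them.
class LinkCollector extends DefaultHandler {
    private final List<String> links = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Namespace-aware readers fill in localName; others only fill in qName.
        String name = (localName == null || localName.isEmpty()) ? qName : localName;
        if ("a".equalsIgnoreCase(name)) {
            String href = atts.getValue("href");
            if (href != null) {
                links.add(href);
            }
        }
    }

    // Parses the given markup with the supplied reader and returns the links
    // in document order.
    static List<String> collectLinks(XMLReader reader, String html)
            throws IOException, SAXException {
        LinkCollector handler = new LinkCollector();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(html)));
        return handler.links;
    }
}
```

With the TagSoup jar on the classpath, the reader would be `new org.ccil.cowan.tagsoup.Parser()`, which implements XMLReader and copes with malformed markup; the JDK's own SAX reader works for a quick check but only on well-formed input. For the multithreaded part, it is probably safest to give each worker thread its own reader and handler rather than sharing one, since SAX readers are generally not meant to be used concurrently.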
amit-agrawal
2010-03-10 14:00:30