I am looking for recommendations for a screen scraper. I need to extract "Contact Us" information from certain web sites.
Any ideas where I can get a good (preferably free) screen scraper?
Write your own; it isn't hard. If you aren't familiar with programming, or you have a choice of programming language, use Python: its library support for scraping is great.
As for how to attack the problem, there are two popular techniques. Regular expressions work best for ad-hoc screen scraping. If your target web sites are well structured (read: not ad-hoc), use a framework that lets you work with the DOM.
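To make the two techniques concrete, here is a minimal sketch using only the Python standard library. The sample page, the e-mail pattern, and the names `emails_by_regex`, `MailtoCollector` and `emails_by_parser` are all made up for illustration. Note that `html.parser` is an event-based parser rather than a full DOM, so treat it as a stand-in for whichever DOM framework you end up choosing.

```python
import re
from html.parser import HTMLParser

# Hypothetical "Contact Us" page, inlined so the sketch runs offline.
SAMPLE_HTML = """
<html><body>
  <div id="contact">
    <p>Email: <a href="mailto:sales@example.com">sales@example.com</a></p>
    <p>Phone: 555-0100</p>
  </div>
</body></html>
"""

# Deliberately simple e-mail pattern; real addresses can be messier.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def emails_by_regex(html):
    """Ad-hoc technique: pattern-match the raw markup, ignoring structure."""
    return sorted(set(EMAIL_RE.findall(html)))


class MailtoCollector(HTMLParser):
    """Structured technique: walk the tags and read their attributes."""

    def __init__(self):
        super().__init__()
        self.addresses = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if href.startswith("mailto:"):
                self.addresses.append(href[len("mailto:"):])


def emails_by_parser(html):
    collector = MailtoCollector()
    collector.feed(html)
    return collector.addresses


if __name__ == "__main__":
    print(emails_by_regex(SAMPLE_HTML))
    print(emails_by_parser(SAMPLE_HTML))
```

The regex version finds addresses anywhere in the text but knows nothing about where they sit on the page. The parser version only sees addresses that are marked up as `mailto:` links, which is more precise on well-structured sites and blind on sloppy ones.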
Navigation and Extraction
These are the two phases of writing a spider. Your spider needs to navigate a web site to visit different pages, and it needs to extract the information of interest. Both phases can be driven by either the DOM or regular expressions.
P.S. Since your name indicates .NET, I should mention that I have written scrapers in C#; it's a doddle.
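A rough sketch of the two phases working together, again with only the standard library. Everything named here (`crawl`, `LinkCollector`, `fake_fetch`, the `example.test` pages) is hypothetical. The `fetch` argument is assumed to be any callable that maps a URL to HTML text; for real sites you might wrap `urllib.request.urlopen`, while the in-memory dictionary below keeps the example runnable offline. In this sketch navigation is driven by parsed tags and extraction by a regular expression, but either phase could use either approach.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Deliberately simple e-mail pattern, for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class LinkCollector(HTMLParser):
    """Navigation helper: collect the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl(start_url, fetch, max_pages=10):
    """Visit pages under start_url and return the e-mail addresses found."""
    seen, queue, found = set(), [start_url], set()
    while queue and len(seen) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        html = fetch(url)

        # Extraction phase: pull the data of interest out of this page.
        found.update(EMAIL_RE.findall(html))

        # Navigation phase: queue same-site links that are still unvisited.
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            target = urljoin(url, href)
            if target.startswith(start_url) and target not in seen:
                queue.append(target)
    return sorted(found)


# Hypothetical two-page site held in memory instead of on the network.
FAKE_SITE = {
    "http://example.test/": '<a href="/contact">Contact Us</a>',
    "http://example.test/contact":
        '<p>Write to info@example.test</p><a href="/">Home</a>',
}


def fake_fetch(url):
    return FAKE_SITE.get(url, "")


if __name__ == "__main__":
    print(crawl("http://example.test/", fake_fetch))
```

The `max_pages` cap and the `seen` set are there so the spider stops on sites whose pages link to each other in a loop.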
Automation Anywhere does screen scraping nicely. I think it can also extract data from the web automatically, following a set pattern. I found a demo showing how to screen scrape with it. Check it out!
Enjoy!