Hi,

For my new project I need to implement a .NET-based web crawler. I searched for an open source option and found an entry here on SO that mentioned Arachnode.net as an open source solution. I visited arachnode.net and, to my surprise, the project is fully commercial and there is not even a free community edition (if it really is an open source project).

Am I missing something about Arachnode.net? If not, is there a good open source alternative?

Thanks in advance :-)

+1  A: 

I don't know of any open source projects for .NET, but if you're just looking to do something fairly basic, you can easily use HttpWebRequest and HttpWebResponse to write your own web crawler. I've written many integration solutions that log in to remote websites to process or scrape data using these classes. If you decide to go this route, just an FYI: I almost always need a CookieContainer to hold my session information and a few well-tested regular expressions to do my scraping (e.g. to find all hrefs, Regex("<a href.*?>")).
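To make that concrete, here is a minimal sketch of the approach, not a finished crawler. The class name MiniCrawler, the start URL, the user agent string, and the exact href pattern are all placeholders I made up for illustration; a real crawler would also need a visited set, politeness delays, robots.txt handling, and error handling. Note that on recent .NET versions WebRequest is marked obsolete in favor of HttpClient, so treat this as a sketch of the classic API.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text.RegularExpressions;

static class MiniCrawler // hypothetical name, for illustration only
{
    // Illustrative pattern (an assumption, not a robust HTML parser):
    // captures the value of href="..." or href='...' inside an <a> tag.
    static readonly Regex HrefPattern = new Regex(
        @"<a\b[^>]*?\bhref\s*=\s*[""']([^""']+)[""']",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // One container shared by every request, so session cookies
    // (e.g. from a login) are sent on later requests.
    static readonly CookieContainer Cookies = new CookieContainer();

    // Downloads a page and returns its body as a string.
    public static string Fetch(Uri url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.CookieContainer = Cookies;
        request.UserAgent = "MiniCrawler/0.1 (example)"; // placeholder value
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }

    // Pulls href values out of the HTML and resolves them against the
    // page URL, so relative links become absolute ones.
    public static IEnumerable<Uri> ExtractLinks(string html, Uri baseUri)
    {
        foreach (Match m in HrefPattern.Matches(html))
        {
            Uri absolute;
            if (Uri.TryCreate(baseUri, m.Groups[1].Value, out absolute))
                yield return absolute;
        }
    }

    static void Main()
    {
        var start = new Uri("http://example.com/"); // placeholder start page
        foreach (var link in ExtractLinks(Fetch(start), start))
            Console.WriteLine(link);
    }
}
```

From here, crawling is mostly a matter of pushing the extracted links onto a queue and calling Fetch on each one you haven't visited yet.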

Good Luck!

Zachary