Hello everyone,

I am using VSTS 2008 + C# + .NET 3.5. I want to find an open source tool that crawls all the web pages of a website but skips any pages on other domains that the site links to (I only need pages from this specific domain). I want to store the crawled pages in a local file directory.

Are there any samples or ready-to-use open source tools?

Thanks in advance, George
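To make the requirement concrete, here is a minimal sketch of what such a tool does, written in Python with only the standard library (it is not C#, and it is not how Arachnode.net works). It runs a breadth-first crawl that follows `<a href>` links, keeps only URLs whose host matches the start URL, and writes each page under a local directory. The function names, the injectable `fetch` parameter, the UTF-8 decoding, and the URL-to-file mapping are all illustrative assumptions.

```python
import os
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href values of <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def default_fetch(url):
    """Download a page; assumes UTF-8 (real pages may declare other charsets)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def local_path(url, out_dir):
    """Hypothetical URL-to-file mapping; query strings are ignored."""
    raw = urlparse(url).path
    parts = [p for p in raw.split("/") if p not in ("", ".", "..")]
    if not parts or raw.endswith("/"):
        parts.append("index.html")  # "/" -> index.html, "/docs/" -> docs/index.html
    elif "." not in parts[-1]:
        parts[-1] += ".html"  # "/about" -> about.html, so it cannot clash with an about/ directory
    return os.path.join(out_dir, *parts)


def crawl(start_url, out_dir, fetch=default_fetch, max_pages=100):
    """Breadth-first crawl restricted to start_url's host; returns the saved URLs."""
    host = urlparse(start_url).netloc.lower()
    queue = deque([urldefrag(start_url)[0]])
    seen = set(queue)
    saved = []
    while queue and len(saved) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except OSError:
            continue  # skip pages that fail to download (URLError is an OSError)
        path = local_path(url, out_dir)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w", encoding="utf-8") as f:
            f.write(html)
        saved.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            link = urldefrag(urljoin(url, href))[0]  # resolve relative links, drop #fragments
            parsed = urlparse(link)
            # Skip other domains and non-HTTP schemes such as mailto:
            if parsed.scheme in ("http", "https") and parsed.netloc.lower() == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return saved
```

For example, `crawl("http://example.com/", "site_copy")` would save `http://example.com/about` as `site_copy/about.html` and never request pages on other hosts. This sketch does not consult robots.txt, adds no delay between requests, and treats URLs that differ only in their query string as the same file; a real crawler should handle all three. The same structure carries over to C#, where .NET 3.5 provides `System.Net.WebClient` for downloading and `System.Uri` for resolving links and comparing hosts.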

+1  A: 

Arachnode.net might be what you are looking for.

Steve Haigh
Good stuff. Is there a web-based interface so that we can run queries against the full-text analysis results?
George2
Hi Steve, how good is Arachnode.net for languages other than en-US? Do you have any experience indexing or searching non-en-US languages, such as French or Japanese? Is a plug-in needed for such languages? (I think keyword extraction, indexing and parsing may differ between languages.)
George2
Thanks for all of your help, Steve! I have marked your reply as answered.
George2
I'm afraid I have not used it (yet); I was just reading about it when I saw your question :)
Steve Haigh
@Steve Haigh: Note that your link is broken; apparently Wikipedia doesn't consider an article on Arachnode.net to be "notable", lol. I guess SO doesn't count.
Anonymous Type
Thanks, fixed now :)
Steve Haigh
+1  A: 

Hey:

I am the author of AN.

AN indexes all languages by default. There is nothing to configure.

  • Mike
arachnode.net
http://arachnode.net/
Anonymous Type