Searcharoo.NET contains a spider that crawls and indexes content, and a search engine that queries the index. You should be able to find your way around the Searcharoo.Indexer.EXE code to trap the content as it's downloaded and add your own custom code from there...
It's very basic (all the source code is included and is explained in six CodeProject articles, the most recent of which is Searcharoo v6): the spider follows links, imagemaps, and images; obeys ROBOTS directives; and parses some non-HTML file types. It is intended for single websites, not the entire web.
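The crawl loop described above (follow links on a single site, obey ROBOTS directives, hand each page to indexing code) can be sketched roughly as follows. This is a generic illustration, not Searcharoo's actual C# code; every name here is hypothetical, and the `fetch` callback stands in for real HTTP access so the sketch stays self-contained:

```python
from urllib.robotparser import RobotFileParser
from urllib.parse import urljoin, urlparse
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects targets from <a>/<area> hrefs and <img> srcs."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("a", "area") and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "img" and attrs.get("src"):
            self.links.append(attrs["src"])

def crawl(start_url, fetch, robots_txt="", max_pages=100):
    """Breadth-first crawl of a single site.

    `fetch(url) -> html` is injected so this sketch needs no network I/O;
    `robots_txt` is the site's robots.txt content (empty = allow all).
    """
    site = urlparse(start_url).netloc
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    seen, queue, pages = set(), [start_url], {}
    while queue and len(pages) < max_pages:
        url = queue.pop(0)
        if url in seen or urlparse(url).netloc != site:
            continue  # single-site crawl: skip duplicates and external links
        if not robots.can_fetch("*", url):
            continue  # obey ROBOTS directives
        seen.add(url)
        html = fetch(url)
        pages[url] = html  # <-- this is where indexing/custom code would hook in
        extractor = LinkExtractor()
        extractor.feed(html)
        queue.extend(urljoin(url, link) for link in extractor.links)
    return pages
```

In Searcharoo the equivalent hook point is in the Searcharoo.Indexer.EXE download path mentioned above; the sketch just shows the shape of the loop.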
Nutch/Lucene is almost certainly a more robust, commercial-grade solution, but I have not looked at their code. I'm not sure what you're trying to accomplish, but have you also seen Microsoft Search Server Express?
Disclaimer: I am the author of Searcharoo; just offering it here as an option.