
I'm looking for a way to search a file system that contains approximately 1TB of documents in either Office or PDF format. Is Lucene.Net pretty much the best way to accomplish this? I've also heard of dtSearch, and was wondering if anyone had used that tool with any success? Are there any other tools out there that would do the job?

I'm looking for tools that use .Net and will work on Windows boxes.

If Lucene.Net is the best way to go - does anyone have any good tutorials that would help get me started? I've googled and most of the results that come back either do not seem like best practices or don't directly address my current situation.

If this question has already been asked, I apologize; if someone could point me to a similar post, that would be great.

+4  A: 

Look into Search Server Express (SSE). It's the free version of the search engine built into SharePoint.

Lucene/Solr is an option, but your problem isn't the search engine per se: you need a system that can read and parse the PDFs. Lucene by itself is just an engine, whereas Solr adds components that help you parse content.
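To make the "engine vs. parser" split concrete, here is a toy sketch in Python (not Lucene's actual API) of the kind of inverted index an engine like Lucene maintains. It assumes the text has already been extracted from each PDF or Office file, which is exactly the part Lucene alone doesn't do. `tokenize` and `InvertedIndex` are hypothetical names for illustration only.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Crude "analyzer": lowercase, then split on anything that isn't a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    def __init__(self):
        # term -> set of document ids containing that term
        self.postings = defaultdict(set)

    def add(self, doc_id, extracted_text):
        # The engine only ever sees plain text; getting that text out of
        # a PDF or .docx must happen before this call.
        for term in tokenize(extracted_text):
            self.postings[term].add(doc_id)

    def search(self, query):
        # AND semantics: return ids of documents containing every query term.
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


idx = InvertedIndex()
idx.add("report.pdf", "Quarterly revenue report")
idx.add("memo.docx", "Revenue memo for Q3")
print(sorted(idx.search("revenue")))   # ['memo.docx', 'report.pdf']
print(idx.search("quarterly revenue"))  # {'report.pdf'}
```

Real engines add ranking, stemming, and on-disk segment files on top of this, but the core lookup is the same: query terms map to postings, not to a scan of the documents themselves.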

Using Search Server should get you up and running fairly quickly, and the Search API is well documented and easy to use.

Mikael Svenson
This looks pretty cool - I was not aware that MS released a product like this. Have you used it much? With good success? I assume it will index Office and PDF documents out of the box? I checked the website but it's not very informative.
Eric
SSE would have been my suggestion too. If you need to scale up or out in the future, you can migrate to Search Server 2008 without much pain. Another benefit is that your search engine is aware of ACLs and of what each user is allowed to find.
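A rough Python sketch of what ACL-aware results (often called "security trimming") mean in practice. The ACL model here is deliberately simplified and hypothetical: each document carries a set of groups allowed to read it, and real NTFS ACLs (deny entries, inheritance) are ignored. This is not how SSE implements it, only the idea: hits the querying user can't read are dropped before results are shown.

```python
from dataclasses import dataclass


@dataclass
class Document:
    path: str
    # Hypothetical, simplified ACL: groups granted read access.
    allowed_groups: frozenset


def trim_results(hits, user_groups):
    # Keep a hit only if the user belongs to at least one allowed group.
    return [doc for doc in hits if doc.allowed_groups & user_groups]


hits = [
    Document("hr/salaries.xlsx", frozenset({"HR"})),
    Document("public/handbook.pdf", frozenset({"Everyone"})),
]
print([d.path for d in trim_results(hits, {"Everyone", "Engineering"})])
# ['public/handbook.pdf']
```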
Filburt
Alrighty, I'm going to give SSE a try! Thanks guys!
Eric
You need to install a PDF IFilter in order to index PDFs (check out http://forums.foxitsoftware.com/showthread.php?t=10588). I've used SSE a few times, but mostly I've used the SharePoint version (which is more or less the same). SSE is the replacement for the old Index Server, which is no longer supported by MS (AFAIK).
Mikael Svenson
+1  A: 

I've used Everything and I like it quite a bit. It's an app, but it also has an SDK for C/C#/Clarion that includes its search API.

One caveat: it doesn't index file contents, just file names. That makes building the index and querying it super fast.
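A minimal Python sketch of a names-only index, to show why this approach is fast and why it can't answer content queries: nothing inside the files is ever opened. `build_name_index` and `search_names` are hypothetical helpers, not part of the Everything SDK; Everything itself is much faster than a directory walk like this because it reads the NTFS file table directly.

```python
import os


def build_name_index(root):
    # Record (lowercased file name, full path) for every file under root.
    # Only directory entries are read; file contents are never opened.
    index = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            index.append((name.lower(), os.path.join(dirpath, name)))
    return index


def search_names(index, query):
    # Case-insensitive substring match against file names only.
    q = query.lower()
    return [path for name, path in index if q in name]
```

For example, `search_names(build_name_index(r"C:\docs"), "budget")` would find `Budget 2010.xlsx` but not a memo whose text mentions the budget, which is the limitation Eric points out below.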

main page

SDK

Francisco Noriega
In the right situation this looks like it could be very nice! Unfortunately I think that the inability to index file contents may be a deal breaker for my current purpose :(
Eric