I am searching for several thousand strings in a large directory tree that contains several thousand files. Each string can appear in many different files. What is the most performant way to perform this search in C#? I tried launching findstr through Process.Start, but it is painfully slow because it opens every single file several thousand times. Any suggestions?

+2  A: 

I suggest creating a widget that indexes your file tree using Lucene.NET. Once the documents are indexed, you can use all of Lucene's power to search through the content... without having to open each file thousands of times! :P

I'm not sure how long-lived your program is, so this may not be a good idea for a one-time-use scenario. For a multi-use scenario you will need something like a Windows service that updates your index as the files change over time (if that is important).

This will be very performant once the indexes are created!

Andrew Siemer
A: 

Do you need to perform a one-time search, or search continually on demand? I would suggest either tying into the Indexing Service or implementing your own Lucene index. There are quite a few open-source implementations of Lucene indexing, where basically you scan your files once and build a comprehensive index of the contents, and then future searches are made against the premade index. Index generation takes a while, but the searches are very fast. This works well for 'web' type content and simple phrases and words.
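To make the "scan once, search many times" idea concrete, here is a minimal in-memory sketch. It is not Lucene and not the Indexing Service; `SimpleIndex`, `Build`, and `Lookup` are hypothetical names made up for illustration, and it assumes the search terms are whole words and that the index fits in memory.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not a Lucene.NET API.
public static class SimpleIndex
{
    // Builds a map from each lower-cased word to the set of files containing it.
    public static Dictionary<string, HashSet<string>> Build(string rootDirectory)
    {
        var index = new Dictionary<string, HashSet<string>>(StringComparer.Ordinal);

        foreach (string path in Directory.EnumerateFiles(rootDirectory, "*", SearchOption.AllDirectories))
        {
            // Each file is opened and read exactly once, line by line.
            foreach (string line in File.ReadLines(path))
            {
                foreach (Match match in Regex.Matches(line, @"\w+"))
                {
                    string word = match.Value.ToLowerInvariant();
                    if (!index.TryGetValue(word, out HashSet<string> files))
                    {
                        files = new HashSet<string>();
                        index[word] = files;
                    }
                    files.Add(path);
                }
            }
        }

        return index;
    }

    // Returns the files containing the given word (case-insensitive), or an empty sequence.
    public static IEnumerable<string> Lookup(Dictionary<string, HashSet<string>> index, string word)
    {
        return index.TryGetValue(word.ToLowerInvariant(), out HashSet<string> files)
            ? files
            : Enumerable.Empty<string>();
    }
}
```

After one call to `SimpleIndex.Build(root)`, each of the several thousand lookups is a dictionary hit rather than another pass over the directory tree. A real index such as Lucene also persists to disk and handles phrases, which this sketch does not.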

If you're trying to find non-word/arbitrary random strings, then you've got a different task.
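For that arbitrary-string case, one possible starting point (an assumption on my part, not something the Indexing Service or Lucene gives you) is to at least invert the loop: read each file once and test every search string against its contents, instead of running one external search per string. `SinglePassSearch` and `Find` are hypothetical names, and the sketch assumes each file fits in memory and that an ordinal, case-sensitive match is wanted.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper for illustration only.
public static class SinglePassSearch
{
    // Returns, for each search string, the list of files in which it occurs.
    public static Dictionary<string, List<string>> Find(string rootDirectory, IEnumerable<string> searchStrings)
    {
        // De-duplicate the search strings so each is tested once per file.
        var terms = new HashSet<string>(searchStrings);
        var hits = new Dictionary<string, List<string>>();
        foreach (string term in terms)
        {
            hits[term] = new List<string>();
        }

        foreach (string path in Directory.EnumerateFiles(rootDirectory, "*", SearchOption.AllDirectories))
        {
            // One read per file, no matter how many strings are searched for.
            string content = File.ReadAllText(path);

            foreach (string term in terms)
            {
                if (content.IndexOf(term, StringComparison.Ordinal) >= 0)
                {
                    hits[term].Add(path);
                }
            }
        }

        return hits;
    }
}
```

This still does one substring scan per string per file, but entirely in memory. With several thousand strings, a multi-pattern matcher such as Aho-Corasick could replace the inner loop so that each file is scanned only once.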

-Jeff

Jeff