I am searching for several thousand strings in a large directory tree that contains several thousand files. Each string can appear in many different files. What is the most performant way to perform this search in C#? I tried launching findstr via ProcessStartInfo/Process.Start, but it is painfully slow because it opens every single file several thousand times (once per search string). Any suggestions?
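The core problem described above is that each file is re-read once per search string. A minimal sketch of the obvious first fix: read each file exactly once and test all strings against the in-memory text. It assumes the files are text that fits in memory and that ordinal (case-sensitive) matching is acceptable; `SinglePassSearch` and `Search` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class SinglePassSearch
{
    // Opens each file under `root` exactly once and tests every search string
    // against its contents in memory. Matching is ordinal (case-sensitive).
    // Returns: search string -> files containing it.
    public static Dictionary<string, List<string>> Search(string root, IEnumerable<string> needles)
    {
        var hits = new Dictionary<string, List<string>>(StringComparer.Ordinal);
        foreach (var needle in needles)
            if (!hits.ContainsKey(needle))
                hits[needle] = new List<string>();

        foreach (var file in Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories))
        {
            string text;
            try
            {
                text = File.ReadAllText(file); // the only read of this file
            }
            catch (IOException) { continue; }                 // locked or deleted meanwhile
            catch (UnauthorizedAccessException) { continue; } // no read permission

            // Still one comparison per search string, but against memory, not disk.
            foreach (var entry in hits)
                if (entry.Key.Length > 0 && text.IndexOf(entry.Key, StringComparison.Ordinal) >= 0)
                    entry.Value.Add(file);
        }
        return hits;
    }
}
```

This removes the repeated disk I/O, but every file is still compared against every string, so with thousands of strings the inner loop dominates. The answers below address that with indexing.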
I suggest building a small tool that indexes your file tree using Lucene.NET. Once the documents are indexed, you can use Lucene's query features to search the content without having to open each file thousands of times! :P
I'm not sure how long this program will be in use; indexing may not be worth it for a one-time search. For repeated use, you will need something like a Windows service that updates the index as the files change over time (if that matters to you).
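A minimal sketch of keeping an index current, assuming .NET's `System.IO.FileSystemWatcher`; it could run inside the kind of Windows service mentioned above. `IndexUpdater`, `reindex` and `remove` are hypothetical: the callbacks stand in for whatever index you maintain and are not part of Lucene.NET or any other library.

```csharp
using System;
using System.IO;

public static class IndexUpdater
{
    // Watches `root` recursively and forwards file events to caller-supplied
    // callbacks (hypothetical hooks into your own index-update code).
    public static FileSystemWatcher Watch(string root, Action<string> reindex, Action<string> remove)
    {
        var watcher = new FileSystemWatcher(root)
        {
            IncludeSubdirectories = true,
            NotifyFilter = NotifyFilters.FileName | NotifyFilters.LastWrite | NotifyFilters.Size
        };

        watcher.Created += (_, e) => reindex(e.FullPath);
        watcher.Changed += (_, e) => reindex(e.FullPath);
        watcher.Deleted += (_, e) => remove(e.FullPath);
        watcher.Renamed += (_, e) =>
        {
            remove(e.OldFullPath); // drop the entry under the old name
            reindex(e.FullPath);   // index the file under its new name
        };
        // Raised e.g. when the internal buffer overflows and events are lost;
        // a periodic full rescan is a sensible fallback.
        watcher.Error += (_, e) => Console.Error.WriteLine(e.GetException().Message);

        watcher.EnableRaisingEvents = true; // start delivering events
        return watcher;
    }
}
```

`Changed` can fire several times for a single save, so in practice you may want to debounce updates per path before touching the index.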
This will be very performant once the indexes are created!
Do you need to search once, or repeatedly on demand? I would suggest either tying into the Windows Indexing Service or implementing your own Lucene index. There are quite a few open-source implementations of Lucene-style indexing: you scan your files once, build a comprehensive index of their contents, and then run future searches against that prebuilt index. Generating the index takes a while, but searches against it are very fast. This works well for 'web'-type content and simple phrases and words.
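To illustrate the build-once, query-many idea, here is a toy inverted index (term to set of files). It is a hypothetical class, not Lucene's API, and it omits most of what Lucene adds (term positions for phrase queries, relevance scoring, on-disk segments). Splitting text into words with the regex `\w+` and comparing them case-insensitively are assumptions made for the sketch.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

// Toy inverted index: term -> documents containing it. Illustration only.
public sealed class ToyInvertedIndex
{
    private static readonly Regex Word = new Regex(@"\w+", RegexOptions.Compiled);
    private readonly Dictionary<string, HashSet<string>> _postings =
        new Dictionary<string, HashSet<string>>(StringComparer.OrdinalIgnoreCase);

    // Slow part, done once per document: record which terms it contains.
    public void Add(string docId, string text)
    {
        foreach (Match m in Word.Matches(text))
        {
            if (!_postings.TryGetValue(m.Value, out var docs))
                _postings[m.Value] = docs = new HashSet<string>();
            docs.Add(docId);
        }
    }

    // Reads each file in the tree once and indexes it under its path.
    public void AddDirectory(string root)
    {
        foreach (var file in Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories))
            Add(file, File.ReadAllText(file));
    }

    // Fast part, done many times: dictionary lookups and set intersections,
    // no file I/O. Returns documents containing every word of the query.
    public IReadOnlyCollection<string> Search(string query)
    {
        HashSet<string> result = null;
        foreach (Match m in Word.Matches(query))
        {
            if (!_postings.TryGetValue(m.Value, out var docs))
                return Array.Empty<string>(); // an unknown term means no document has all terms
            if (result == null) result = new HashSet<string>(docs);
            else result.IntersectWith(docs);
        }
        return (IReadOnlyCollection<string>)result ?? Array.Empty<string>();
    }
}
```

After `AddDirectory` has run, each query costs a few hash lookups regardless of how many files were indexed, which is why searches against a prebuilt index are fast.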
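If you're trying to find non-word, arbitrary strings (for example, fragments in the middle of a word, or sequences containing punctuation), you've got a different task: a word-based index is built around the whole terms produced by its tokenizer, so it is not a good fit for that.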
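For arbitrary strings, one swapped-in technique (not mentioned in this answer) is Aho-Corasick multi-pattern matching: build a single automaton from all search strings, then scan each file once and report every string that occurs in it. A minimal sketch, assuming case-sensitive matching on UTF-16 chars; `AhoCorasick` and `FindAll` are hypothetical names.

```csharp
using System.Collections.Generic;

// Aho-Corasick: one automaton for all search strings, then a single
// left-to-right pass over each text. Case-sensitive, per UTF-16 char.
public sealed class AhoCorasick
{
    private readonly List<Dictionary<char, int>> _next = new List<Dictionary<char, int>>(); // trie edges
    private readonly List<int> _fail = new List<int>();            // failure links
    private readonly List<List<int>> _out = new List<List<int>>(); // pattern ids recognised at each node
    private readonly string[] _patterns;

    public AhoCorasick(IEnumerable<string> patterns)
    {
        _patterns = new List<string>(patterns).ToArray();
        NewNode(); // node 0 is the root
        for (int id = 0; id < _patterns.Length; id++)
        {
            if (_patterns[id].Length == 0) continue; // empty strings are ignored
            int node = 0;
            foreach (char c in _patterns[id])
            {
                if (!_next[node].TryGetValue(c, out int child))
                {
                    child = NewNode();
                    _next[node][c] = child;
                }
                node = child;
            }
            _out[node].Add(id);
        }
        BuildFailureLinks();
    }

    private int NewNode()
    {
        _next.Add(new Dictionary<char, int>());
        _fail.Add(0);
        _out.Add(new List<int>());
        return _next.Count - 1;
    }

    // Breadth-first: each node's failure link points to the longest proper
    // suffix of its path that is also a prefix of some pattern.
    private void BuildFailureLinks()
    {
        var queue = new Queue<int>();
        foreach (int child in _next[0].Values) queue.Enqueue(child); // depth-1 nodes fail to the root
        while (queue.Count > 0)
        {
            int node = queue.Dequeue();
            foreach (var kv in _next[node])
            {
                char c = kv.Key;
                int child = kv.Value;
                int f = _fail[node];
                while (f != 0 && !_next[f].ContainsKey(c)) f = _fail[f];
                _fail[child] = _next[f].TryGetValue(c, out int target) ? target : 0;
                _out[child].AddRange(_out[_fail[child]]); // also report patterns ending at the suffix
                queue.Enqueue(child);
            }
        }
    }

    // Returns the set of patterns that occur anywhere in `text`.
    public HashSet<string> FindAll(string text)
    {
        var found = new HashSet<string>();
        int node = 0;
        foreach (char c in text)
        {
            while (node != 0 && !_next[node].ContainsKey(c)) node = _fail[node];
            if (_next[node].TryGetValue(c, out int nxt)) node = nxt;
            foreach (int id in _out[node]) found.Add(_patterns[id]);
        }
        return found;
    }
}
```

For example, `new AhoCorasick(new[] { "he", "she", "his", "hers" }).FindAll("ushers")` reports `he`, `she` and `hers`. Applied to the original problem, you would build the automaton once from all the search strings, read each file once, and call `FindAll` on its text; after the one-time build, the work per file grows with the file's length and the number of matches rather than with the number of search strings.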
-Jeff