I have a text file containing 21,000 strings (one per line) and 500 MB of other text files (mainly source code). For each string I need to determine whether it is contained in any of those files. I wrote a program that does the job, but its performance is terrible: it would take a couple of days, and I need the job done in 5-6 hours at most.
I'm using C# and Visual Studio 2010.
I have a couple of questions regarding my problem:
a) Which approach is better?
foreach (string s in StringsToSearch)
{
    // scan all files and break as soon as the string is found
}
or
foreach (string f in Files)
{
    // search this file for each string that has not already been found
}
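To make (a) concrete, here is a rough sketch of what I mean by the second approach, keeping a set of strings that have not been found yet (SearchSketch, FindStrings and the file paths are just placeholder names, and reading a whole file at once is only one option, see (b)):

using System.Collections.Generic;
using System.IO;

class SearchSketch
{
    static void Main()
    {
        // Placeholder paths; the real ones come from my setup.
        var stringsToSearch = File.ReadAllLines("strings.txt");
        var files = Directory.GetFiles("sources", "*.*", SearchOption.AllDirectories);

        var found = FindStrings(stringsToSearch, files);
    }

    // Returns the subset of stringsToSearch that occurs in at least one of the files.
    static HashSet<string> FindStrings(IEnumerable<string> stringsToSearch, IEnumerable<string> files)
    {
        var notFound = new HashSet<string>(stringsToSearch);
        var found = new HashSet<string>();

        foreach (string f in files)
        {
            string content = File.ReadAllText(f);   // read each file exactly once

            // Only check strings that are still missing.
            notFound.RemoveWhere(s =>
            {
                if (content.Contains(s))
                {
                    found.Add(s);
                    return true;
                }
                return false;
            });

            if (notFound.Count == 0)
                break;   // everything has been found, no need to open the remaining files
        }

        return found;
    }
}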
b) Is it better to scan each file line by line
using (StreamReader r = new StreamReader(file))
{
    while (!r.EndOfStream)
    {
        string s = r.ReadLine();
        // ... if (s.Contains(xxx)) ...
    }
}
or
using (StreamReader r = new StreamReader(file))
{
    string s = r.ReadToEnd();
    // if (s.Contains(xxx)) ...
}
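For comparison, this is roughly how the line-by-line variant would look combined with the same "not yet found" bookkeeping (ScanFile is a placeholder name; it assumes none of my strings span a line break):

using System.Collections.Generic;
using System.IO;

class LineByLineSketch
{
    // Scans a single file line by line, moving any still-missing strings it contains
    // from notFound to found. Assumes a search string never spans two lines.
    static void ScanFile(string file, HashSet<string> notFound, HashSet<string> found)
    {
        using (StreamReader r = new StreamReader(file))
        {
            while (!r.EndOfStream && notFound.Count > 0)
            {
                string line = r.ReadLine();

                notFound.RemoveWhere(s =>
                {
                    if (line.Contains(s))
                    {
                        found.Add(s);
                        return true;
                    }
                    return false;
                });
            }
        }
    }
}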
c) Would threading improve performance, and if so, how should I go about it?
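The kind of threading I have in mind is something like the sketch below, since Parallel.ForEach is available in .NET 4.0 / VS 2010 (FindStringsParallel is a placeholder name, and ConcurrentDictionary is just used as a thread-safe set):

using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

class ParallelSketch
{
    // Searches the files on multiple threads; the set of found strings must be thread-safe.
    static ICollection<string> FindStringsParallel(IList<string> stringsToSearch, IEnumerable<string> files)
    {
        var found = new ConcurrentDictionary<string, bool>();

        Parallel.ForEach(files, file =>
        {
            string content = File.ReadAllText(file);

            foreach (string s in stringsToSearch)
            {
                // Skip strings that another thread has already found.
                if (!found.ContainsKey(s) && content.Contains(s))
                    found.TryAdd(s, true);
            }
        });

        return found.Keys;
    }
}

I'm not sure whether this would help at all when most of the time is spent reading 500 MB from disk rather than searching in memory, which is part of what I'm asking.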
d) Is there any existing software that can do this, so I don't have to write my own code?