What's the fastest way to find strings in text files? Case scenario: looking for a particular path in a text file with around 50000 file paths listed (each path has its own line).
A file of that size should easily fit in memory, so you can load it into a std::set (or, even better, a hash set, if you have a library for one at hand) with the paths as its items. Checking whether an exact path is there will then be very fast: logarithmic in the number of paths for a std::set, and constant time on average for a hash set.
If you need to look for sub-paths as well, a sorted std::vector may be the only useful approach, provided you're looking for prefixes only. If you're looking for completely general substrings of paths, then you'll need to scan through the whole vector anyway, but unless you have to do it a zillion times even that wouldn't be too bad.
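Both cases might look roughly like this. It is only a sketch: with_prefix and containing are names invented here, and with_prefix assumes the vector was sorted with the default operator< (for example by std::sort) before the call.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: every path in `sorted_paths` that starts with `prefix`.
// Precondition (assumed, not checked): `sorted_paths` is sorted with operator<.
std::vector<std::string> with_prefix(const std::vector<std::string>& sorted_paths,
                                     const std::string& prefix) {
    std::vector<std::string> hits;
    // Binary search for the first element that is not less than the prefix.
    auto it = std::lower_bound(sorted_paths.begin(), sorted_paths.end(), prefix);
    // In sorted order, all strings sharing the prefix sit next to each other.
    for (; it != sorted_paths.end() &&
           it->compare(0, prefix.size(), prefix) == 0; ++it) {
        hits.push_back(*it);
    }
    return hits;
}

// Hypothetical helper: general substring search, a plain linear scan.
std::vector<std::string> containing(const std::vector<std::string>& paths,
                                    const std::string& needle) {
    std::vector<std::string> hits;
    for (const auto& p : paths) {
        if (p.find(needle) != std::string::npos) hits.push_back(p);
    }
    return hits;
}
```

The prefix lookup costs a binary search plus the number of matches, while the substring scan touches every path on each query.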
This is the very field for regular expressions; you should look into grep and awk.
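If you would rather stay inside a C++ program than shell out to grep or awk, a grep-like line filter can be sketched with std::regex (C++11). This is a substitute for those tools, not how they work internally, and matching_lines is a name made up for the example. The pattern uses the default ECMAScript syntax.

```cpp
#include <istream>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: return the lines of `in` in which `pattern` matches somewhere.
std::vector<std::string> matching_lines(std::istream& in,
                                        const std::string& pattern) {
    const std::regex re(pattern);  // default grammar: ECMAScript
    std::vector<std::string> hits;
    std::string line;
    while (std::getline(in, line)) {
        // regex_search succeeds on a partial match, like grep does.
        if (std::regex_search(line, re)) hits.push_back(line);
    }
    return hits;
}
```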
Do you have to find one string once in the file, the same string repeatedly in several files, or several strings in the same file?
Depending on the scenario, you have several possible answers.
Building a data structure (like the set proposed by Alex) is useful if you have to find several strings in the same file.
Using an algorithm like Boyer-Moore is efficient if you have to search for one string.
Using a regular expression engine will probably be preferable if you have to search for several strings.
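For the single-string case, one way to try Boyer-Moore without writing it yourself is the searcher in the standard library, assuming a C++17 compiler. The sketch below treats the whole file as one std::string already in memory. find_bm is a name made up here, and returning 0 for an empty needle is my own choice to mirror std::string::find.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical helper: offset of the first occurrence of `needle` in `text`,
// or std::string::npos if it does not occur.
std::size_t find_bm(const std::string& text, const std::string& needle) {
    if (needle.empty()) return 0;  // assumption: same convention as std::string::find
    // The searcher preprocesses the pattern once; reuse it to search several texts.
    std::boyer_moore_searcher searcher(needle.begin(), needle.end());
    const auto it = std::search(text.begin(), text.end(), searcher);
    return it == text.end() ? std::string::npos
                            : static_cast<std::size_t>(it - text.begin());
}
```

Whether this beats a plain std::string::find on 50000 short lines is something you would have to measure; the preprocessing pays off mostly with longer patterns and larger texts.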
I am not sure to what extent you want to use search, but finite-state machines (FSMs) are a good option.
Here is the discussion: http://stackoverflow.com/questions/525004/short-example-of-regular-expression-converted-to-a-state-machine