Does anyone know how text editors and programmers' editors are able to do such fast searches on very large text files?

Are they indexing on load, at the start of the find or some other clever technique?

I desperately need a faster implementation than what I have, which is a painfully slow walk from the top to the bottom of the text.

Any ideas are really appreciated.

This is for a C# implementation, but it's the technique I'm interested in more than the actual code.

+5  A: 

Begin with the Boyer-Moore search algorithm. It requires some preprocessing (which is fast) and searches very efficiently, especially when searching for long substrings.
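To illustrate the idea, here is a minimal sketch of the Boyer-Moore-Horspool variant (a simplified Boyer-Moore that keeps only the bad-character shift table) in Python; the question is about C#, but the technique translates directly:

```python
def horspool_search(text, pattern):
    """Boyer-Moore-Horspool: compare the window against the pattern, and on a
    mismatch shift by an amount determined by the last character in the window.
    Returns the index of the first match, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    if m > n:
        return -1
    # For each character of the pattern (except the last), record how far the
    # window may jump when that character is the last one in the text window.
    shift = {c: m - i - 1 for i, c in enumerate(pattern[:-1])}
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            return pos
        # characters not in the table allow a full-pattern jump
        pos += shift.get(text[pos + m - 1], m)
    return -1
```

The longer the pattern, the larger the typical shift, which is why Boyer-Moore-style search gets faster with longer substrings.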

Anton Gogolev
+1  A: 

I wouldn't be surprised if most just use the basic, naive search technique (scan for a match on the 1st char, then test if the hit pans out).
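For comparison, the naive technique described above looks roughly like this (a Python sketch; the editor itself would of course do this over its buffer):

```python
def naive_search(text, pattern):
    """Naive scan: try every starting position; scan for a match on the first
    char, then test whether the hit pans out, bailing on the first mismatch."""
    n, m = len(text), len(pattern)
    for i in range(n - m + 1):
        j = 0
        while j < m and text[i + j] == pattern[j]:
            j += 1
        if j == m:
            return i
    return -1
```

This is O(n*m) in the worst case, but on typical text mismatches occur early, so in practice it is often fast enough.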

Michael Burr
+1  A: 
Elijah
"I'm going to beat grep by thirty percent!" http://ridiculousfish.com/blog/archives/2006/05/30/old-age-and-treachery/
jleedev
If only I could vote this comment up, that is hilarious!
Elijah
A: 

One method I know of which hasn't been mentioned yet is Knuth-Morris-Pratt search (KMP). It isn't so good for natural-language text (due to a prefix property of the algorithm), but for things like DNA matching it is very, very good.
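A short Python sketch of KMP: the preprocessing builds a failure table (for each pattern prefix, the length of its longest proper prefix that is also a suffix), so that on a mismatch the pattern slides forward without ever re-reading text characters:

```python
def kmp_search(text, pattern):
    """Knuth-Morris-Pratt search. Returns the index of the first match, or -1."""
    m = len(pattern)
    if m == 0:
        return 0
    # failure[i] = length of the longest proper prefix of pattern[:i+1]
    # that is also a suffix of it
    failure = [0] * m
    k = 0
    for i in range(1, m):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    # scan the text once; k tracks how much of the pattern currently matches
    k = 0
    for i, c in enumerate(text):
        while k > 0 and c != pattern[k]:
            k = failure[k - 1]
        if c == pattern[k]:
            k += 1
        if k == m:
            return i - m + 1
    return -1
```

The prefix property mentioned above is why KMP shines on repetitive alphabets like DNA: long self-similar prefixes produce large, useful failure-table entries, whereas in ordinary prose most entries are 0 and KMP degenerates to roughly the naive scan.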

Another one is a hash search (I don't know if there is an official name). First you compute a hash value of your pattern, then you slide a window (the size of your pattern) over your text and check whether the hashes match. The idea is to choose the hash function so that you don't have to recompute the hash for the complete window: you update the hash with just the next character (and the old character drops out of the hash computation). This algorithm performs very well when you have multiple strings to search for (because you just compute the hashes for all your patterns beforehand).
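This rolling-hash technique is commonly known as Rabin-Karp. A minimal Python sketch (the base and modulus here are arbitrary illustrative choices):

```python
def rabin_karp_search(text, pattern, base=256, mod=1_000_003):
    """Rabin-Karp: hash the pattern once, then slide a window over the text,
    updating the hash in O(1) per step by dropping the outgoing character
    and adding the incoming one. Returns the first match index, or -1."""
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    if m > n:
        return -1
    # weight of the outgoing (leftmost) character in the window hash
    high = pow(base, m - 1, mod)
    p_hash = w_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        w_hash = (w_hash * base + ord(text[i])) % mod
    for i in range(n - m + 1):
        # on a hash hit, verify the window to rule out hash collisions
        if w_hash == p_hash and text[i:i + m] == pattern:
            return i
        if i < n - m:
            w_hash = ((w_hash - ord(text[i]) * high) * base
                      + ord(text[i + m])) % mod
    return -1
```

For the multi-pattern case mentioned above, you would put all the pattern hashes in a set and check each window hash against that set in one lookup.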

flolo