views:

69

answers:

1

Hello, I am curious what is the most efficient algorithm (or commonly used) to count the number of occurances of a string in a chunck of text.

From what I read, Boyer–Moore string search algorithm is the standard for string search but I am not sure if counting occurance in an efficient way would be same as searching a string.

In python this is what I want:

text_chunck = "one two three four one five six one"
occurance_count(text_chunck, "one") # gives 3.

Regards

EDIT: It seems like python str.count serves me such method however I am not able to find what algorithm it uses.

A: 

Boyer-Moore would be a good choice for counting occurrences, since it has some overhead that you would only need to do once. It does better the longer the pattern string is, so for "one" it would not be a good choice.

If you want to count overlaps, start the next search one character after the previous match. If you want to ignore overlaps, start the next search the full pattern string length after the previous match.

If your language has an indexOf or strpos method for finding one string in another, you can use that. If it proves to slow, then choose a better algorithm.

drawnonward