views:

138

answers:

2

For a string of length L, I want to find the longest substring that appears n (n<L) or more times in ths string.

For example, the longest substring that occurs 2 or more times in "BANANA" is "ANA", once starting from index 1, and once again starting from index 3. The substrings are allowed to overlap.

In the string "FFFFFF", the longest string that appears 3 or more times is "FFFF".

The brute force algorithm for n=2 would be selecting all pairs of indexes in the string, then running along until the characters are different. The running-along part takes O(L) and the number of pairs is O(L^2) (duplicates are not allowed but I'm ignoring that) so the complexity of this algorithm for n=2 would be O(L^3). For greater values of n, this grows exponentially.

Is there a more efficient algorithm for this problem?

+10  A: 

Suffix trees solve these kind of problems very fast, usually O(n) time and space.

Look at the wiki page:

Suffix Trees.

and read the material (the Functionality section) on that page which links to:

Longest Repeated Substring.

Moron
Excellent answer (contrary to your user name)!
Tomislav Nakic-Alfirevic
A: 

Longest common substring problem

Hun1Ahpu