Given a string S, what is the best algorithm to find a substring which repeats maximum number of times.
For example, in "assdssfssd", it is "ss" which repeats maximum number of times.
Given a string S, what is the best algorithm to find a substring which repeats maximum number of times.
For example, in "assdssfssd", it is "ss" which repeats maximum number of times.
I can see building a tree to solve that particular problem.
There is a notional root node. The first character is the first child. The second character is a child of the first character a -> s in your case. It also begins a new leaf of the root node. If, in adding a node, you visit an existing node, you increment its count (initial value 1).
Once done, you visit every node of the tree to find the one with the highest count at the deepest level (because if "asdf" occurs 5 times then "a", "as" and "asd" occur a minimum of 5 times, by definition).
The substring that repeats the most will be a single letter, so you'll find the letter that occurs the most. This is quite easy:
>>> str = 'Can Berk Güder'
>>> letters = [l for l in str]
>>> uniq_letters = set(letters)
>>> counts = [(letters.count(l), l) for l in uniq_letters]
>>> counts
[(1, 'B'), (1, 'C'), (1, 'G'), (1, 'a'), (1, 'd'), (1, 'k'), (1, 'n'), (1, 'ü'), (2, ' '), (2, 'e'), (2, 'r')]
// C# code, finds one of the most occurred non-empty substrings in O(n), proof by the reader!
int[] x = new int[65536];
foreach (char c in myString)
x[(int)c]++;
int max = 0;
for (int i = 0; i < x.Length; ++i)
if (x[max] < x[i])
max = i;
return ((char)max).ToString();
This probably isn't what you want, though. You might need to look at something like Huffman coding...
Sounds like you are looking for something close to a compression algorithm. Compression works by finding redundant (repeated) information, and replacing it with a pointer to the first occurrence. Here are a few samples of code to do so:
http://www.developerfusion.com/code/1642/string-compression/
http://www.howtodothings.com/computers/a1223-simple-string-compression.html
In a "N" lenghted Strings,
No Of "1" character will be "N" which requires comparision of N * (N-1) / 2
No of "2" characters will be "N-1" which requires comparision of (N-1) * (N-2) / 2
No of "3" characters will be "N-2" which requires comparision of (N-2) * (N-3) / 2
.............
and No of "N" characters will be "1" which requires compariosn of (1 * 0 / 2)
Hence No Of Max Substrings = "N" + "N-1" + .... "1" = (N * ( N+1) / 2 ) and comparions required is (N+1) * (N) * (N-1) / 6
If you do a Bucket placing (not sorting) at each no of characters to the same size, then
No Of "1" character will be "N" which requires comparision of N -1 with buckets of N
No of "2" characters will be "N-1" which requires comparision of (N-2) with Buckets of N-1
No of "3" characters will be "N-2" which requires comparision of (N-3) with Buckets of N-2
.............
and No of "N" characters will be "1" which requires compariosn of 0 with Bucket of 1
Here it reduces the total comparisions to "N * (N-1) / 2"
Finally after doing bucket placing, take the bucket which has highest nos for your answer.