First, I believe that your code has a bug in it. The last line should be
return p;
. I beleve that i holds the index of the lexicographically smallest cyclic shift, and p holds the smallest shift that matches. I also think that your stopping condition is too weak, i.e. you are doing too much checking after you have found a match, but I am not sure exactly what it should be.
Note that i and j only advance and that i is always less than j. We are looking for a string that matches the string starting at i, and we are trying to match it with a string that starts at j. We do this by comparing the k'th character of each string while increasing k (as long as they match). Note that we only change i if we determine that the string starting at j is lexicographically less than the string starting at j, and then we set i to j and reset k and p to their initial values.
I do not have time for a detailed analysis, but it looks like
- i = the start of the lexicographic smallest cyclic shift
- j = the start of the cyclic shift we are matching against the shift starting at i
- k = the character in strings i and j currently under consideration (the strings match in positions 1 to k-1
- p = the cyclic shift under consideration (i believe p stands for prefix)
Edit Going further
this section of code
if ((a=x[(i+k-1)%l])>(b=x[(j+k-1)%l])) {
i=j++;
k=p=1;
Moves the start of the comparison to a lexicographically earlier string when we find one and reinitializes everything else.
this section
} else if (a<b) {
j+=k;
k=1;
p=j-i;
is the tricky part. We have found a mismatch that is lexicographically later than our reference string, so we skip to the end of the text matched so far, and start matching from there. We also increase p (our stride). Why can we skip over all the starting points between j and j + k? This is because the string starting with i is the lexicographically smallest seen, and if the tail of the current j string is greater then the string at i then any suffix of the string at j will be greater than the string at i.
Finally
} else if (a==b && k!=p) {
k++;
} else {
j+=p;
k=1;
this just checks that the string of length p starting at i repeats.
*further edit
We do this by incrementing k until k == p
, checking that the k'th character of the string starting at i equals the k'th character of the string starting at j. Once k reaches p we start scanning again at the next supposed occurrence of the string.
Even further edit to attempt to answer jethro's questions.
First: the k != p
in else if (a==b && k!=p)
Here we have a match in that the k'th and all previous characters in the strings starting at i and j are equal. The variable p represents the length that we think that the repeating string is. When k != p
, actually k < p
, so we are ensuring that the p characters at the string beginning at i are the same as the p characters of the string beginning at j. When k == p
(the final else) we should be at a point where the string starting at j + k
looks the same as the string starting at j, so we increase j by p and set k back to 1 and go back to comparing the two strings.
Second: Yes, I believe you are correct, it should return i. I was misunderstanding the meaning of "Minimum Cyclic Shift"