ansaurus

Question

Minimal cyclic shift algorithm explanation

Answer 1

+3 A:

First, I believe that your code has a bug in it. The last line should be return p;. I beleve that i holds the index of the lexicographically smallest cyclic shift, and p holds the smallest shift that matches. I also think that your stopping condition is too weak, i.e. you are doing too much checking after you have found a match, but I am not sure exactly what it should be.

Note that i and j only advance and that i is always less than j. We are looking for a string that matches the string starting at i, and we are trying to match it with a string that starts at j. We do this by comparing the k'th character of each string while increasing k (as long as they match). Note that we only change i if we determine that the string starting at j is lexicographically less than the string starting at j, and then we set i to j and reset k and p to their initial values.

I do not have time for a detailed analysis, but it looks like

i = the start of the lexicographic smallest cyclic shift
j = the start of the cyclic shift we are matching against the shift starting at i
k = the character in strings i and j currently under consideration (the strings match in positions 1 to k-1
p = the cyclic shift under consideration (i believe p stands for prefix)

Edit Going further

this section of code

    if ((a=x[(i+k-1)%l])>(b=x[(j+k-1)%l])) {
        i=j++;
        k=p=1;

Moves the start of the comparison to a lexicographically earlier string when we find one and reinitializes everything else.

this section

   } else if (a<b) {
        j+=k; 
        k=1; 
        p=j-i;

is the tricky part. We have found a mismatch that is lexicographically later than our reference string, so we skip to the end of the text matched so far, and start matching from there. We also increase p (our stride). Why can we skip over all the starting points between j and j + k? This is because the string starting with i is the lexicographically smallest seen, and if the tail of the current j string is greater then the string at i then any suffix of the string at j will be greater than the string at i.

Finally

    } else if (a==b && k!=p) {
        k++;
    } else {
        j+=p; 
        k=1;

this just checks that the string of length p starting at i repeats.

*further edit We do this by incrementing k until k == p, checking that the k'th character of the string starting at i equals the k'th character of the string starting at j. Once k reaches p we start scanning again at the next supposed occurrence of the string.

Even further edit to attempt to answer jethro's questions.

First: the k != p in else if (a==b && k!=p) Here we have a match in that the k'th and all previous characters in the strings starting at i and j are equal. The variable p represents the length that we think that the repeating string is. When k != p, actually k < p, so we are ensuring that the p characters at the string beginning at i are the same as the p characters of the string beginning at j. When k == p (the final else) we should be at a point where the string starting at j + k looks the same as the string starting at j, so we increase j by p and set k back to 1 and go back to comparing the two strings.

Second: Yes, I believe you are correct, it should return i. I was misunderstanding the meaning of "Minimum Cyclic Shift"

deinst 2010-08-11 16:11:41

+1: Seems helpful :-) Perhaps you should also mention the cases when k is > 1 (substrings at i and j match exactly at the first k positions I believe).

Moron 2010-08-11 16:51:59

I figured that that was unneccessary, but hey, what the heck, I need to learn to be clearer.

deinst 2010-08-11 16:59:32

@deinst. Thanks a lot for your time and explantation.Why do you think last statement should be return p? We looking for cyclic shift so IMHO i is correct. I still don't understand what we use p for ( for instance why we check in the last else if (k!=p)?

jethro 2010-08-12 21:28:39

No, i is the start of the cycle, p is the stride. run the algorithm on 'bacdbacd'. IIRC at the end i will be 1 (pointing at the first a) and p will be 4 (the number of chars to rotate to get a match).

deinst 2010-08-12 21:45:14

@Jethro I've added an update. Yes, I believe that you are correct and it should return i. I was misunderstanding the meaning of Minimum cyclic shift.

deinst 2010-08-12 23:55:52

ansaurus

tags:

views:

answers:

Minimal cyclic shift algorithm explanation

related questions