Let me take a concrete example and hope I can be clear. Consider the (ordered) list of months:

January < February < March < ... < December

(with integers that stand for the months, zero-based), such that

Jan is 0, Feb is 1, ..., Dec is 11.

Now suppose I do not have access to the full names of months, and am given the following list, where months have been shortened to their first letter, and e stands for an empty category, like this:

e, F, e, e, e

If I build a list of "unambiguous months" (f:1, s:8, o:9, n:10, d:11), I can fill the empty categories by first calculating the first category (using subtraction and mod 12), and then writing the rest from there. However, suppose I am given the list
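The subtraction-and-mod-12 step can be sketched as follows. This is only an illustration: the names `UNAMBIGUOUS` and `fill_from_anchor` are made up here, and the sketch assumes the list contains at least one unambiguous letter and covers consecutive months.

```python
MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]

# Letters that can only stand for one month (zero-based indices).
UNAMBIGUOUS = {"F": 1, "S": 8, "O": 9, "N": 10, "D": 11}

def fill_from_anchor(pattern):
    """Fill a list such as ['e', 'F', 'e'] using the first unambiguous letter.

    Returns the full month names, or None if no unambiguous letter is present.
    """
    for position, letter in enumerate(pattern):
        if letter in UNAMBIGUOUS:
            # Walk back from the anchor to the month of the first category.
            first = (UNAMBIGUOUS[letter] - position) % 12
            return [MONTHS[(first + k) % 12] for k in range(len(pattern))]
    return None

print(fill_from_anchor(["e", "F", "e", "e", "e"]))
# ['January', 'February', 'March', 'April', 'May']
```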

e, A, e, e, J, e

Then I can (intuitively) calculate that although A is ambiguous (it could be April or August), in this context it can only be April, since August is not followed by a J three positions later (that is, with two categories in between). Once I find this, I can again calculate everything from the start.

My question, finally, is: is there an analytic solution (function, algorithm) for this problem, or is my only hope to use brute force to define each potential relation? For some examples, no disambiguation algorithm/function can work: consider the case where I have a J followed by 11 e's, followed by a J followed by 11 e's ... Since there is a year in between, I cannot disambiguate J into January, June or July.

Answer: I ended up coding Il-Bhima's answer, because for this particular case regexes are fine, even with the higher O(mn) running time. However, I accepted Ben's answer as the correct one because it subsumes the others (it mentions the regex solution) and also suggests a better approach using the KMP algorithm, which runs in O(m+n), although that only pays off when the pattern is matched against much longer strings. Thanks, everybody.

+5  A: 

I'm not sure if this is exactly what you're looking for, but you could use a modified KMP string search algorithm to solve this problem.

The modification would be to match anything against the empty category. It could even find all possible values for you, such as the J with 11 e's like you mentioned.

You could also use a finite state machine to determine the possibilities; this is what a regular expression would do.

Ben S
Thanks, Ben. Could you please expand on how you see KMP finding the values for my "J"s example? Thanks!
Dervin Thunk
+4  A: 

The easiest way of doing this is using regular expressions. Suppose you want to match e, A, e, e, J, e.

Construct the following regular expression: r = ".A..J."

Let c be our control string:

  c = "JFMAMJJASONDJFMAMJJASOND"

Now we search for all (possibly overlapping) matches of r in the string c whose starting index lies within the first half of c. Each such index is a candidate month for the first category.
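A minimal Python sketch of this search might look like the following. The helper name `regex_starts` is hypothetical, and two details are assumptions on my part rather than part of the answer: a lookahead is used so that overlapping matches are all reported, and the control string is repeated more than twice when the pattern is longer than 12 categories.

```python
import re

CONTROL = "JFMAMJJASOND"

def regex_starts(pattern):
    """Return every month index (0-11) the first category could be.

    `pattern` is a string such as "eAeeJe", where 'e' is an empty category.
    """
    # 'e' matches any month; every other letter must match literally.
    body = "".join("." if ch == "e" else re.escape(ch) for ch in pattern)
    # For patterns of up to 12 categories this is the doubled string c.
    repeats = (len(pattern) + 11) // 12 + 1
    text = CONTROL * repeats
    # The lookahead (?=...) consumes nothing, so overlapping matches are found.
    starts = [m.start() for m in re.finditer("(?=" + body + ")", text)]
    # Keep only matches that begin inside the first copy of the year.
    return [s for s in starts if s < len(CONTROL)]

print(regex_starts("eAeeJe"))  # [2]
```

For e, A, e, e, J, e the only start is index 2 (March), so A is April and J is July.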

In general, this may not be the most efficient method. The most naive solution, which attempts to match the pattern against every cyclic shift of the control string "JFMAMJJASOND", runs in O(nm) time, where n is the length of the pattern and m is the length of the control string (12 in our case).
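The naive cyclic-shift check can be written without regular expressions at all. The sketch below is illustrative only; `candidate_starts` is a made-up name, and 'e' is assumed to be the marker for an empty category as in the question.

```python
CONTROL = "JFMAMJJASOND"

def candidate_starts(pattern):
    """Try each of the 12 cyclic shifts; O(n*m) comparisons in the worst case."""
    m = len(CONTROL)
    return [
        start
        for start in range(m)
        # Every non-empty category must agree with the shifted control string.
        if all(ch == "e" or ch == CONTROL[(start + i) % m]
               for i, ch in enumerate(pattern))
    ]

print(candidate_starts("eAeeJe"))              # [2]
print(candidate_starts("J" + "e" * 11 + "J"))  # [0, 5, 6]
```

The second call is the J, 11 e's, J case from the question: all three J months survive, so the list cannot be disambiguated.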

Il-Bhima
Cute use of regex! :)
j_random_hacker
+2  A: 

We can build on Il-Bhima's answer a little for the general case. First, we recognize that the only truly ambiguous pairs of A, M, or J are two J's that are six months apart or two of the same letter that are a year apart. Any other combination will yield an unambiguous match in the control string. (I built a table of all possible combinations to prove that.)

So all you need out of your entire starting list is two months whose distance mod 12 isn't 0 or 6. You can then build a small regular expression to match against the control string. Alternatively, you could build a lookup table containing the ordered pair and the distances between the months.
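One possible shape for such a lookup table is sketched below. The layout (a dict keyed by first letter, second letter, and distance mod 12) and the names `build_pair_table`, `PAIR_TABLE` and `ambiguous` are assumptions made for illustration, not something specified above.

```python
from collections import defaultdict

CONTROL = "JFMAMJJASOND"

def build_pair_table():
    """Map (first letter, second letter, distance mod 12) to month-index pairs."""
    table = defaultdict(list)
    for a in range(12):
        for b in range(12):
            key = (CONTROL[a], CONTROL[b], (b - a) % 12)
            table[key].append((a, b))
    return table

PAIR_TABLE = build_pair_table()

# Keys with more than one entry cannot be resolved from the pair alone.
ambiguous = {key: pairs for key, pairs in PAIR_TABLE.items() if len(pairs) > 1}

print(PAIR_TABLE[("A", "J", 3)])  # [(3, 6)]  i.e. April followed by July
print(sorted(ambiguous))
# [('A', 'A', 0), ('J', 'J', 0), ('J', 'J', 6), ('M', 'M', 0)]
```

The ambiguous keys are exactly the ones described above: the same letter a whole number of years apart (distance 0 mod 12), and two J's six months apart.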

Kristo