views:

152

answers:

4

I'm looking for a efficient searching algorithm to get the longest shortest repeated pattern in a collection (~2k of integers), where my collection is made of this repeated pattern only (there is no noise between repeated patterns), but the last occurence of pattern may be incomplete.

Examples: I've got: [2,4,1, 2,4,1, 2,4,1, 2,4,1, 2,4,1]
I'd like to recieve: [2,4,1]

I've got: [21,1,15,22, 21,1,15,22, 21,1,15,22, 21,1,15]
I'd like to recieve: [21,1,15,22]

I've got: [3,2,3,2,5]
I'd like to recieve: [] (there is no pattern)

(spaces added just for readability)

+1  A: 

Try building up your initial grouping by starting at the beginning adding a number to the group until you get to a number that is the same as the first in the group (the previous number terminates the pattern). Use this as your test pattern and go through, matching the pattern until you get a failure. If you match the entire collection (with your special end pattern handling) that is one candidate. Go back to the place where you found your initial match, then continue building up your group until you get to another number matching the first in your pattern. Repeat, replacing your candidate whenever you find a longer one. When your pattern is the same length as the collection stop (this one doesn't match). If you have a candidate that will be the longest pattern.

tvanfosson
+5  A: 

The very straight forward algorithm would look like this (in Python, but should be no problem to translate to Javascript):

def check(a, width):
  '''check if there is a repeated pattern of length |width|'''
  for j in range(width, len(a)):
    if a[j] != a[j-width]:
      return False
  return True

def repeated(a):
  '''find the shortest repeated pattern'''
  for width in range(1, len(a)):
    if check(a, width):
      return a[:width]
  return []

This should also be rather efficient, since most of the time the loop in check() will return right in the first iteration, so that you basically only iterate over the list once.

sth
hasperiod = lambda seq, period: all(seq[i] == seq[i+period] for i in xrange(len(seq) - period))`
J.F. Sebastian
A: 

I think you could approach this problem by considering the period of the pattern. The period of a sequence A[] is the smallest integer T such that A[i+T] = A[i] for all i. In your case, when you find the period T, you are done, since A[0..T-1] is the shortest pattern you are looking for. So, start with smalles possible period T=1 and test if the sequence satisfies the periodic property. If yes, you are done (this actually happens only when all elements are identical). For any larger T, you need to test whether A[i+T] = A[i] for i=0..A.len-T-1. This is just a simple loop.

Krystian
A: 

You might optimize you search by observing that your collection's length must be a multiple of your pattern length. If your collection has a size that is prime, the only possible pattern length is 1, i.e. all elements must be identical!

mfx
It would be a good idea, but as I've stated above, the last occurence of the pattern may be incomplete.
wildcard