ansaurus

Question

How to find repeating sequence of characters in given array

Answer 1

+8 A:

For your examples, my first approach would be to

get the first character of the array (for your last example, that would be C)
get the index of the next appearance of that character in the array (e.g. 9)
if it is found, search for the next appearance of the substring between the two appearances of the character (in this case CARPENTER)
if it is found, you're done (and the result is this substring).

Of course, this works only for a very limited subset of possible arrays, where the same word is repeated over and over again, starting from the beginning, without stray characters in between, and its first character is not repeated within the word. But all your examples fall into this category - and I prefer the simplest solution which could possibly work :-)

If the repeated word contains the first character multiple times (e.g. CACTUS), the algorithm can be extended to look for subsequent occurrences of that character too, not only the first one (so that it finds the whole repeated word, not only a substring of it).

Note that this extended algorithm would give a different result for your second example, namely RONRON instead of RON.

Péter Török 2010-09-09 09:47:27

+1 for a step-by-step explanation.

BoltClock 2010-09-09 09:52:17

+1 for simplicity and linear time solution. However, I understood the problem differently. I guess that the question should specify whether characters can repeat in the pattern, and whether we are looking for the largest or smallest pattern that repeats itself.

Eyal Schneider 2010-09-09 11:00:38

assuming the word never repeats the first letter is pretty much throwing up your hands and going home.

Jonathan 2010-09-09 16:39:43

@Jonathan, in case you haven't noticed, I actually describe how to deal with that case :-)

Péter Török 2010-09-09 16:58:07

Answer 2

+1 A:

Pseudocode

len = str.length
for (i in 1..len) {
   if (len%i==0) {
      if (str==str.substr(0,i).repeat(len/i)) {
         return str.substr(0,i)
      }
   }
}

Note: For brevity, I'm inventing a "repeat" method for strings, which isn't actually part of Java's string; "abc".repeat(2)="abcabc"

ammoQ 2010-09-09 09:49:55

Answer 3

+3 A:

In Python, you can leverage regexes thus:

def recurrence(text):
    import re
    for i in range(1, len(text)/2 + 1):
        m = re.match(r'^(.{%d})\1+$'%i, text)
        if m: return m.group(1)

recurrence('abcabc') # Returns 'abc'

I'm not sure how this would translate to Java or C. (That's one of the reasons I like Python, I guess. :-)

Marcelo Cantos 2010-09-09 09:51:08

Answer 4

+1 A:

Using C++:

//Splits the string into the fragments of given size
//Returns the set of of splitted strings avaialble
set<string> split(string s, int frag)
{
    set<string> uni;
    int len = s.length();
    for(int i = 0; i < len; i+= frag)
    {
        uni.insert(s.substr(i, frag));
    }

    return uni;
}

int main()
{

    string out;
    string s = "carpentercarpenter";
    int len = s.length();

      //Optimistic approach..hope there are only 2 repeated strings
      //If that fails, then try to break the strings with lesser number of
      //characters
    for(int i = len/2; i>1;--i)
    {
        set<string> uni = split(s,i);
        if(uni.size() == 1)
        {
            out = *uni.begin();
            break;
        }
    }

    cout<<out;
    return 0;

}

Asha 2010-09-09 09:56:41

Answer 5

+1 A:

First write a method that find repeating substring sub in the container string as below.

boolean findSubRepeating(String sub, String container);

Now keep calling this method with increasing substring in the container, first try 1 character substring, then 2 characters, etc going upto container.length/2.

fastcodejava 2010-09-09 09:58:57

Answer 6

+1 A:

The first idea that comes to my mind is trying all repeating sequences of lengths that divide length(S) = N. There is a maximum of N/2 such lengths, so this results in a O(N^2) algorithm.

But i'm sure it can be improved...

Eyal Schneider 2010-09-09 10:22:00

Answer 7

+4 A:

Tongue-in-cheek O(NlogN) solution

Perform an FFT on your string (treating characters as numeric values). Every peak in the resulting graph corresponds to a substring periodicity.

Oli Charlesworth 2010-09-09 13:02:14

+1 nice. thinking outside the box.

Jonathan 2010-09-09 16:41:16

As you mention the FFT, it got me thinking about using a cross-correlation (whichever method is used) to find matches of the substring in the sequence. Normally my caveman brute-force approach, if I couldn't use an off-the-shelf regex library, would be the "walk the sequence, try to match" approach. But your answer got me thinking -- I wonder if/when a cross correlation would be more efficient. Probably depends on the length of the pattern, the length of the sequence to search, etc... but anyway, your answer got me thinking ("out of the box" as Jonathan said). Thanks.

Dan 2010-09-11 17:45:19

Cross-correlation and doing a Fourier transform are effectively the same thing (see http://en.wikipedia.org/wiki/Convolution_theorem). For anything other than quite small values of *N*, the FFT will be more efficient.

Oli Charlesworth 2010-09-11 17:51:48

Answer 8

A:

and here is a concrete working example:

/* find greatest repeated substring */
char *fgrs(const char *s,size_t *l)
{
  char *r=0,*a=s;
  *l=0;
  while( *a )
  {
    char *e=strrchr(a+1,*a);
    if( !e )
      break;
    do {
      size_t t=1;
      for(;&a[t]!=e && a[t]==e[t];++t);
      if( t>*l )
        *l=t,r=a;
      while( --e!=a && *e!=*a );
    } while( e!=a && *e==*a );
    ++a;
  }
  return r;
}

  size_t t;
  const char *p;
  p=fgrs("BARBARABARBARABARBARA",&t);
  while( t-- ) putchar(*p++);
  p=fgrs("0123456789",&t);
  while( t-- ) putchar(*p++);
  p=fgrs("1111",&t);
  while( t-- ) putchar(*p++);
  p=fgrs("11111",&t);
  while( t-- ) putchar(*p++);

2010-09-09 16:20:16

oops `p=fgrs("BARBARABARBARAB-RBARA", `

pmg 2010-09-09 17:44:22

Answer 9

A:

I'd convert the array to a String object and use regex

manolowar 2010-09-16 14:20:07

Answer 10

A:

Not sure how you define "efficiently". For easy/fast implementation you could do this in Java:

    private static String findSequence(String text) {
        Pattern pattern = Pattern.compile("(.+?)\\1+");
        Matcher matcher = pattern.matcher(text);
        return matcher.matches() ? matcher.group(1) : null;
    }

it tries to find the shortest string (.+?) that must be repeated at least once (\1+) to match the entire input text.

Carlos Heuberger 2010-09-16 15:16:40

ansaurus

tags:

views:

answers:

How to find repeating sequence of characters in given array

related questions