views:

170

answers:

5

I'm not sure if there is a simple way of doing this, but is there a way to find multiple instances in an unknown string? For example:

hellohellohellobyebyebyehello

Without knowing the value of the above string, can I return something that will tell me that there are 3 instances of "hello" and 3 instances of "bye" (I'm not worried about the last hello however as I'm looking for consecutive repetition. Thanks in advance!

+7  A: 

Maybe the Sequitur algorithm can help: http://sequitur.info/

tur1ng
+1. Interesting.
RichardOD
A: 

if you are looking up for dictionary words, you can load your lexicon in a suffix tree, then consider the characters of your string one by one and go through your tree. Each time your reach a leaf you increment by one the associated "word".

PierrOz
A prefix tree is enough and it is very easy to implement even in javascript.
jkff
I removed what I said about javascript as I'm not an expert... and true, a prefix tree is enough, easier to implement but less optimized
PierrOz
+2  A: 

"testhellohellohellobyebyebyehello".match(/(.+)\1+/)

This says : "match a sequence of at least 1 character (.+), then reference that first thing we found \1 at least one time + or more.

It will return ["hellohellohello", "hello"] meaning hellohellohello matches the full expression (expression 0), and "hello" matches expression 1 (the thing we reference with \1).

Caveat:
something like "hahahaha" will yield ["hahahaha", "haha"], instead of ["hahahaha", "ha"]. so you'll need to use the above with some post-processing to get to your desired result.

z5h
A: 
var source = "asdhellohellohellobyehellohellohellohelloasdhello";
var key = "hello";
var len = key.length;
var res = 0, tempres, next;
var last = source.indexOf(key);
while(last != -1)
{
  tempres = 0;
  next = last;
  while(true)
  {
    tempres++;
    next += len;
    last = source.indexOf(key, next);
    if(last != next)
      break;
  }
  res = (tempres > res) ? tempres : res;
}
console.log(res);//4
Amarghosh
+4  A: 
s = "hellohellohellobyebyebyehello"
s.replace(/(.+)(\1+)/g, function($0, $1) {
    console.log($1 + " repeated " + ($0.length / $1.length) + " times");
});
stereofrog
+1 for being innovative. Make the first `+` nongreedy (`/(.+?)(\1+)/`) or it will match `hellohello` repeated 2 times instead of a `hello` repeated 4 times (if there are 4 (or more) hello's in the string)
Amarghosh