tags:

views:

70

answers:

2

Actually, this is not specific to SQL, and I doubt "conversation pattern" is the correct name, but I could not think of a better caption.

To simplify, imagine you got a vast stream of ints. The task is to detect an A.{1;max_n}A pattern: An int satisfies the pattern if it is followed by n (> 0) other ints, then the original int again, while n <= max_n.

Example:

...
1
4 <--
7 \
3  > n = 3
3 /
4 <--
2
...

Here, the int 4 is repeated with 3 arbitrary ints in-between, so for max_n <= 3 the pattern is satisfied for the value 4.

The question is, how can I detect which integers in the huge dump of data follow this pattern? I'm mostly interested in the algorithm itself, but an example in SQL or C# would be welcome, too.

The naive idea I came up with is to first gather a list or all distinct ints, then check the pattern in a straightforward way for each of them, but that would lead to a performance bottleneck.

+1  A: 

You can hold some dictionary (C#) or map (C++) structure where will be saved position of first occurrence of numbers.

Then for every number you should check if it present in map. If yes - you should compare position difference with maximum position difference occurred before. Otherwise you should save the number and its position in map.

DixonD
Your idea might be the simplest way and still quite effective. Sounds about O(n * log n) to me.
mafutrct
If used data structure will be implemented via hashes then you can achieve O(n) complexity
DixonD
Indeed, my mistake, with a reasonable hash function it would be `O(n)`, that's great.
mafutrct
A: 

Well, SQL will not do it in an optimal way, but it might not be horrible if the columns are indexed.

First, to speak of order in SQL you should have another column. If that column is actually equal to a row number then you can:

SELECT DISTINCT
  t1.number
FROM 
  table t1, table t2
WHERE
  (t1.rownumber-t2.rownumber) <= @max_n AND
  (t1.rownumber-t2.rownumber) >=1 AND
  t1.number = t2.number AND
Unreason