views:

73

answers:

4

What is the most optimal way to find repetition in a infinite sequence of integers?

i.e. if in the infinite sequence the number '5' appears twice then we will return 'false' the first time and 'true' the second time.

In the end what we need is a function that returns 'true' if the integer appeared before and 'false' if the function received the integer the first time.

If there are two solutions, one is space-wise and the second is time-wise, then mention both. I will write my solution in the answers, but I don't think it is the optimal one.

edit: Please don't assume the trivial cases (i.e. no repetitions, a constantly rising sequence). What interests me is how to reduce the space complexity of the non-trivial case (random numbers with repetitions).

+1  A: 

I'd use the following approach:

Use a hash table as your datastructure. For every number read, store it in your datastructure. If it's already stored before you found a repetition.

If n is the number of elements in the sequence from start to the repetition, then this only requires O(n) time and space. Time complexity is optimal, as you need to at least read the input sequence's elements up to the repetition point.

How long of a sequence are we talking (before the repetition occurs)? Is a repetition even guaranteed at all? For extreme cases the space complexity might become problematic. But to improve it you will probably need to know more structural information on your sequence.

Update: If the sequence is as you say very long with seldom repetitions and you have to cut down on the space requirement, then you might (given sufficient structural information on the sequence) be able to cut down the space cost.

As an example: let's say you know that your infinite sequence has a general tendency to return numbers that fit within the current range of witnessed min-max numbers. Then you will eventually have whole intervals that have already been contained in the sequence. In that case you can save space by storing such intervals instead of all the elements contained within it.

Frank
Regarding the last paragraph, this is more of a thought experiment than a real problem that I have, so I don't have a practical answer.But what really interests me is what people do in the more problematic case - a really infinite sequence (let's say you leave the system on for a very long time) with seldom repetitions.
maayank
Excellent addition. thanks!
maayank
A: 

Return TRUE

If the sequence is infinite then there will be repetition of every conceivable pattern.

If what you want to know is the first place in the sequence when there is a repeated digit that's another matter, but there's some difference between your question and your example.

High Performance Mark
i[n]=n;no repetitions.
SF.
Even with a finite set of numbers the only thing known is that at least one has duplicates for an infinite sequence.
Thomas Jung
Well, let's say that the first time we meet a new number then the system needs to do something...
maayank
@SF good point, I meant that the digits would repeat with any pattern you care to mention. Actually, what I really meant, of course, is that OP's question is vague about what the desired function is to return.
High Performance Mark
A: 

Well, it seems obvious that in any solution we'll need to save the numbers that already appeared, so space wise we will always have a worst-case of O(N) where N<=possible numbers with the word size of our number type (i.e. 2^32 for C# int) - this is problematic over a long time if the sequence is really infinite/rarely repeats itself.

For saving the numbers that already appeared I would use an hash table and then check it each time I receive a new number.

maayank
2^32 is only 512Mb in a BitSet. :-)
Thomas Jung
The hash table has to much overhead in any case. Arrays for bucket arrays and entry objects. This adds up.
Thomas Jung
I do not consider it obvious that saving n numbers requires O(n) space. This heavily depends on the structure of your sequence. If the sequence is given by s[i] = i, i.e. numbers are steadily increased by one, then you can store all appeared numbers in constant space by just storing the largest witnessed number.
Frank
Thomas, could you elaborate more on the alternatives?Frank, you are right, but is it correct to say that in the general case (let's say receiving random integers) it MUST be O(N)?
maayank
+1  A: 

A BitSet for int values (2^32 numbers) would consume 512Mb. This may be ok if the BitSets are allocated not to often, fast enough and the mem is available.

An alternative are compressed BitSets that work best for sparse BitSets.

Thomas Jung