views:

223

answers:

8

Is it possible to compute the number of different elements in an array in linear time and constant space? Let us say it's an array of long integers, and you can not allocate an array of length sizeof(long).

P.S. Not homework, just curious. I've got a book that sort of implies that it is possible.

A: 

man uniq (builtin shell)

LarsOn
A: 

I do not think this can be done in linear time. One algorithm to solve in O(n log n) requires first sorting the array (then the comparisons become trivial).

Kevin Sylvestre
A: 

Assuming we can partially destroy the input, here's an algorithm for n words of O(log n) bits.

Find the element of order sqrt(n) via linear-time selection. Partition the array using this element as a pivot (O(n)). Using brute force, count the number of different elements in the partition of length sqrt(n). (This is O(sqrt(n)^2) = O(n).) Now use an in-place radix sort on the rest, where each "digit" is log(sqrt(n)) = log(n)/2 bits and we use the first partition to store the digit counts.


If you consider streaming algorithms only ( http://en.wikipedia.org/wiki/Streaming_algorithm ), then it's impossible to get an exact answer with o(n) bits of storage via a communication complexity lower bound ( http://en.wikipedia.org/wiki/Communication_complexity ), but possible to approximate the answer using randomness and little space (Alon, Matias, and Szegedy).

+3  A: 

This is the Element uniqueness problem, for which the lower bound is Ω( n log n ), for comparison-based models. The obvious hashing or bucket sorting solution all requires linear space too, so I'm not sure this is possible.

Larry
I think the bucket solutions require constant space. Actually they require that the number of possible different values is constant. So a bucket (with a boolean) for each value is still constant space.
ziggystar
That's not a bucket, that's a hash. Yes, it is constant for some definition of constant, for example, you can allocate an array the size of the known universe every time. It does not further the conversation though. Of course, you can say to yourself, wait, there are only N possible unique values! So let's just use `N` buckets instead! Oh wait..
Larry
That's how it's done. I'm not saying this is a practical solution. It's a theoretical answer to a theoretical question.
ziggystar
I have a better solution for `O(N)` solution anyhow, "constant" space. Just precalculate the answers for all possible array size, elements. All you have to do now is look it up in the table!
Larry
Actually, that's a valid answer in the theory of computational complexity, given you're dealing with a finite number of different inputs. Btw, hashmaps are not constant in lookup time but linear or log, depending on the implementation.
ziggystar
Citation needed.
Larry
A: 

If you are guaranteed that the numbers in the array are bounded above and below, by say a and b, then you could allocate an array of size b - a, and use it to keep track of which numbers have been seen.

i.e., you would move through your input array take each number, and mark a true in your target array at that spot. You would increment a counter of distinct numbers only when you encounter a number whose position in your storage array is false.

Rob Lachlan
A: 

You can't use constant space. You can use O(number of different elements) space; that's what a HashSet does.

Rex Kerr
The asker probably wants O(log # of distinct elements) space.
@algorithmist: But that takes O(n log m) time, where m is the number of distinct elements. That's not linear.
Rex Kerr
I didn't even specify an algorithm. How do you know how fast it runs?Usually these kinds of questions are best posed in the unit-cost model, which counts basic operations on (log n)-bit words.
I can make a good guess based upon how many bits of information about the answer you can possibly accumulate per step. `n` space allows for `log n` accumulation of information per step, but you don't allow that much, so you're limited to the `n log n` family of solutions.
Rex Kerr
A: 

This can be done with a bucket approach when assuming that there are only a constant number of different values. Make a flag for each value (still constant space). Traverse the list and flag the occured values. If you happen to flag an already flagged value, you've found a duplicate. You have to traverse the buckets for each element in the list. But that's still linear time.

ziggystar
Why would you make the assumption that there are only a constant number of different values?
Larry
Assuming we are talking about arrays that contain longs (or bytes or shorts to be more realistic), you have a constant space complexity. There are only 256 different byte values, 65336 values for short, etc...
ziggystar
So, what's your array size for an array with elements of type "string", length K each? What about arrays of arbitrary objects? How about providing a feasible answer?
Larry
I was talking about objects with a fixed number of possible values. Strings are not covered by my answer.I return the question: How about an array of objects of infinite size?
ziggystar
A: 

You can use any sorting algorithm and count the number of different adjacent elements in the array.