Number of different elements in an array.

views:

223

answers:

+1 Q:

Number of different elements in an array.

Is it possible to compute the number of different elements in an array in linear time and constant space? Let us say it's an array of long integers, and you can not allocate an array of length sizeof(long).

P.S. Not homework, just curious. I've got a book that sort of implies that it is possible.

man uniq (builtin shell)

LarsOn 2010-03-23 17:19:26

I do not think this can be done in linear time. One algorithm to solve in O(n log n) requires first sorting the array (then the comparisons become trivial).

Kevin Sylvestre 2010-03-23 17:19:44

Assuming we can partially destroy the input, here's an algorithm for n words of O(log n) bits.

Find the element of order sqrt(n) via linear-time selection. Partition the array using this element as a pivot (O(n)). Using brute force, count the number of different elements in the partition of length sqrt(n). (This is O(sqrt(n)^2) = O(n).) Now use an in-place radix sort on the rest, where each "digit" is log(sqrt(n)) = log(n)/2 bits and we use the first partition to store the digit counts.

If you consider streaming algorithms only ( http://en.wikipedia.org/wiki/Streaming_algorithm ), then it's impossible to get an exact answer with o(n) bits of storage via a communication complexity lower bound ( http://en.wikipedia.org/wiki/Communication_complexity ), but possible to approximate the answer using randomness and little space (Alon, Matias, and Szegedy).

2010-03-23 17:21:18

+3 A:

This is the Element uniqueness problem, for which the lower bound is Ω( n log n ), for comparison-based models. The obvious hashing or bucket sorting solution all requires linear space too, so I'm not sure this is possible.

Larry 2010-03-23 17:22:16

I think the bucket solutions require constant space. Actually they require that the number of possible different values is constant. So a bucket (with a boolean) for each value is still constant space.

ziggystar 2010-03-23 17:30:46

That's not a bucket, that's a hash. Yes, it is constant for some definition of constant, for example, you can allocate an array the size of the known universe every time. It does not further the conversation though. Of course, you can say to yourself, wait, there are only N possible unique values! So let's just use `N` buckets instead! Oh wait..

Larry 2010-03-23 17:35:43

That's how it's done. I'm not saying this is a practical solution. It's a theoretical answer to a theoretical question.

ziggystar 2010-03-24 08:58:48

I have a better solution for `O(N)` solution anyhow, "constant" space. Just precalculate the answers for all possible array size, elements. All you have to do now is look it up in the table!

Larry 2010-03-24 12:15:04

Actually, that's a valid answer in the theory of computational complexity, given you're dealing with a finite number of different inputs. Btw, hashmaps are not constant in lookup time but linear or log, depending on the implementation.

ziggystar 2010-03-24 19:26:59

Citation needed.

Larry 2010-03-24 20:56:01

If you are guaranteed that the numbers in the array are bounded above and below, by say a and b, then you could allocate an array of size b - a, and use it to keep track of which numbers have been seen.

i.e., you would move through your input array take each number, and mark a true in your target array at that spot. You would increment a counter of distinct numbers only when you encounter a number whose position in your storage array is false.

Rob Lachlan 2010-03-23 17:22:36

You can't use constant space. You can use O(number of different elements) space; that's what a HashSet does.

Rex Kerr 2010-03-23 17:23:49

The asker probably wants O(log # of distinct elements) space.

2010-03-23 17:25:48

@algorithmist: But that takes O(n log m) time, where m is the number of distinct elements. That's not linear.

Rex Kerr 2010-03-23 17:30:28

I didn't even specify an algorithm. How do you know how fast it runs?Usually these kinds of questions are best posed in the unit-cost model, which counts basic operations on (log n)-bit words.

2010-03-23 17:37:52

I can make a good guess based upon how many bits of information about the answer you can possibly accumulate per step. `n` space allows for `log n` accumulation of information per step, but you don't allow that much, so you're limited to the `n log n` family of solutions.

Rex Kerr 2010-03-23 19:24:04

This can be done with a bucket approach when assuming that there are only a constant number of different values. Make a flag for each value (still constant space). Traverse the list and flag the occured values. If you happen to flag an already flagged value, you've found a duplicate. You have to traverse the buckets for each element in the list. But that's still linear time.

ziggystar 2010-03-23 17:34:34

Why would you make the assumption that there are only a constant number of different values?

Larry 2010-03-23 17:42:08

Assuming we are talking about arrays that contain longs (or bytes or shorts to be more realistic), you have a constant space complexity. There are only 256 different byte values, 65336 values for short, etc...

ziggystar 2010-03-24 08:57:35

So, what's your array size for an array with elements of type "string", length K each? What about arrays of arbitrary objects? How about providing a feasible answer?

Larry 2010-03-24 12:21:52

I was talking about objects with a fixed number of possible values. Strings are not covered by my answer.I return the question: How about an array of objects of infinite size?

ziggystar 2010-03-24 19:28:49

You can use any sorting algorithm and count the number of different adjacent elements in the array.

2010-03-24 09:21:17

ansaurus

tags:

views:

answers:

Number of different elements in an array.

related questions