views:

84

answers:

4

In the book Introduction to the Design & Analysis of Algorithms, the following solution is proposed to the element uniqueness problem:

ALGORITHM UniqueElements(A[0 .. n-1])
// Determines whether all the elements in a given array are distinct
// Input: An array A[0 .. n-1]
// Output: Returns "true" if all the elements in A are distinct
//         and false otherwise.
for i := 0 to n - 2 do
   for j := i + 1 to n - 1 do
      if A[i] = A[j] return false
return true

How can I compute the average cost (i.e. number of comparisons for a given n) for this algorithm? What is a reasonable assumption about the input?

+1  A: 

If you don't know anything else about the input, then a reasonable assumption is that it's random. If so, and if the space of possible choices is large (e.g. the set of all real numbers), then the likelihood of two elements being the same is vanishingly small. (Mathematically, we say that the event of two randomly selected real numbers being distinct is almost sure.)

That means that your average case is equal to your worst case: you'll have to scan every element in the array to be sure that each one is distinct. Then the number of comparisons is n * (n - 1) / 2, or the sum of 1 ... n.

John Feminella
Actually, for array entries drawn from any probability distribution X over the integers, the expected running time will be O(1), since there exists some integer with positive probability.
@algorithmist: Whoops, sorry; I meant the reals, not integers.
John Feminella
@algorithmist: O(1) doesn't make any sense. Even if all the numbers in the array are identical, the algorithm will still do O(n) comparisons.
Joren
@Joren: no it won't. Read the code carefully.
IVlad
@algorithmist: What are you talking about? Are you saying the given algorithm given in question will take O(1) on average for 'random' integers?
Moron
@Feminella: I'm not sure if considering an infinitely large set is a good idea here. Suppose you did that to compute the average cost of the sequential search algorithm for an unordered array. You would end up with a cost of c(n) = n-1, which is the same of the worst case cost. A better approach would be to consider a probability 'p' of having a successful search and to assume that the ocurrence of the matching element at a position 'i' in the array has an uniform distribution. That way, you would get the following average cost: c(n) = p*(n+1)/2 + p(n-1).
Alceu Costa
My bad, it loops the "wrong" way. If positive integer n has probability 6/(pi n)^2, then the running time is omega(1).
For any fixed distribution, it will still be O(n) though, contrary to the intentions of the original post.
@IVlad: You're right, I hadn't properly read the code. Sorry about that.
Joren
@Alceu: I think you mean `(1-p)*(n * (n+1)/2)`, not `p`. Also, my point is that `p` is effectively zero for large sets, so the worst case is the same as the average case. This is a pretty intuitive result: if you're trying to determine uniqueness and you're choosing a few items from an enormous pool, it's overwhelmingly likely that they'll all be distinct choices.
John Feminella
A: 

Since you iterate twice over the array in a nested way, worst case cost should be O(n²)..

a closer look would show you that since you start second loop from the element after the one you are checking you have:

N-1 + (N-2) + (N-3) + (N-4) + (N-5) + .... + 1

comparisons so the exact average cost would be N*(N-1) / 2

According to your comment I think that you should assume that every element is uniformely chosen between the set of possible values.

This means that the element A[i] has the probability 1/n of being exactly a specified value. Starting from here you can do your considerations:

  • first of all you choose a whatever element of the array A[i]. What is the probability of having A[i] == A[i+1]? It's 1/n² since both elements are supposed to be random.
  • what is the probability of having A[i] == A[i+2]? You have 1/n * (n-1/n) * 1/n because you have respectively a specified element, anything except the specified one, and the same specified element
  • you can extend the argumentation over any element A[k] with k>i, then you add all probabilities and you will have which is the average probability of having two unique element in the array starting from a specified one.
  • you extend thing thing further considering that you can start from any A[i] with i = 0..l-1. Of course every different i will have different probabilities because array will be shorter as i increases.

NOTE: n is the number of different items that can be inserted into the array, not its length.

After this you can easily estimate your average comparison cost..

Jack
What you said would be right if you were computing the worst case. However, what I am looking for is the average case, which is a little more complicated since you have to make assumptions about the input, like, for example, the probability function of duplicated elements in the sequence.
Alceu Costa
A: 

If you need an exact value for a given input length then this will work (thought it is overkill):

ALGORITHM complexity_counter_of_UniqueElements(A[0 .. n-1]) 
// Determines whether all the elements in a given array are distinct 
// Input: An array A[0 .. n-1] 
// Output: Returns "true" if all the elements in A are distinct 
//         and false otherwise. 
counter acc = 0;
for i := 0 to n - 2 do 
   for j := i + 1 to n - 1 do 
      //if A[i] = A[j] return false 
      acc := 1 + acc
return acc

It is easy to see that this algorithm is O(n*n) though, which is probably what you're interested in. The algorithm compares every element by every other element. If you created a table with the results of this the table would have to be at least ((n*n)/2) to hold all of the results.

edit: I see now what you were really asking.

You need to compute the probability that each comparison may result in a match. This depends on the size of your elements (things that live in A) and what kind of distribution they have.

Assuming a random distribution the chance that any two random A[x] == A[y] where x != y would be 1.0/(number of possible values of element).

P(n)
total_chance := 0.0
for i:= 0 to n - 2 do
   for j := i + 1 to n - 1 do
      this_chance := 1.0/(number_of_possible_values_of_element)
      total_chance :=  total_chance + ((1-total_chance)*this_chance)
      // This should be the the probability of the newly compared pair being equal weighted
      // to account for the chance that it actually mattered (ie, hadn't found a match earlier)
return total_chance

O((1-P(n))*n*n), but P(n) is <= 1, so it is less than n*n

nategoose
I agree with you that it is easy to see that the algorithm is O(n^2). However, what I need is to find the exact average cost. For example, the worst case cost that I have found is: c(n) = n*(n-1)/2 which is also O(n^2).
Alceu Costa
I deleted this comment after rereading the algorithm.The chance that you can break out early, which is where the average != best != worst case comes in is dependant on the size of your symbols (how many different values can A[0] possibly have) as that determins the probability that any A[x] has any particular value.
nategoose
A: 

I think it's hard to talk about an average cost. The worst case cost is O(n2) and happens either when the repeated elements are towards the end of the array, for example something like this:

2 3 4 5 ... 1 1

Or when the array contains nothing but distinct elements.

The best case is when the array starts with two repeated elements, like this:

1 1 ...

In which case the cost is a single comparison. Another good case is when there exists an element near the beginning of the array that repeats at the end of the array, something like this:

2 3 4 1 ... 1

This will be (closer to) O(n).

The fact is the cost depends on the input, so you might as well assume you're going to always hit a worst case and try to find a better algorithm, maybe something based on sorting the array or on using hash tables, giving you O(nlog n) worst case and O(n) average case respectively.

IVlad
@|Vlad: Given a probability distribution on the input, you can always compute the expected number of comparisons, and call that the average cost. Since this is homework, I think this is just an exercise in doing such computations, and suggesting other methods (like sorting) might not be useful.
Moron
@Moron: true, but I don't think suggesting other methods is not useful. Knowing better ways to do things is always useful. In any case, someone already answered the average case question, so I thought I could contribute by suggesting more efficient methods.
IVlad
@IVlad: Giving _any_ information might be useful! But the point is, as asked, that extra piece of information is (probably) not useful to solve that particular question. Even more so for homework problems: adding extra information will probably just confuse the student even more. If this was a dicussion forum, I agree with you about alternate methods etc.
Moron