views:

171

answers:

2

I am reading "Probability and Computing" by M.Mitzenmacher and E.Upfal. I am having problems understanding how the probability of comparison of two elements is calculated.

Input: the sorted list (y1,y2,...,yN) of numbers. We are looking for pivot element (randomly). Question: what is probability that two elements yi and yj (j>i) will be compared?

Answer (from book): yi and yj will be compared if either yi or yj will be selected as pivot in first draw from sequence (yi,yi+1,...,yj-1,yj). So the probablity is: 2/(j-i+1).

The problem for me is the initial claim: for example, picking up yi in the first draw from the whole list will cause the comparison with yj (and vice-versa) and the probability is 2/n.

So, rather the "reverse" claim is true -- none of the (yi+1,...,yj-1) elements can be selected before yi or yj, but the "pool" size is not fixed (in first draw it is N for sure, but on the second it is smaller).

Could someone please explain how the authors come up with such a simplified conclusion?

Edit1: some good soul polished my post, thank you :-).

Edit2: the list is sorted initially.

+3  A: 

Quicksort works by comparing each element with the pivot: those larger than the pivot are placed on the right of the pivot, and those not larger on the left (or the other way around if you want a descending sort, it doesn't matter).

At each step, the pivot is chosen from a sequence (yi, yi+1, ..., yj). How many elements in this sequence? j - i + 1 (I think you had a typo, it can't be y - i + 1).

So the probability of selecting one of two specific elements from this list is obviously 2 / (j - i + 1).

The problem for me is initial claim: for example, picking up yi in the first draw from the whole list will cause the comparison with yj (and vice-versa) and the probability is 2/n.

Picking yi will cause it to be compared with the other j - i elements only. Where did you get n from? Remember that your list only goes from yi to yj!

Edit:

Reading the question again, I do find it a bit ambiguous. The probability of comparing two elements at the first step of the recursion is indeed 2 / n as you say, because the i and j are 1 and n. The probability of comparing two elements at an unknown recursive step is what I explained above.

IVlad
Thank you for your comment, I corrected the typo and added info the list -- it is sorted initially. Anyway, as I see (in your edit) you also spotted the problem with the size of the list, however the final conclusion is incorrect. You ask about element yi and yj, so you cannot even say there will be such sequence yi..yj in any further step (for example, if you pick up yi+1 as your first pivot, there won't be).
bantu
Further steps don't matter. At each step, the selected pivot reaches its final position, so it will play no role in any further steps.
IVlad
Let's say i>2. In first step you selected 1, in 2nd -- 2. Now, the probability of picking up i or j in the 3rd step is 2/(N-3+1), not 2/(j-i+1) (as you wrote in last sentence in edit). And what's more -- it is only prob. of picking up those elements, not prob. of comparing them, because other scenarios of the 3rd step also lead to comparing them. Each step determines probability, so they matter.
bantu
What I meant to say is that further steps don't matter if you pick the right pivot at the current recursive step. I get the feeling that the question asks for the probability of picking the right pivot (either of `yi` or `yj`) at exactly one given recursive step, one corresponding to the sequence `yi, yi+1, ..., yj`. Otherwise, you are right, the question is a lot harder. It IS indeed an ambiguous question.
IVlad
A: 

The answer given by the authors is correct, though I still don't see how they jumped to the conclusion that easily and fast.

Let denote by L=j-i+1. Actual values of j and i don't matter here, what matters is L. Let also denote by P(N,L) probability of comparing yi and yj element from ordered sequence of numbers of size N.

Facts:

  • P(N,2) = 1
  • P(N,L) = 2/N+1/N * ( P(N-1,L)+P(N-2,L)+P(N-3,L)+...+P(L,L) )

This sum looks ugly, but after two tests it appeared that the P(N,L) is probably equal to 2/L. Let's check it out:

  • P(N,L=2) = 1 = 2/2 = 2/L
  • let's assume P(N,L) = 2/L
  • P(N+1,L) = 2/(N+1) + 1/(N+1) * ( P(N,L) + ... P(L,L) ) = 2/(N+1) + (N-L+1)*1/(N+1)*2/L = 2/L

And since L=j-i+1, we get 2/(j-i+1).

bantu