views:

30

answers:

1

I'd like to get a probability bound on the maximum number of collisions in a hash table. I.e., given number of buckets m, number of items n, what is:

Prob(number of collisions < c)

as a function of n, m and c, where c is some number between 1 and n. For example, c=1 would be the birthday paradox (~e^(-n^2/2m)) and c=n would be 1.

(I am assuming a uniform hash function, i.e., n items are distributed over m buckets uniformly at random.)

A: 

Hrm, can this be answered in the way you've asked? I don't think so. That's because the question you're asking is wrong. So much depends on the hashing function chosen. For example:

template<class>
hash_code<T>(const T &_input){ return 1; }

The probability of a collision is 100%. Doesn't depend on m, n, or c.

Edit:

Since you're assuming a uniform type distribution then to arrive at the solution transpose the question into something similar and you'll see that it's basic probability. Here it's choice by replacement. That is, given n tries, what is the probability that you will chose the same item from m items at least c times. Then the final answer is one minus that probability.

This sounds like homework so I'll let you Google it. Hope this helps.

wheaties
I think hes assuming that you have a hash function where all buckets are equally likely.
mathepic
Yes. Updated question to reflect that.
He just updated the question a minute ago to reflect this.
mathepic
That would be the number of collisions on one bucket. I want to count possible collisions across all buckets. (BTW, this is not homework although it kind of sounds like one!)