Let's break the proof down.
Setup
First, some background.
With a hash table, we define a probe sequence P
. For any item q
, following P
will eventually lead to the right item in the hash table. The probe sequence is just a series of functions {h_0, ..., h_M-1}
where h_i
is a hash function.
To insert an item q
into the table, we look at h_0(q)
, h_1(q)
, and so on, until we find an empty spot. To find q
later, we examine the same sequence of locations.
In general, the probe sequence is of the form h_i(q) = [h(q) + c(i)] mod M
, for a hash table of size M
, where M
is a prime number. The function c(i)
is the collision-resolution strategy, which must have two properties:
First, c(0) = 0
. This means that the first probe in the sequence must be equal to just performing the hash.
Second, the values {c(0) mod M, ..., c(M-1) mod M}
must contain every integer between 0 and M-1. This means that if you keep trying to find empty spots, the probe sequence will eventually probe every array position.
Applying quadratic probing
Okay, we've got the setup of how the hash table works. Let's look at quadratic probing. This just means that for our c(i)
we're using a general quadratic equation of the form ai^2 + bi + c
, though for most implementations you'll usually just see c(i) = i^2
(that is, b, c = 0
).
Does quadratic probing meet the two properties we talked about before? Well, it's certainly true that c(0) = 0
here, since (0)^2
is indeed 0
, so it meets the first property. What about the second property?
It turns out that in general, the answer is no.
Theorem. When quadratic probing is used in a hash table of size M
, where M
is a prime number, only the first floor[M/2]
probes in the probe sequence are distinct.
Let's see why this is the case, using a proof by contradiction.
Say that the theorem is wrong. Then that means there are two values a
and b
such that 0 <= a < b < floor[M/2]
that probe the same position.
h_a(q)
and h_b(q)
must probe the same position, by (1), so h_a(q) = h_b(q)
.
h_a(q) = h_b(q) ==> h(q) + c(a) = h(q) + c(b)
, mod M
.
The h(q)
on both sides cancel. Our c(i)
is just c(i) = i^2
, so we have a^2 = b^2
.
Solving the quadratic equation in (4) gives us a^2 - b^2 = 0
, mod M. This is a difference of two squares, so the solution is (a - b)(a + b) = 0
, mod M
.
But remember, we said M
was a prime number. The only way that (a - b)(a + b)
can be zero mod M
is if [case I] (a - b) is zero, or [case II] (a + b) is zero mod M
.
Case I can't be right, because we said that a != b
, so a - b
must be something other than zero.
The only way for (a + b)
to be zero mod M
is for a + b
to be equal to be a multiple of M
or zero. They clearly can't be zero, since they're both bigger than zero. And since they're both less than floor[M/2]
, their sum must be less than M
. So case II can't be right either.
Thus, if the theorem were wrong, one of two quantities must be zero, neither of which can possibly be zero -- a contradiction! QED: quadratic probing doesn't satisfy property two once your table is more than half full and if your table size is a prime number. The proof is complete!