Do you really need to precompute all the possible pairs? What if you were to do it lazily, i.e. compute each count on demand?
This can be represented as a 2D matrix. The rows correspond to the customers and the columns correspond to the products.
Each entry is either 0 or 1, saying whether the product corresponding to the column was bought by the customer corresponding to the row.
If you look at each column as a vector of (around 5000) 0s and 1s, then the number of times two products were bought together is just the dot product of the corresponding vectors!
Thus you can just compute those vectors first, and then compute the dot product lazily, on demand.
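For example, here is a toy sketch of that idea in Python (the data is made up, just to show that a column dot product counts co-purchases):

    # 4 customers x 3 products; matrix[c][p] == 1 means customer c bought product p.
    matrix = [
        [1, 0, 1],
        [1, 1, 0],
        [0, 1, 1],
        [1, 0, 1],
    ]

    # Column vectors for products 0 and 2.
    col0 = [row[0] for row in matrix]
    col2 = [row[2] for row in matrix]

    # Dot product = number of customers who bought both products.
    print(sum(a * b for a, b in zip(col0, col2)))  # 2 (customers 0 and 3)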
To compute the dot-product:
Now, a good representation of a vector of only 0s and 1s is a bitmap: an array of integers in which each bit holds one entry.
For 5000 entries, you will need an array of 79 64-bit integers.
So given two such arrays, you need to count the number of 1s that are common.
To count the number of bits common to two integers, first do a bitwise AND and then count the number of 1s set in the result.
For that you can either use lookup tables or one of the bit-counting tricks (not sure whether Python supports them directly) described here: http://graphics.stanford.edu/~seander/bithacks.html
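In Python specifically, a simple way to count set bits is bin(x).count('1'), and on Python 3.10+ there is also int.bit_count(). A minimal sketch:

    def common_bits(a, b):
        # Bits set in both a and b = one chunk of the dot product.
        return bin(a & b).count("1")
        # On Python 3.10+ you could use (a & b).bit_count() instead.

    print(common_bits(0b1011, 0b0110))  # 1 (only bit 1 is set in both)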
So your algorithm will be something like this:
Initialize an array of 79 64-bit integers for each product.
For each customer, look at the products they bought and set the bit corresponding to that customer in each of those products' arrays.
Now, given a query asking how many customers bought two particular products together, just take the dot product as described above.
This should be reasonably fast.
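Here is a rough sketch of the whole thing in Python (assuming the purchases come as (customer_id, product_id) pairs and that customer IDs are dense integers below 5000; the names and example data are just placeholders):

    from collections import defaultdict

    WORD_BITS = 64
    NUM_CUSTOMERS = 5000
    NUM_WORDS = (NUM_CUSTOMERS + WORD_BITS - 1) // WORD_BITS  # 79 words for 5000 customers

    # One bit array per product; bit i is set if customer i bought that product.
    product_bits = defaultdict(lambda: [0] * NUM_WORDS)

    def record_purchase(customer_id, product_id):
        word, bit = divmod(customer_id, WORD_BITS)
        product_bits[product_id][word] |= 1 << bit

    def bought_together(product_a, product_b):
        # Dot product: AND each pair of words and count the common set bits.
        return sum(bin(x & y).count("1")
                   for x, y in zip(product_bits[product_a], product_bits[product_b]))

    # Made-up example data:
    for customer_id, product_id in [(0, "tea"), (0, "milk"), (1, "tea"), (2, "milk"), (2, "tea")]:
        record_purchase(customer_id, product_id)

    print(bought_together("tea", "milk"))  # 2 (customers 0 and 2)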
As a further optimization, you can possibly consider grouping the customers.