views: 264
answers: 4

So I have a bunch of hyperlinks on a web page. From past observation I know the probabilities that a user will click on each of these hyperlinks. I can therefore calculate the mean and standard deviation of these probabilities.

I now add a new hyperlink to this page. After a short amount of testing I find that of the 20 users that see this hyperlink, 5 click on it.

Taking into account the known mean and standard deviation of the click-through probabilities on other hyperlinks (this forms a "prior expectation"), how can I efficiently estimate the probability of a user clicking on the new hyperlink?

A naive solution would be to ignore the other probabilities, in which case my estimate is just 5/20 or 0.25; however, this means we are throwing away relevant information, namely our prior expectation of what the click-through probability is.

So I'm looking for a function that looks something like this:

double estimate(double priorMean, 
                double priorStandardDeviation, 
                int clicks, int views);

I'd ask that, since I'm more familiar with code than mathematical notation, any answers use code or pseudocode in preference to math.

+2  A: 

P/N is actually correct from a frequentist perspective.

You could also use a Bayesian approach to incorporate prior knowledge, but since you don't seem to have that knowledge, I guess P/N is the way to go.

If you want, you can also use Laplace's rule, which iirc comes down to a uniform prior. Just give each link on the page a starting count of 1 instead of 0. (So when you count the number of times a link was clicked, give each link a +1 bonus and reflect that in your N.)
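
For example, a minimal sketch of that add-one (Laplace) smoothing, using the 5-clicks-out-of-20-views numbers from the question (the function name is just a placeholder):

def laplace_estimate(clicks, views):
  # Laplace's rule of succession: pretend every link starts with one click
  # and one non-click, which corresponds to a uniform prior over [0, 1].
  return (clicks + 1.0) / (views + 2.0)

print(laplace_estimate(5, 20))  # 6/22 ~= 0.27 instead of the raw 5/20 = 0.25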

[UPDATE] Here is a Bayesian approach:

Let p(W) be the probability that a person is in a specific group W. Let p(L) be the probability that a specific link is clicked. Then the probability you are looking for is p(L|W). By Bayes' theorem, you can calculate this by

p(L|W) = p(W|L) * p(L) / p(W)

You can estimate p(L) from the fraction of visits in which L was clicked, p(W) from the size of group W relative to all users, and p(W|L) = p(W and L) / p(L) from the number of people in group W who clicked L divided by the total number of people who clicked L.
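
A rough sketch of those estimates from raw counts (the variable names here are just placeholders for the counts described above, not anything given in the question):

def p_link_given_group(clicks_on_l_by_w, visits_by_w, clicks_on_l, total_visits):
  # p(W)   ~ visits by members of group W / all visits
  # p(L)   ~ clicks on link L / all visits
  # p(W|L) ~ clicks on L by members of W / all clicks on L
  p_w = visits_by_w / float(total_visits)
  p_l = clicks_on_l / float(total_visits)
  p_w_given_l = clicks_on_l_by_w / float(clicks_on_l)
  # Bayes' theorem: p(L|W) = p(W|L) * p(L) / p(W)
  return p_w_given_l * p_l / p_w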

bayer
Doesn't X constitute prior knowledge?
sanity
Not so much, since X and W are not independent. I updated the answer with a Bayesian approach.
bayer
I've reframed this question significantly since this answer, so I'm afraid it won't make much sense any more.
sanity
A: 

Bayes' Theorem Proof:

P(A,B) = P( A | B ) * P( B )    (1)

since,

P(A,B) = P(B,A)                 (2)

And expanding both sides of (2) using (1),

P(A | B) * P( B ) = P (B | A) * P(A)

thus (Bayes' Theorem),

           P( B | A ) * P(A)
P(A | B) = -----------------
                 P(B)

P(A)   -- prior/marginal probability of A; it does not take B into account
P(A|B) -- conditional/posterior probability of A, given B
P(B|A) -- conditional probability of B, given A
P(B)   -- prior/marginal probability of B

Consequences,

if P( A | B ) = P( A ), then A and B are independent,
and then P( B | A ) = P( B ) as well;

and the definition of independence is

P(A,B) = P( A | B ) * P( B ) = P( A ) * P( B )
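
As a quick numeric sanity check of the theorem (made-up joint counts, nothing to do with the click data):

# Joint counts over two events A and B (made-up numbers).
n_ab, n_a_only, n_b_only, n_neither = 30, 10, 20, 40
n = n_ab + n_a_only + n_b_only + n_neither

p_a = (n_ab + n_a_only) / float(n)           # P(A)
p_b = (n_ab + n_b_only) / float(n)           # P(B)
p_a_given_b = n_ab / float(n_ab + n_b_only)  # P(A|B)
p_b_given_a = n_ab / float(n_ab + n_a_only)  # P(B|A)

# Both sides agree: P(A|B) == P(B|A) * P(A) / P(B)
print(p_a_given_b, p_b_given_a * p_a / p_b)  # 0.6 0.6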

It should be noted that it is easy to manipulate the probability to your liking by changing the priors and the way the problem is framed; take a look at this discussion of the Anthropic Principle and Bayes' Theorem.

nlucaroni
What does this general post on Bayes' theorem have to do with the question?
Martin B
You're right, he doesn't need a Bayesian approach at all. I feel you can understand something better by understanding the proof; Bayes' Theorem is a little weird in itself, so I included it.
nlucaroni
A: 

You need to know how strongly X is correlated with W.

Most likely you will also want a more complex mathematical model if you are building a big website. If you run a site like Digg, you have a lot of prior knowledge that you have to factor into your calculation. That leads to multivariate statistics.

Christian
+1  A: 

I made this a new answer since it's fundamentally different.

This is based on Chris Bishop, Pattern Recognition and Machine Learning, Chapter 2 "Probability Distributions", p. 71ff., and http://en.wikipedia.org/wiki/Beta_distribution.

First we fit a beta distribution to the given mean and variance in order to build a distribution over the parameter. Then we return the mode of the posterior distribution, which is the most probable parameter for a Bernoulli variable.

def estimate(prior_mean, prior_variance, clicks, views):
  # Moment-match a Beta(a, b) prior to the given mean and variance.
  c = prior_mean * (1 - prior_mean) / prior_variance - 1
  a = prior_mean * c
  b = (1 - prior_mean) * c
  # The posterior is Beta(a + clicks, b + views - clicks);
  # return its mode, i.e. the most probable click-through probability.
  return (a + clicks - 1) / (a + b + views - 2)
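
For example, with a made-up prior (a 10% average click-through rate with variance 0.01 - these numbers are purely illustrative) and the 5-out-of-20 data from the question:

print(estimate(0.10, 0.01, 5, 20))  # ~0.185, pulled down from the raw 5/20 = 0.25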

However, I am quite positive that the prior mean/variance will not work well for you, since they throw away information about how many samples went into the prior and thus how strong it is.

Instead: given a set of (webpage, link_clicked) pairs, you can calculate the number of pages on which a specific link was clicked. Let that be m. Let the number of times that link was not clicked be l.

Now let a be the number of clicks on your new link and b the number of visits to the page. Then the estimated probability for your new link is

def estimate(m, l, a, b):
  # Pool the prior counts (m, l) from the existing links with the new link's counts (a, b).
  return (m + a) / float(m + l + a + b)

This looks pretty trivial but actually has a valid probabilistic foundation. From the implementation perspective, you can keep m and l globally.
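
For example, with made-up global counts for the existing links:

m, l = 300, 2700  # clicks / non-clicks accumulated over the existing links (made-up)
print(estimate(m, l, 5, 20))  # ~0.10: the large prior counts dominate the new 5/20 data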

bayer
If you want to consider the number of samples in the prior, that is fine. Regarding the second estimate() function, this doesn't seem to work. If m and l are very large, as is likely in this scenario since we'll have a lot of data for the other links, then a and b have virtually no effect.
sanity
Correct, since more prior information corresponds to a stronger prior. Maybe what you want is to fix (a + b) / (clicks + views) to a "reasonable"/"arbitrary" mixture in the first function.
bayer
I don't really understand what you mean :-/ I don't see (a+b)/(clicks+views) anywhere in either function.
sanity
a and b are "prior" information and m and l are your "current" information. What you want now is to "weight" them in a way that makes sense for you. You can do this by multiplying the prior info a and b with p and the current info m and l with (1 - p). You can also weight them with other arbitrary constants. This might do the job for you, but there is no actual probabilistic reasoning behind this...
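
A rough sketch of that weighting, using the naming from this comment (a, b as the prior counts, m, l as the current counts; p is an arbitrary constant you pick yourself):

def weighted_estimate(a, b, m, l, p):
  # Scale the prior counts by p and the current counts by (1 - p);
  # p is an arbitrary mixing weight, not something derived from the data.
  prior_clicks, prior_misses = p * a, p * b
  cur_clicks, cur_misses = (1 - p) * m, (1 - p) * l
  return (prior_clicks + cur_clicks) / (prior_clicks + prior_misses + cur_clicks + cur_misses)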
bayer
OK, I'm getting some weird output from your first estimate() function: it looks like I get a negative result if clicks=0. For example: estimate(0.10648293141246475, 0.011756461755177718, 0, 43) = -0.0050885467488 - obviously it shouldn't be negative! I cut and pasted your function into Python. Any ideas?
sanity
Hm. Yes, it is because of the -1 and -2 corrections. I have not derived this myself, so I cannot tell what the exact reasoning behind those is. However, you can just remove them, I guess - it won't make a limit in the difference. I get a result of 0.015...% then, which sounds kinda reasonable.
bayer
A difference in the limit, that's what I meant.
bayer
ok, well - even though I'm not 100% sure of your solution, you definitely deserve the bounty given all your efforts. Can you give me any particular pointers about where I can learn more about this type of thing?
sanity
It's all probability theory. A quite advanced book is "Pattern Recognition and Machine Learning" by Chris Bishop. That features a lot of statistics and linear algebra, so if your math is not too solid this will give you a pretty tough time. Also it covers a lot of more advanced topics. I guess that maybe "Radically Elementary Probability Theory" is what you want. It's a classic. http://freebooksource.info/?p=57337 PS: thanks. :)
bayer