I'm using the PRTools MATLAB library to train some classifiers, generate test data, and test the classifiers.

I have the following details:

  • N: Total # of test examples
  • k: # of mis-classifications for each classifier and class

I want to do:

Calculate and plot the Bayesian posterior distribution of the unknown probability of mis-classification (denoted q), that is, as a probability density function over q itself (so P(q) is plotted over q, from 0 to 1).

I have that (math formulae, not matlab code!):

Posterior = Likelihood * Prior  / Normalization constant
P(q|k,N)  = P(k|q,N)   * P(q|N) / P(k|N)

The prior is set to 1, so I only need to calculate the likelihood and normalization constant.

I know that the likelihood can be expressed as (where B(N,k) is the binomial coefficient):

P(k|q,N) = B(N,k) * q^k * (1-q)^(N-k)

... so the normalization constant is simply the integral of the likelihood above, over q from 0 to 1:

P(k|N) = B(N,k) * integralFromZeroToOne( q^k * (1-q)^(N-k) dq )

(The binomial coefficient B(N,k) can be omitted, though, since it appears in both the likelihood and the normalization constant.)
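
For concreteness, the integral itself can be evaluated numerically in MATLAB; a minimal sketch with placeholder values for N and k (integral() requires R2012a or later; older versions have quad() instead):

% Evaluate the normalization integral numerically
% (N and k are placeholder values, not real test results)
N = 100; k = 3;
Z = integral(@(q) q.^k .* (1-q).^(N-k), 0, 1);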

Now, I've heard that the integral for the normalization constant can be calculated as a series ... something like:

k!(N-k)! / (N+1)!

Is that correct? (I have some lecture notes with this series, but can't figure out if it is for the normalization constant integral, or for the overall distribution of mis-classification (q))

Also, hints are welcome on how to calculate this in practice (factorials easily cause overflow/round-off errors, right?) ... AND on how to compute the final plot (the posterior distribution over q, from 0 to 1).

A: 

I haven't done much with Bayesian posterior distributions (and not for a while), but I'll try to help with what you've given. First,

k!(N-k)! / (N+1)! = 1 / (B(N,k) * (N + 1))

and you can calculate binomial coefficients in MATLAB with nchoosek(), though the docs note that there can be accuracy problems for large coefficients. How big are N and k?
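
If N gets large, it's safer to work in the log domain with gammaln() rather than forming the factorials explicitly; a minimal sketch with placeholder values:

% Compute k!(N-k)!/(N+1)! in the log domain to avoid overflow
% (gammaln(x+1) = log(x!) for integer x >= 0; placeholder values)
N = 1000; k = 37;
logVal = gammaln(k+1) + gammaln(N-k+1) - gammaln(N+2);
val = exp(logVal);
% Direct route via the identity above; loses accuracy or
% overflows for large N:
% val = 1 / (nchoosek(N,k) * (N + 1));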

Second, according to Mathematica,

integralFromZeroToOne( q^k * (1-q)^(N-k) ) = pi * csc((k-N)*pi) * Gamma(1+k)/(Gamma(k-N) * Gamma(2+N))

where csc() is the cosecant function and Gamma() is the gamma function. Note that Gamma(x) = (x-1)! for positive integers, which we'll use in a moment. The problem is the Gamma(k-N) in the denominator, since k-N will be negative. However, the reflection formula, Gamma(z) * Gamma(1-z) = pi / sin(pi*z), takes care of that, so we end up with:

= (N-k)! * k! / (N+1)!

Apparently, your notes were correct.
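
Incidentally, k!(N-k)!/(N+1)! is exactly the Beta function B(k+1, N-k+1), so MATLAB can evaluate it directly without any explicit factorials; a minimal sketch with placeholder values:

% k!(N-k)!/(N+1)! = Beta(k+1, N-k+1)
N = 100; k = 3;                % placeholder values
val    = beta(k+1, N-k+1);     % direct evaluation
logVal = betaln(k+1, N-k+1);   % log domain, safer for large N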

Justin Peel
Many thanks! Understanding the origin of the expression (N-k)! * k! / (N+1)! was my main concern, so this was very helpful!
Samuel Lampa
No problem. It was good to try and remember some of the math I learned in my Mathematics for Physicists class some time back.
Justin Peel
A: 

Let q be the probability of mis-classification. Then the probability that you would observe k mis-classifications in N runs is given by:

P(k|N,q) = B(N,k) q^k (1-q)^(N-k)

You then need to assume a suitable prior for q, which is bounded between 0 and 1. A conjugate prior for the above is the Beta distribution. If q ~ Beta(a,b), then the posterior is also a Beta distribution; specifically:

f(q|-) ~ Beta(a+k,b+N-k)
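
To plot it over q from 0 to 1, something like this sketch should work (placeholder values; betapdf() is from the Statistics Toolbox, and the commented-out line is a toolbox-free equivalent via betaln()):

% Plot the posterior Beta(a+k, b+N-k) over q in [0,1]
% (a = b = 1 corresponds to the uniform prior; placeholder values)
N = 100; k = 3; a = 1; b = 1;
q = linspace(0, 1, 1000);
p = betapdf(q, a+k, b+N-k);    % Statistics Toolbox
% p = exp((a+k-1)*log(q) + (b+N-k-1)*log(1-q) - betaln(a+k, b+N-k));
plot(q, p); xlabel('q'); ylabel('P(q|k,N)');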

Hope that helps.

Anon
I already have the prior set to 1 (pessimistic). My main concern was in fact the expression k!(N-k)! / (N+1)!, but many thanks for answering!
Samuel Lampa
Sure. Do note that if you assume the prior is Beta, then you will not run into issues with computing the factorials. You can make the Beta equivalent to the uniform prior (i.e. prior = 1) by setting a = 1 and b = 1 for the Beta distribution. Also, the terminology in your question is a bit non-standard. For example, P(k|q,N) is the likelihood, not the posterior.
Anon
In fact, assuming a prior =1 is equivalent to assuming that q ~ Beta(1,1).
Anon
Ah, ok. Thanks for clarifying. Will try to calculate this way too ...
Samuel Lampa
Regarding terminology: Indeed, I had confused the likelihood and posterior. Have fixed the terminology now.
Samuel Lampa