My actual problem is a bit more general than this, but here is a specific example. In basketball, free-throw percentage is calculated as:

Free-Throw Percentage (FT%) = Free-Throws Made (FTM) / Free-Throws Attempted (FTA)

I have two teams, and for each team I have the mean and variance of the team's FTM and FTA, so I can model each as a normal random variable (obviously, FTM and FTA will be correlated). I can then easily compute the probability that one team will make more free throws than the other, for example.

My question is... how can I find the probability that one team will shoot a higher free-throw percentage than the other? Why is this so hard to compute? Any ideas?

Thanks in advance! :-)

+1  A: 

It turns out that the ratio of two normally distributed variables (such as FTM and FTA in your model) has a distribution that is rather complicated to describe! The simplest (or perhaps least intractable!) case is when both means are 0, in which case the ratio follows a Cauchy distribution. Even that distribution is tough to work with, because the integrals that define its mean and variance do not converge. But FTA and FTM have nonzero means, so even this is an oversimplification. So I don't think you're going to find any simple expression for the probability you're trying to calculate.

Another way to look at it might be: who cares if the math is intractable...just simulate it! Perform N trials, generating properly distributed values for each team's FTM and FTA, then keep track of how many times Team 1 has a better FT% than Team 2. N might not need to be too large, depending on how accurate your estimate needs to be...it can be shown that the error in the estimated proportion varies as 1/sqrt(N).
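A minimal sketch of that simulation, using only Python's standard library. The team parameters (means, standard deviations, and the FTM/FTA correlation `rho`) are hypothetical values made up for illustration, and the function names are my own. It assumes the bivariate normal model from the question and that FTA is far enough from zero that the ratio is well behaved.

```python
import math
import random

def sample_ft_pct(rng, mean_fta, sd_fta, mean_ftm, sd_ftm, rho):
    """Draw one correlated (FTM, FTA) pair from a bivariate normal; return FTM / FTA."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    fta = mean_fta + sd_fta * z1
    # Mixing z1 into FTM gives corr(FTM, FTA) = rho.
    ftm = mean_ftm + sd_ftm * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
    return ftm / fta

def prob_team1_higher(team1, team2, n_trials=100_000, seed=0):
    """Estimate P(FT%_1 > FT%_2) and the standard error of that estimate."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_trials):
        if sample_ft_pct(rng, **team1) > sample_ft_pct(rng, **team2):
            wins += 1
    p_hat = wins / n_trials
    # Standard error of a proportion: shrinks like 1/sqrt(N).
    std_err = math.sqrt(p_hat * (1.0 - p_hat) / n_trials)
    return p_hat, std_err

# Hypothetical inputs, invented for illustration only.
team1 = dict(mean_fta=25.0, sd_fta=5.0, mean_ftm=19.0, sd_ftm=4.0, rho=0.9)
team2 = dict(mean_fta=22.0, sd_fta=5.0, mean_ftm=16.0, sd_ftm=4.0, rho=0.9)

p_hat, std_err = prob_team1_higher(team1, team2)
print(f"P(team 1 FT% > team 2 FT%) ~ {p_hat:.3f} +/- {std_err:.3f}")
```

The standard error printed alongside the estimate gives a rough idea of whether N is large enough for your purposes.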

I'd also suggest modeling FTM with something other than a normal distribution. A binomial distribution, with parameters n=mean(FTA) and p=mean(FTM)/mean(FTA), seems like a better fit. With two normal distributions, there's a nonzero probability that FTM > FTA, which doesn't make sense.
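As a rough sketch of the binomial variant, again in standard-library Python: each team's FTM is drawn from Binomial(n, p) with n fixed at a (rounded) mean FTA and p = mean(FTM)/mean(FTA). The numbers below are hypothetical, the helper names are my own, and ties are counted as "not higher", which is a choice you may want to change.

```python
import random

def binomial(rng, n, p):
    """Draw from Binomial(n, p) by summing n Bernoulli(p) trials."""
    return sum(1 for _ in range(n) if rng.random() < p)

def prob_team1_higher_binomial(n1, p1, n2, p2, n_trials=20_000, seed=0):
    """Estimate P(FT%_1 > FT%_2) when FTM_i ~ Binomial(n_i, p_i) and FTA_i = n_i."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_trials):
        pct1 = binomial(rng, n1, p1) / n1  # always in [0, 1], unlike the normal model
        pct2 = binomial(rng, n2, p2) / n2
        if pct1 > pct2:  # ties count as "not higher"
            wins += 1
    return wins / n_trials

# Hypothetical inputs: n = mean(FTA) rounded, p = mean(FTM) / mean(FTA).
n1, p1 = 25, 19 / 25
n2, p2 = 22, 16 / 22

estimate = prob_team1_higher_binomial(n1, p1, n2, p2)
print(f"P(team 1 FT% > team 2 FT%) ~ {estimate:.3f}")
```

With this model FTM can never exceed FTA, so every simulated percentage is a valid one.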

Jim Lewis
Don't the normal variables need to have a mean of 0 in order for their ratio to follow a Cauchy distribution? The simulation idea is interesting... although performance will be a consideration here. Also, I think you're spot on about the binomial distribution being a better fit for FTM. Thanks for the input!
Kenny
@Kenny: You're right, I missed the mean=0 condition! In that case the math gets even worse. I'll try to reword my answer a bit.
Jim Lewis
A: 

Use the Geary–Hinkley transformation.
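For reference, here is the transformation as it is usually stated for two independent normal variables. The notation is my own assumption, not something from this thread: X = FTM with mean μ_X and standard deviation σ_X, Y = FTA with mean μ_Y and standard deviation σ_Y, and z is a value of the ratio Z = X/Y.

```latex
t = \frac{\mu_Y z - \mu_X}{\sqrt{\sigma_Y^2 z^2 + \sigma_X^2}},
\qquad
P(Z \le z) \approx \Phi(t)
```

Here Φ is the standard normal CDF. The approximation is only reasonable when Y is very unlikely to be zero or negative, which should hold for FTA if its mean is several standard deviations above zero.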

piccolbo
Awesome formula! Is there any way to apply this to correlated variables?
Kenny
I suspect the answer is no, if any form of dependency is allowed. In this case I think you can try to model the situation more closely, but the result is difficult to analyze. FTM is a binomial variable with parameters n and p, where n = FTA is itself a random variable that, as you say, follows a normal distribution. I think you are unlikely to find a closed form for this distribution. I found a pretty terse paper about this here: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2843591/, but I don't have time to try to understand it.
piccolbo
In this case I would simulate the above model and then use a bootstrap estimate of the quantity of interest (the difference between the two teams' shooting averages, if I understand right). In R, the boot package will be very helpful for this, as well as rnorm and rbinom. The only problem is that I disagree that p in the binomial is mean(FTM)/mean(FTA); I think p is itself a random variable, and even dependent on FTA. So the simulation is not obvious, and maybe you don't have enough information to perform it.
piccolbo