OK, so here is a problem analogous to mine (I'll describe the real problem below, but I think this analogy is easier to understand).

I have a strange two-sided coin that comes up heads (randomly) only 1 in every 1,001 tosses; the rest are tails. In other words, on average there is 1 heads for every 1,000 tails.

I have a peculiar disease: I notice every heads, but only 1 in every 1,000 tails. So among the tosses I notice, half appear to be heads, i.e. the fraction of noticed tosses that are heads is 0.5. Of course, I'm aware of this disease and its effect, so I can compensate for it.
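Here is a small sketch of the forward direction (true heads probability to the fraction of noticed tosses that are heads). It assumes that "1 in 1,000 tails" means each tails is noticed independently with probability 1/1000. The function name `noticed_heads_fraction` and its `factor` parameter are hypothetical, chosen for illustration. `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

def noticed_heads_fraction(p_heads, factor=1000):
    """Hypothetical helper: fraction of *noticed* tosses that are heads,
    if every heads is noticed but only 1 in `factor` tails is."""
    p_tails = 1 - p_heads
    # Expected noticed heads per toss: p_heads; noticed tails: p_tails / factor
    return p_heads / (p_heads + p_tails / factor)

# The original coin: 1 heads per 1,001 tosses
print(noticed_heads_fraction(Fraction(1, 1001)))  # 1/2
```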

Someone now gives me a new coin, and with it 0.6 of the tosses I notice are heads. Given that my disease hasn't changed (I still notice only 1 in every 1,000 tails), how do I calculate the actual ratio of heads to tails that this new coin produces?


(You shouldn't need to understand this part to help with the problem, so if it confuses you, just ignore it ;)

OK, so what is the real problem? Well, I have a bunch of data consisting of inputs and outputs, where the outputs are 1s and 0s. I want to teach a supervised machine learning algorithm to predict the expected output (a float between 0 and 1) for a given input. The problem is that the 1s are very rare, which screws up the internal math: it becomes very susceptible to rounding errors, even with high-precision floating-point math.

So I rebalance the data by randomly omitting most of the 0 training samples, so that there appears to be a roughly equal ratio of 1s and 0s. Of course, this means the machine learning algorithm's output no longer predicts a probability, i.e. instead of predicting 0.001 as it should, it now predicts 0.5.
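Here is a minimal sketch of that rebalancing step. It assumes each 0 sample is kept independently with probability 1/factor, so that factor plays the role of the "1 in 1,000" in the coin analogy. The question doesn't say exactly how the 0s are dropped, and `undersample_zeros`, `factor` and the `(input, label)` pair layout are all hypothetical.

```python
import random

def undersample_zeros(samples, factor=1000, seed=None):
    """Hypothetical helper: keep every sample labelled 1, and keep each
    sample labelled 0 with probability 1/factor.
    `samples` is a list of (input, label) pairs."""
    rng = random.Random(seed)
    # rng.random() is only drawn for 0 samples (short-circuit `or`)
    return [(x, y) for (x, y) in samples if y == 1 or rng.random() < 1 / factor]

# Toy data: one positive per 1,001 samples, mirroring the coin
data = [(i, 1 if i % 1001 == 0 else 0) for i in range(100_100)]
balanced = undersample_zeros(data, factor=1000, seed=0)
```

Because the 0s are dropped uniformly at random (independently of the input), the "disease" applies equally to every input. The correction derived in the answer relies on that assumption.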

I need a way to convert the output of the machine learning algorithm back to a probability that reflects the original (unbalanced) training data.

A:

You are calculating the following

calculatedRatio = heads / (heads + tails / 1000)

and you need

realRatio = heads / (heads + tails)

Only the ratios matter, so set heads = 1 and solve both equations for tails:

tails = 1000 / calculatedRatio - 1000
tails = 1 / realRatio - 1

Equating the two expressions gives:

1000 / calculatedRatio - 1000 = 1 / realRatio - 1

Finally, solving for realRatio:

realRatio = 1 / (1000 / calculatedRatio - 999)
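The formula translates directly into code. The sketch below is an illustration only: `real_ratio` and its `factor` parameter (the 1000 above, generalised) are hypothetical names. The same derivation with 1000 replaced by factor gives 1 / (factor / calculatedRatio - (factor - 1)). `Fraction` is used so the results match the exact values quoted in the answer.

```python
from fractions import Fraction

def real_ratio(calculated_ratio, factor=1000):
    """Hypothetical helper: undo the 'notice only 1 in `factor` tails' bias.
    realRatio = 1 / (factor / calculatedRatio - (factor - 1))"""
    return 1 / (factor / calculated_ratio - (factor - 1))

print(real_ratio(Fraction(1, 2)))  # 1/1001
print(real_ratio(Fraction(3, 5)))  # 3/2003
```

With plain floats, `real_ratio(0.6)` gives approximately 0.0014978, i.e. 3/2003 up to rounding.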

This seems to be correct: calculatedRatio = 0.5 yields realRatio = 1/1001, and 0.6 yields 3/2003.
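For the real problem, the same formula can be applied to each model prediction. Multiplying the numerator and denominator by calculatedRatio gives the equivalent form q / (factor - (factor - 1) * q), which avoids dividing by q and maps 0 to 0 and 1 to 1. This is a sketch under two assumptions: the model's score q is a calibrated probability on the rebalanced data, and the 0s were dropped uniformly at random with keep rate 1/factor. The name `correct_probability` is hypothetical.

```python
def correct_probability(q, factor):
    """Hypothetical helper: map a score q from a model trained on data where
    0s were kept with probability 1/factor back to the original base rate.
    Equivalent to 1 / (factor / q - (factor - 1)) for q > 0."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    # Denominator is >= 1 for q in [0, 1] and factor >= 1, so no division by zero
    return q / (factor - (factor - 1) * q)

predictions = [0.5, 0.6, 0.99]
corrected = [correct_probability(q, 1000) for q in predictions]
```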

Daniel Brückner