So I just fixed an interesting bug in the following code, but I'm not sure the approach I took is the best:

import math

p = 1
probabilities = [ ... ] # a (possibly) long list of numbers between 0 and 1
for wp in probabilities:
  if wp > 0:
    p *= wp

# Take the natural log; this crashes when 'probabilities' is long enough that p
# underflows to zero (math.log(0) raises ValueError)
result = math.log(p)

Because the result doesn't need to be exact, I solved this by simply keeping the smallest non-zero value, and using that if p ever becomes 0.

import math

p = 1
probabilities = [ ... ] # a long list of numbers between 0 and 1
for wp in probabilities:
  if wp > 0:
    old_p = p
    p *= wp
    if p == 0:
      # we've gotten so small that it's just 0, so go back to the smallest
      # non-zero value we had
      p = old_p
      break

# Take the natural log; p can no longer be zero here, so this no longer crashes
result = math.log(p)

This works, but it seems a bit kludgy to me. I don't do a ton of this kind of numerical programming, and I'm not sure if this is the kind of fix people use, or if there is something better I can go for.

+7  A: 

Since math.log(a * b) is equal to math.log(a) + math.log(b) for positive a and b, why not take the sum of the logs of all members of the probabilities list?

This avoids the problem of p getting so small that it underflows to zero.
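In plain Python, that idea might look something like the sketch below. The helper name log_product and the sample data are made up for illustration; skipping non-positive entries just mirrors the wp > 0 check in the question, and math.fsum is used only to limit rounding error in a long sum (the built-in sum would also work).

```python
import math

def log_product(probabilities):
    """Return log(p1 * p2 * ...) computed as a sum of logs.

    Non-positive entries are skipped, mirroring the `wp > 0`
    check in the question's loop.
    """
    return math.fsum(math.log(wp) for wp in probabilities if wp > 0)

# Illustrative data (made up for this sketch): 400 factors of 0.1.
probabilities = [0.1] * 400

# The naive running product underflows to 0.0 long before the loop ends,
# so math.log(naive) would raise ValueError.
naive = 1.0
for wp in probabilities:
    naive *= wp

# The sum of logs stays well within float range (about 400 * log(0.1)).
result = log_product(probabilities)
```

Note that this uses every positive entry, whereas the workaround in the question stops multiplying once p hits zero, so the two can give quite different results on long lists.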

Edit: this is the numpy version, which is cleaner and a lot faster for large data sets:

import numpy
prob = numpy.array([0.1, 0.213, 0.001, 0.98 ... ])
result = numpy.log(prob).sum()  # numpy's own sum is vectorized, unlike the built-in sum()
James Roth
Genius! I knew I had to remove my programmer hat and put on my mathematician hat for that one :-)
Tristan Havelick