I'm a fairly green programmer, and I'm learning Python right now. I'm up to chapter 17 in "How to Think Like a Computer Scientist" (Classes and Methods), and I just wrote my first doctest that failed in a way I don't fully understand:

class Point(object):
    '''
    represents a point object.
    attributes: x, y
    '''

    def __init__(self, x = 0, y = 0):
        '''
        >>> point = Point()
        >>> point.y
        0
        >>> point = Point(4.7, 8.2)
        >>> point.x
        4.7
        '''

        self.x = x
        self.y = y

The second doctest for __init__ fails: it returns 4.7000000000000002 instead of 4.7. However, if I rewrite the doctest with a "print" statement, like so:

>>> point = Point(4.7, 8.2)
>>> print point.x
4.7

It runs correctly.

So I read up on how Python stores floats, and I now understand that the discrepancy comes from the binary representation of decimal numbers: Python stores 4.7 as a string of 1s and 0s that almost, but not quite, equals 4.7.

But what I don't understand is why evaluating "point.x" shows 4.7000000000000002 while "print point.x" shows 4.7. Under what other circumstances will Python choose to round like it does with "print"? How does this rounding work? Can these trailing significant figures lead to errors in programming (aside from, obviously, failed doctests)? Can a failure to pay attention to rounding create dangerous ambiguity?

Since this has to do with binary representation of decimal numbers, I'm sure that this is in fact a general CS issue and not one specific to Python, but what I really need to know right now is what I can do, specifically as a Python programmer, to avoid any related issues and/or bug infestations.

Also, for bonus points, is there some other way that Python can store floating point numbers aside from the default activated by a line like "a = 4.7"? I know there's the Decimal package, but I'm not totally sure how it works. Honestly, all of this dynamic typing stuff confuses me sometimes.

Edit: I should specify that I'm using Python 2.6 (at some point I want to use NumPy and Biopython).

+1  A: 

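You get different behavior because print uses str(), which in Python 2 rounds floats to 12 significant digits, while the interactive prompt uses repr(), which shows more digits: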

In [1]: 1.23456789012334
Out[1]: 1.23456789012334 
In [2]: print 1.23456789012334
1.23456789012

Note that, at the precision of Python's floats:

In [3]: 4.7 == 4.7000000000000002
Out[3]: True

This is because floats have limited (relative) precision: they use a finite number of binary digits to represent real numbers. Thus, as above, different decimal representations of a number can actually be equal in Python once each is approximated by the closest float. This is a general property of floating-point numbers.
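To check that claim directly, the following sketch compares the two literals and uses the standard `decimal` module to print the exact binary value that 4.7 is stored as. It assumes Python 2.7 or 3.x, since `Decimal` only accepts a float argument directly from 2.7 on.

```python
from decimal import Decimal

a = 4.7
b = 4.7000000000000002

# Both literals round to the same nearest double, so they compare equal.
print(a == b)                    # True

# Decimal(float) converts the stored binary value exactly, with no rounding.
print(Decimal(a))                # 4.70000000000000017763568394002504646778106689453125
print(Decimal(a) == Decimal(b))  # True
```

The stored value is slightly above 4.7; 4.7000000000000002 is that value rounded to 17 significant digits.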

EOL
+2  A: 

When working with floating point numbers, the common approach goes like this:

consider a and b equal if abs(a - b) <= eps, where eps is the required precision.

In programming contests, eps is given along with the problem. My advice is to decide what accuracy your application needs, and use that.
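A minimal sketch of that approach, assuming an absolute tolerance suits your data; the function name `approx_equal` and the default `eps=1e-9` are illustrative choices, not part of the answer:

```python
def approx_equal(a, b, eps=1e-9):
    """Treat a and b as equal if they differ by at most eps (absolute tolerance)."""
    return abs(a - b) <= eps

# 0.1 + 0.2 is not exactly 0.3 in binary floating point...
print(0.1 + 0.2 == 0.3)                         # False
# ...but it is within the chosen tolerance.
print(approx_equal(0.1 + 0.2, 0.3))             # True
# A tolerance smaller than the rounding error makes the check fail again.
print(approx_equal(0.1 + 0.2, 0.3, eps=1e-20))  # False
```

On Python 3.5 and later, the standard library's `math.isclose(a, b, rel_tol=..., abs_tol=...)` offers a ready-made check combining relative and absolute tolerances; it does not exist in Python 2.6.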

Gabi Purcaru
+3  A: 

This has to do with how computers store floating-point numbers. A detailed description of this is here. However, in your case the quick solution is to check not the printed representation of point.x but whether point.x is equal to 4.7:

>>> point = Point(4.7, 8.2)
>>> point.x == 4.7
True

Or better:

>>> point = Point(4.7, 8.2)
>>> eps = 2**-53  # epsilon for standard double-precision numbers
>>> -eps <= point.x - 4.7 <= eps
True

Here eps is the maximum relative rounding error of double-precision floating-point arithmetic. For details on epsilon, see here.

EDIT: -eps <= point.x - 4.7 <= eps is equivalent to abs(point.x - 4.7) <= eps. I only add this because not everyone is familiar with Python's chaining of comparison operators.
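Applied to the doctest from the question, the abs() form might look like the following sketch. The tolerance 1e-9 is an arbitrary value chosen for illustration (an absolute tolerance is reasonable here only because the expected value is of order 1), and the docstrings are adapted from the question's code.

```python
class Point(object):
    '''
    Represents a point object.
    attributes: x, y
    '''

    def __init__(self, x=0, y=0):
        '''
        >>> point = Point()
        >>> point.y
        0
        >>> point = Point(4.7, 8.2)
        >>> abs(point.x - 4.7) <= 1e-9
        True
        '''
        self.x = x
        self.y = y


if __name__ == '__main__':
    # Run the docstring examples when the file is executed directly.
    import doctest
    doctest.testmod()
```

Because the doctest now compares a boolean instead of the float's printed form, it passes regardless of how many digits repr() shows.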

EDIT 2: Since you mentioned numpy: numpy can give you eps without calculating it yourself. Use eps = numpy.finfo(float).eps instead of 2**-53 if you're using numpy. Note that the numpy epsilon equals 2**-52 rather than 2**-53. That is because numpy (like C's DBL_EPSILON) defines epsilon as the gap between 1.0 and the next representable double, which is twice the maximum relative rounding error of 2**-53.
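Both values can be checked without numpy: `sys.float_info.epsilon` (available since Python 2.6) uses the same definition as numpy, the gap between 1.0 and the next double. A short sketch:

```python
import sys

eps = sys.float_info.epsilon

# Machine epsilon as numpy and C's DBL_EPSILON define it.
print(eps == 2**-52)         # True

# Adding eps to 1.0 gives the next representable double...
print(1.0 + eps > 1.0)       # True
# ...but adding half of it (2**-53) lands exactly halfway between
# 1.0 and that neighbour, and round-half-to-even sends it back to 1.0.
print(1.0 + eps / 2 == 1.0)  # True
```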

Chinmay Kanchi
Machine epsilon is a bound for **relative** error. You can't use it as you did, as the absolute error will be larger for values farther away from zero. In this specific case, `point.x - 4.7` will always give exactly 0 anyway.
interjay
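Following interjay's point, a minimal sketch of a relative comparison, which scales machine epsilon by the size of the operands. The name `rel_equal` and the default of 4 epsilons are illustrative assumptions, not a standard choice:

```python
import sys

def rel_equal(a, b, ulps=4):
    """Return True if a and b differ by at most `ulps` machine epsilons,
    scaled by the larger magnitude of the two (a relative tolerance).

    Caveat: with a purely relative tolerance, no nonzero value ever
    compares equal to 0.0; add an absolute tolerance for values near zero.
    """
    tol = ulps * sys.float_info.epsilon * max(abs(a), abs(b))
    return abs(a - b) <= tol

big = 2.0 ** 60
next_big = big + 2.0 ** 8   # doubles near 2**60 are spaced 2**8 = 256 apart

# Adjacent doubles, yet an absolute eps of 2**-53 calls them different...
print(abs(next_big - big) <= 2 ** -53)  # False
# ...while the relative check accepts them.
print(rel_equal(big, next_big))         # True
print(rel_equal(1.0, 1.1))              # False
```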
+1  A: 

This comprehensive guide explains everything.

Here are Python-specific explanations.

niscy
+4  A: 
>>> point.x

calls the repr function, which returns a more detailed, more technical string representation than the str function; str is what gets called when

>>> print point.x

occurs
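A small sketch of the difference. It assumes Python 2.7 or 3.x, where repr of a float already gives the shortest string that round-trips, so 4.7 displays as 4.7 either way; Python 2.6's repr used 17 significant digits and its str used 12, which the format strings below imitate.

```python
x = 4.7

# The interactive prompt (and doctest) shows repr(x); print shows str(x).
# On Python 2.7 and 3.x both display 4.7; on Python 2.6, repr(x)
# displayed 4.7000000000000002.
print(repr(x))
print(str(x))

# Python 2.6 formatted floats with 17 significant digits for repr
# and 12 for str; the same format strings reproduce both displays.
print('%.17g' % x)  # 4.7000000000000002
print('%.12g' % x)  # 4.7
```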

Odomontois
Thank you for answering a question I should have asked
tel