views:

1199

answers:

4

I've got some Python code that runs through a list of strings and converts them to integers or floating point numbers if possible. Doing this for integers is pretty easy

if element.isdigit():
    newelement=int(element)

Floating point numbers are more difficult. Right now I'm using partition('.') to split the string and checking to make sure that one or both sides are digits.

partition=element.partition('.')
if (partition[0].isdigit() and partition[1]=='.' and partition[2].isdigit()) or (partition[0]=='' and partition[1]=='.' and partition[2].isdigit()) or (partition[0].isdigit() and partition[1]=='.' and partition[2]==''):
    newelement=float(element)

This works, but obviously the if statement for that is a bit of a bear. The other solution I considered is to just wrap the conversion in a try/catch block and see if it succeeds, as described in this question.

Anyone have any other ideas? Opinions on the relative merits of the partition and try/catch approaches?

+15  A: 

I would just use..

try:
    float(element)
except ValueError:
    print "Not a float"

..it's simple, and it works

Another option would be a regular expression:

import re
if re.match("^\d+?\.\d+?$", element) is None:
    print "Not float"
dbr
+1 for option 1: much, much faster than fooling around with a regex (if most strings turn out to be floats).
S.Lott
+1 for option 1, too.
lothar
@S.Lott: Most of the strings this is applied to will turn out to be ints or floats.
Chris Upchurch
Is there not a tryfloat(element) or otherwise equivalent function like C#'s float.TryParse(element). Typically excepting is not very performant. Not that this is something that will happen very often, but if it's in a tight loop, it could be an issue.
dustyburwell
Otherwise, +1 for option #1 over option #2 since option #2 will not catch entries that overflow the storage of int or float
dustyburwell
Your regex is not optimal. "^\d+\.\d+$" will fail a match at the same speed as above, but will succeed faster. Also, a more correct way would be: "^[+-]?\d(>?\.\d+)?$" However, that still doesn't match numbers like: +1.0e-10
John Gietzen
@ascalonx: int() won't overflow in Python. If a number is too large to get stored as a regular int it will use the long integer representation which can grow until you hit your virtual memory limit.
Chris Upchurch
Except that you forgot to name your function "will_it_float".
bvmou
@ascalonx: "Typically excepting is not very performant" False in Python. Exceptions in Python are often the cheapest way to do things -- try it and see.
S.Lott
Not only is the first one simple, but it's the Pythonic way of doing such a thing
Jeremy Cantrell
+2  A: 

This regex will check for scientific floating point numbers:

^[-+]?(?:\b[0-9]+(?:\.[0-9]*)?|\.[0-9]+\b)(?:[eE][-+]?[0-9]+\b)?$

However, I believe that your best bet is to use the parser in a try.

John Gietzen
+5  A: 

If you cared about performance (and I'm not suggesting you should), the try-based approach is the clear winner (compared with your partition-based approach or the regexp approach), as long as you don't expect a lot of invalid strings, in which case it's potentially slower (presumably due to the cost of exception handling).

Again, I'm not suggesting you care about performance, just giving you the data in case you're doing this 10 billion times a second, or something. Also, the partition-based code doesn't handle at least one valid string.

$ ./floatstr.py
F..
partition sad: 3.1102449894
partition happy: 2.09208488464
..
re sad: 7.76906108856
re happy: 7.09421992302
..
try sad: 12.1525540352
try happy: 1.44165301323
.
======================================================================
FAIL: test_partition (__main__.ConvertTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./floatstr.py", line 48, in test_partition
    self.failUnless(is_float_partition("20e2"))
AssertionError

----------------------------------------------------------------------
Ran 8 tests in 33.670s

FAILED (failures=1)

Here's the code (Python 2.6, regexp taken from John Gietzen's answer):

def is_float_try(str):
    try:
        float(str)
        return True
    except ValueError:
        return False

import re
_float_regexp = re.compile(r"^[-+]?(?:\b[0-9]+(?:\.[0-9]*)?|\.[0-9]+\b)(?:[eE][-+]?[0-9]+\b)?$")
def is_float_re(str):
    return re.match(_float_regexp, str)


def is_float_partition(element):
    partition=element.partition('.')
    if (partition[0].isdigit() and partition[1]=='.' and partition[2].isdigit()) or (partition[0]=='' and partition[1]=='.' and pa\
rtition[2].isdigit()) or (partition[0].isdigit() and partition[1]=='.' and partition[2]==''):
        return True

if __name__ == '__main__':
    import unittest
    import timeit

    class ConvertTests(unittest.TestCase):
        def test_re(self):
            self.failUnless(is_float_re("20e2"))

        def test_try(self):
            self.failUnless(is_float_try("20e2"))

        def test_re_perf(self):
            print
            print 're sad:', timeit.Timer('floatstr.is_float_re("12.2x")', "import floatstr").timeit()
            print 're happy:', timeit.Timer('floatstr.is_float_re("12.2")', "import floatstr").timeit()

        def test_try_perf(self):
            print
            print 'try sad:', timeit.Timer('floatstr.is_float_try("12.2x")', "import floatstr").timeit()
            print 'try happy:', timeit.Timer('floatstr.is_float_try("12.2")', "import floatstr").timeit()

        def test_partition_perf(self):
            print
            print 'partition sad:', timeit.Timer('floatstr.is_float_partition("12.2x")', "import floatstr").timeit()
            print 'partition happy:', timeit.Timer('floatstr.is_float_partition("12.2")', "import floatstr").timeit()

        def test_partition(self):
            self.failUnless(is_float_partition("20e2"))

        def test_partition2(self):
            self.failUnless(is_float_partition(".2"))

        def test_partition3(self):
            self.failIf(is_float_partition("1234x.2"))

    unittest.main()
Jacob Gabrielson
+1: Perfect. As we all know, regex is not REALLY a parsing engine, but a searching engine.
John Gietzen
+1  A: 

Strictly, these regexp-style solutions should be checking the locale. Not all locales use dots as the separator.