I'd like to convert strings to floats using Python 2.6 and later, but without silently converting things like 'NaN' and 'Inf' to float objects. I do want them to be silently ignored, as with any text that isn't valid as a float representation.

Before 2.6, float("NaN") would raise a ValueError on Windows. Now it returns a float for which math.isnan() returns True, which is not useful behaviour for my application. (As was pointed out, this has always been a platform-dependent behaviour, but consider it an undesirable behaviour for my purposes, wherever it happens.)
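To make that behaviour concrete, here is a small check. It is only a sketch, written against Python 3 and expected to behave the same way on 2.6 and later; the list of spellings is an illustrative choice.

```python
import math

# Spellings that float() converts without complaint.
special = ["NaN", "nan", "Inf", "-Inf", "+inf", "INF"]

for text in special:
    val = float(text)
    # Each result is a genuine float object, just not a finite one.
    assert math.isnan(val) or math.isinf(val)

# Ordinary text is still rejected with ValueError.
try:
    float("hello")
except ValueError:
    rejected = True
else:
    rejected = False
```

So a ValueError alone separates "hello" from numbers, but not "NaN" or "Inf" from numbers.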

Here's what I've got at the moment:

import math
def get_floats(source):
    for text in source.split():
        try:
            val = float(text)
            if math.isnan(val) or math.isinf(val):
                raise ValueError
            yield val
        except ValueError:
            pass

This is a generator, which I can supply with strings containing whitespace-separated sequences representing real numbers. I'd like it to yield only those fields which are purely numeric representations of floats, as in "1.23" or "-34e6", but not for example "NaN" or "-Inf". Things that aren't floats at all, e.g. "hello", should be ignored as well.

Test case:

assert list(get_floats('1.23 foo -34e6 NaN -Inf')) == [1.23, -34000000.0]

Please suggest alternatives you consider more elegant, even if they involve "look before you leap" (which is normally considered a lesser approach in Python).

Edited to clarify that non-float text such as "hello" should just be ignored quietly as well. The purpose is to pull out only those things that are real numbers and ignore everything else.
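For the "look before you leap" route mentioned above, one possible sketch checks each field against a regular expression before calling float(). The pattern FINITE_FLOAT and the name get_floats_lbyl are hypothetical, and the pattern only approximates the grammar that float() accepts.

```python
import re

# Hypothetical pattern: optional sign, digits with an optional fraction
# (or a bare fraction such as ".5"), and an optional exponent.
# It deliberately has no branch for "nan" or "inf".
FINITE_FLOAT = re.compile(r'[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?$')

def get_floats_lbyl(source):
    """Yield whitespace-separated fields that match FINITE_FLOAT."""
    for text in source.split():
        # Look before leaping: convert only fields that match the pattern.
        if FINITE_FLOAT.match(text):
            yield float(text)
```

One caveat: a field such as "1e999" matches the pattern but overflows to infinity when converted, so the pattern alone does not rule out every non-finite result. The math.isinf test is still needed if such input is possible.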

A: 

This is a very minor suggestion, but continue is a little faster than raising an exception:

import math

def get_floats_continue(source):
    for text in source.split():
        try:
            val = float(text)
            # Skip NaN and infinities without the cost of raising
            if math.isnan(val) or math.isinf(val):
                continue
            yield val
        except ValueError:
            pass

Using raise ValueError:

% python -mtimeit -s'import test' "list(test.get_floats('1.23 -34e6 NaN -Inf Hello'))"
10000 loops, best of 3: 22.3 usec per loop

Using continue:

% python -mtimeit -s'import test' "list(test.get_floats_continue('1.23 -34e6 NaN -Inf Hello'))"
100000 loops, best of 3: 17.2 usec per loop
unutbu
Good point. In my original code it was not in a loop, so the continue wouldn't have worked, but this is a good suggestion.
Peter Hansen
That's not fair: you ran 10x as many loops in the second test as in the first.
DeadMG
@DeadMG: You are correct that the first timeit run executed the command 10000 times (then repeated the test 3 times) and the second executed the command 100000 times (then repeated that 3 times). But the usec per loop refers to the time *divided by the number of loops*, so the numbers are comparable.
unutbu
@DeadMG: Since I did not specify the number of loops, timeit uses the following rule: "If -n is not given, a suitable number of loops is calculated by trying successive powers of 10 until the total time is at least 0.2 seconds." For the first test, 22.3 usec * 10000 = 0.223 sec, which clears 0.2 sec, so timeit stopped at 10000 loops. For the second, 17.2 usec * 10000 = 0.172 sec < 0.2 sec, so timeit stepped up to 100000 loops.
unutbu
@unutbu: It is not a fair test to run the loop 10,000 times vs 100,000 times, regardless of the total.
DeadMG
@DeadMG: Assuming the time to complete one loop is an independent normal random variable, the average of N loops will have standard error proportional to 1/sqrt(N). The purpose of running many loops is to reduce the standard error. When the difference between 22.3 and 17.2 is much greater than the square root of the sum of the squares of the standard errors, choosing N or 10*N isn't going to change the result. Am I missing something? If so, please explain.
unutbu
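The per-loop normalisation discussed in these comments can also be reproduced from within Python. This is a minimal sketch, assuming both variants live in one script. The names get_floats_raise, get_floats_continue and usec_per_loop are illustrative, and the loop counts simply mirror the two runs above.

```python
import math
import timeit

def get_floats_raise(source):
    for text in source.split():
        try:
            val = float(text)
            if math.isnan(val) or math.isinf(val):
                raise ValueError
            yield val
        except ValueError:
            pass

def get_floats_continue(source):
    for text in source.split():
        try:
            val = float(text)
            if math.isnan(val) or math.isinf(val):
                continue
            yield val
        except ValueError:
            pass

SAMPLE = '1.23 -34e6 NaN -Inf Hello'

def usec_per_loop(func, number):
    # timeit.repeat returns one total time in seconds per repetition.
    # Take the best total and divide by the loop count, so results
    # obtained with different loop counts are directly comparable.
    totals = timeit.repeat(lambda: list(func(SAMPLE)), repeat=3, number=number)
    return min(totals) / number * 1e6

raise_usec = usec_per_loop(get_floats_raise, 10000)
continue_usec = usec_per_loop(get_floats_continue, 100000)
```

The absolute numbers depend on the machine and the Python version, so no expected output is shown here.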
+2  A: 

I'd write it like this. I think it combines conciseness with readability.

import math

def is_finite(x):
    return not math.isnan(x) and not math.isinf(x)

def get_floats(source):
    for x in source.split():
        try:
            yield float(x)
        except ValueError:
            pass  # not a float representation at all

def get_finite_floats(source):
    return (x for x in get_floats(source) if is_finite(x))
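If Python 2 support is not required, the finiteness helper can be dropped: math.isfinite was added in Python 3.2 and returns False for NaN and for both infinities. A sketch under that assumption, using the hypothetical name get_finite_floats_py3:

```python
import math

def get_finite_floats_py3(source):
    """Yield finite floats from whitespace-separated fields (Python 3.2+)."""
    for text in source.split():
        try:
            val = float(text)
        except ValueError:
            continue  # not a float representation at all
        # math.isfinite rejects NaN, +Inf and -Inf in a single call.
        if math.isfinite(val):
            yield val
```

This does not apply to the 2.6 target in the question, where the explicit isnan/isinf pair is still needed.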
Paul Hankin
I didn't include an example in my test case, but I do want this to quietly ignore things that are not floats by any definition, such as simple text strings, e.g. "foo". I'll clarify in the question.
Peter Hansen
I changed the code to meet the changing requirements.
Paul Hankin
A: 

I voted up Paul Hankin's answer for readability, though if I don't want to split the code up as much, here's a variation of my original that's less clunky.

import math

def get_only_numbers(source):
    '''yield all whitespace-separated finite real numbers in source string'''
    for text in source.split():
        try:
            val = float(text)
        except ValueError:
            pass  # ignore non-numbers
        else:
            # "NaN", "Inf" get converted: explicit test to ignore them
            if not math.isnan(val) and not math.isinf(val):
                yield val

Still nothing far off what I originally had.

Peter Hansen