ansaurus

Question

convert string to float without silent NaN/Inf conversion

Answer 1

A:

This is a very minor suggestion, but continue is a little faster than raising an exception:

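import math

def get_floats_continue(source):
    for text in source.split():
        try:
            val = float(text)
            # "NaN" and "Inf" parse successfully, so skip them explicitly
            if math.isnan(val) or math.isinf(val):
                continue
            yield val
        except ValueError:
            pass  # not a number at all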

Using raise ValueError:

% python -mtimeit -s'import test' "list(test.get_floats('1.23 -34e6 NaN -Inf Hello'))"
10000 loops, best of 3: 22.3 usec per loop

Using continue:

% python -mtimeit -s'import test' "list(test.get_floats_continue('1.23 -34e6 NaN -Inf Hello'))"
100000 loops, best of 3: 17.2 usec per loop

unutbu 2010-06-05 16:21:58
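The raise variant timed above is not reproduced in the answer. A minimal sketch of how the two could be compared from within Python follows, assuming the slower variant simply raises ValueError for NaN/Inf inside the same try block. Both function bodies are reconstructions, not the original test.py, and the lambda wrapper adds a little overhead to each measurement.

```python
import math
import timeit

def get_floats_raise(source):
    # Assumed form of the slower variant: reuse the except clause to skip NaN/Inf
    for text in source.split():
        try:
            val = float(text)
            if math.isnan(val) or math.isinf(val):
                raise ValueError
            yield val
        except ValueError:
            pass

def get_floats_continue(source):
    # Variant from the answer: skip NaN/Inf without raising
    for text in source.split():
        try:
            val = float(text)
            if math.isnan(val) or math.isinf(val):
                continue
            yield val
        except ValueError:
            pass

SAMPLE = '1.23 -34e6 NaN -Inf Hello'
LOOPS = 10000  # same loop count for both, chosen arbitrarily

if __name__ == '__main__':
    for func in (get_floats_raise, get_floats_continue):
        seconds = timeit.timeit(lambda: list(func(SAMPLE)), number=LOOPS)
        print('%s: %.2f usec per loop' % (func.__name__, seconds / LOOPS * 1e6))
```

Absolute timings will differ by machine and Python version; only the relative difference is of interest.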

Good point. In my original code it was not in a loop, so the continue wouldn't have worked, but this is a good suggestion.

Peter Hansen 2010-06-06 12:36:20

That's not fair: you ran 10x as many loops in the second test as in the first.

DeadMG 2010-06-06 12:43:30

@DeadMG: You are correct that the first timeit run executed the command 10000 times (then repeated the test 3 times) and the second executed the command 100000 times (then repeated that 3 times). But the usec per loop refers to the time *divided by the number of loops*, so the numbers are comparable.

unutbu 2010-06-06 13:18:29

@DeadMG: Since I did not specify the number of loops, timeit used the following rule: "If -n is not given, a suitable number of loops is calculated by trying successive powers of 10 until the total time is at least 0.2 seconds." Because 17.2 usec * 10000 = 0.172 sec < 0.2 sec, timeit stepped up the second test to 100000 loops.

unutbu 2010-06-06 13:21:30
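The quoted rule can be sketched in a few lines. This is an illustration of the rule as quoted, not timeit's actual source: pick_loop_count and fake_timer are hypothetical names, and newer Python releases may choose loop counts differently.

```python
def pick_loop_count(total_time_for, threshold=0.2):
    """Try successive powers of 10 until the total time reaches threshold seconds.

    total_time_for(number) must return the total seconds taken by `number` loops.
    """
    number = 1
    while total_time_for(number) < threshold:
        number *= 10
    return number

def fake_timer(usec_per_loop):
    # Stand-in for a real measurement: assumes a constant cost per loop
    return lambda number: number * usec_per_loop * 1e-6

print(pick_loop_count(fake_timer(22.3)))  # 10000, since 0.223 sec >= 0.2 sec
print(pick_loop_count(fake_timer(17.2)))  # 100000, since 0.172 sec < 0.2 sec
```

With the per-loop times reported above, the rule yields the same loop counts that timeit printed.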

@unutbu: It is not a fair test to run the loop 10,000 times vs 100,000 times, regardless of the total.

DeadMG 2010-06-06 14:36:28

@DeadMG: Assuming the time to complete one loop is an independent normal random variable, the average of N loops will have standard error proportional to 1/sqrt(N). The purpose of running many loops is to reduce the standard error. When the difference between 22.3 and 17.2 is much greater than the standard error of that difference (the square root of the sum of the squared standard errors), choosing N or 10*N isn't going to change the result. Am I missing something? If so, please explain.

unutbu 2010-06-06 15:38:15
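The argument can be put into numbers. The per-loop standard deviation was never measured in this thread, so SIGMA_USEC below is a purely hypothetical value chosen for illustration. The sketch uses the standard result that the standard error of a difference of two independent means is the square root of the sum of the squared standard errors.

```python
import math

def standard_error(sigma, n):
    # Standard error of the mean of n independent measurements
    return sigma / math.sqrt(n)

SIGMA_USEC = 5.0  # hypothetical per-loop standard deviation, not measured

se_raise = standard_error(SIGMA_USEC, 10000)      # first test: 10000 loops
se_continue = standard_error(SIGMA_USEC, 100000)  # second test: 100000 loops
se_difference = math.sqrt(se_raise ** 2 + se_continue ** 2)
difference = 22.3 - 17.2

print('standard error of the difference: %.4f usec' % se_difference)
print('observed difference: %.1f usec' % difference)
```

Under that assumption, running 10x more loops only shrinks the standard error by a factor of sqrt(10), and either loop count leaves it far below the observed gap.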

Answer 2

+2 A:

I'd write it like this. I think it combines conciseness with readability.

import math

def is_finite(x):
    return not math.isnan(x) and not math.isinf(x)

def get_floats(source):
    for x in source.split():
        try:
            yield float(x)
        except ValueError:
            pass

def get_finite_floats(source):
    return (x for x in get_floats(source) if is_finite(x))

Paul Hankin 2010-06-05 16:44:09
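On Python 3.2 and later, math.isfinite performs both checks in one call, so the helper function could be dropped. A sketch of that variation, which is not part of the original answer:

```python
import math

def get_floats(source):
    # Yield every whitespace-separated token that float() accepts
    for x in source.split():
        try:
            yield float(x)
        except ValueError:
            pass

def get_finite_floats(source):
    # math.isfinite is False for NaN and for +/-Inf
    return (x for x in get_floats(source) if math.isfinite(x))

print(list(get_finite_floats('1.23 -34e6 NaN -Inf foo')))  # [1.23, -34000000.0]
```

Plain text such as "foo" is dropped by the except clause, while NaN and Inf are dropped by the filter.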

I didn't include an example in my test case, but I do want this to quietly ignore things that are not floats by any definition, such as simple text strings, e.g. "foo". I'll clarify in the question.

Peter Hansen 2010-06-06 12:35:45

I changed the code to meet the changing requirements.

Paul Hankin 2010-06-06 13:16:02

Answer 3

A:

I voted up Paul Hankin's answer for readability, though if I don't want to split the code up as much, here's a variation of my original that's less clunky:

import math

def get_only_numbers(source):
    '''yield all whitespace-separated finite real numbers in source string'''
    for text in source.split():
        try:
            val = float(text)
        except ValueError:
            pass  # ignore non-numbers
        else:
            # "NaN", "Inf" get converted: explicit test to ignore them
            if not math.isnan(val) and not math.isinf(val):
                yield val

Still, it's not far off what I originally had.

Peter Hansen 2010-06-09 01:48:53
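One reason to test the converted value rather than the input text: float() accepts several spellings of NaN and infinity, and an overflowing literal such as 1e999 converts to infinity without containing "Inf" at all. A small sketch, where is_non_finite is a hypothetical helper and the SPELLINGS list is illustrative rather than exhaustive:

```python
import math

SPELLINGS = ['nan', 'NaN', '-nan', 'inf', '+Inf', '-INF', 'Infinity', '1e999']

def is_non_finite(text):
    # Convert first, then inspect the value instead of matching the text
    val = float(text)
    return math.isnan(val) or math.isinf(val)

for text in SPELLINGS:
    print('%-10r non-finite: %s' % (text, is_non_finite(text)))
```

Every entry in the list converts without raising ValueError, so a string comparison against "NaN" or "Inf" alone would miss some of them.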
