Are there any Python libraries that help parse and validate numeric strings beyond what is supported by the built-in float() function? For example, in addition to simple numbers (1234.56) and scientific notation (3.2e15), I would like to be able to parse formats like:

  • Numbers with commas: 2,147,483,647
  • Named large numbers: 5.5 billion
  • Fractions: 1/4

I did a bit of searching and could not find anything, though I would be surprised if such a library did not already exist.

A: 

I haven't heard of one. Do you know of any such library for any other languages? That way you could leverage their documentation and tests.

If you can't find one, write a bunch of test cases; then we can help you fill out the parsing code.
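
A test table along these lines might be a starting point. This is only a sketch: the names `CASES` and `failures` are made up for illustration, and the expected values assume US-style separators and short-scale names ("billion" = 10**9).

```python
# Hypothetical test table: (input string, expected value),
# with None meaning "should be rejected".
CASES = [
    ("1234.56", 1234.56),
    ("3.2e15", 3.2e15),
    ("2,147,483,647", 2147483647.0),
    ("5.5 billion", 5.5e9),
    ("1/4", 0.25),
    ("12,34", None),    # malformed grouping (assuming US-style separators)
    ("billion", None),  # multiplier without a number
]

def failures(parser):
    """Return the cases that the given parser gets wrong.

    `parser` is any callable that takes a string and returns a float,
    raising ValueError for input it cannot parse.
    """
    failed = []
    for text, expected in CASES:
        try:
            result = parser(text)
        except ValueError:
            result = None
        if result != expected:
            failed.append((text, expected, result))
    return failed

# Check how far the built-in float() gets on its own.
for text, expected, result in failures(float):
    print("%r: expected %r, got %r" % (text, expected, result))
```

Any candidate parser can be passed to `failures()` in place of `float`, so the same table keeps working as the parsing code grows.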

Google must have one (try searching for 5.5billion * 10), but I don't think they have open-sourced anything like that. Depending on how you need to use it, you might be able to use Google to do some of the work ;)

gnibbler
A: 

It should be pretty straightforward to build one in pyparsing; in fact, one of the tutorial pyparsing projects (wordsToNum.py on this page) does some of this already. You're talking about things that don't really have standard representations (standard in the sense of ISO 8601, not standard in the sense of "what everybody knows"), so it could easily be that nobody's done just what you're looking for.

Robert Rossney
+4  A: 

If you want to convert "localized" numbers such as the American "2,147,483,647" form, you can use the atof() function from the locale module. Example:

import locale
locale.setlocale(locale.LC_NUMERIC, 'en_US')  # Locale name is platform-dependent (e.g. 'en_US.UTF-8')
print(locale.atof('1,234,456.23'))  # Prints 1234456.23
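
One caveat: setlocale() raises locale.Error when the requested locale is not installed, and as far as I know atof() simply drops the grouping character without checking where it appears. If that matters, a locale-independent check could look roughly like this. It is a sketch only: `parse_grouped` is a made-up helper name, and it assumes US-style grouping (commas every three digits, period as the decimal point).

```python
import re

# One to three leading digits, then one or more groups of exactly three
# digits, then an optional fractional part.
_GROUPED = re.compile(r"[+-]?\d{1,3}(,\d{3})+(\.\d+)?")

def parse_grouped(text):
    """Parse a number such as '2,147,483,647', rejecting misplaced commas."""
    text = text.strip()
    if not _GROUPED.fullmatch(text):
        raise ValueError("Not a comma-grouped number: %r" % text)
    return float(text.replace(",", ""))
```

With this, `parse_grouped('1,234,456.23')` gives the same value as the atof() call, while a string like `'12,34'` raises ValueError instead of being silently accepted.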

As for fractions, Python now handles them directly (since version 2.6); they can even be built from a string:

from fractions import Fraction
x = Fraction('1/4')
print(float(x))  # 0.25

Thus, you can parse plain numbers, comma-grouped numbers, and fractions using only the two standard modules above:

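import locale
from fractions import Fraction

locale.setlocale(locale.LC_NUMERIC, 'en_US')  # Needed for locale.atof()

try:
    num = float(num_str)  # Plain and scientific notation
except ValueError:
    try:
        num = locale.atof(num_str)  # Comma-grouped numbers
    except ValueError:
        try:
            num = float(Fraction(num_str))  # Fractions such as '1/4'
        except (ValueError, ZeroDivisionError):  # '1/0' raises ZeroDivisionError
            raise ValueError("Cannot parse '%s'" % num_str)  # Or handle '42 billion' here
# 'num' has the numerical value of 'num_str', here.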
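
For the remaining '42 billion' case, something like the following might do as a last fallback. It is a rough sketch, not a complete solution: `MULTIPLIERS` and `parse_named` are names invented here, the table assumes short-scale (American) values, and only the "number, space, word" form is handled.

```python
from fractions import Fraction

# Assumed multipliers (short scale, as used in American English).
MULTIPLIERS = {
    "thousand": 10**3,
    "million": 10**6,
    "billion": 10**9,
    "trillion": 10**12,
}

def parse_named(text):
    """Parse strings such as '5.5 billion' or '42 million' into a float."""
    parts = text.lower().split()
    if len(parts) != 2 or parts[1] not in MULTIPLIERS:
        raise ValueError("Cannot parse %r" % text)
    # Fraction keeps the multiplication exact until the final conversion;
    # it raises ValueError itself if the first part is not a number.
    return float(Fraction(parts[0]) * MULTIPLIERS[parts[1]])
```

Because the numeric part goes through Fraction, an input like '1/4 million' is accepted as well.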
EOL
A: 

Babel supports the first case (localized numbers with commas). Docs: http://babel.edgewall.org/wiki/ApiDocs/babel.numbers.

Supporting simple named numbers should not be too hard to code up yourself; the same goes for fractions.

codeape