We've just started to kick the tires on pyparsing and like it so far, but we haven't been able to get it to parse fractional number strings and turn them into numeric data types.

For example, if a column value in a database table contained the string:

1 1/2

We'd like some way to convert it into the numeric python equivalent:

1.5

We'd like to make a parser that doesn't care whether the numbers in the fraction are integer or real. For example, we'd like:

1.0 1.0/2.0

...to still translate to:

1.5

Essentially, we'd like a parser that conceptually does the following:

"1 1/2" = 1 + 0.5 = 1.5

The following example code seems to get us close...

http://pyparsing.wikispaces.com/file/view/parsePythonValue.py

...but not close enough to make headway. All our attempts at a fractional number handler only return the first part of the expression (1). Tips? Hints? Timely wisdom? :)

A:

This recipe might be helpful:

Look around line 39:

mixed = Combine(numeral + fraction, adjacent=False, joinString=' ')
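A minimal sketch of what that `Combine` line does, assuming hypothetical `numeral` and `fraction` definitions (the recipe's own ones aren't reproduced here): with `adjacent=False`, whitespace may separate the parts, and `joinString=' '` joins the matched pieces back into a single token, which a parse action can then convert using the standard `fractions` module.

```python
from fractions import Fraction
from pyparsing import Combine, Word, nums

# Hypothetical stand-ins for the recipe's numeral and fraction
numeral = Word(nums)
fraction = Combine(numeral + "/" + numeral)

# adjacent=False lets whitespace separate the parts; joinString=' '
# glues the matched pieces back into one "1 1/2" token
mixed = Combine(numeral + fraction, adjacent=False, joinString=' ')
print(mixed.parseString("1 1/2"))   # ['1 1/2']

# A parse action can then turn the combined string into a number
mixed.setParseAction(lambda t: sum(Fraction(p) for p in t[0].split()))
print(float(mixed.parseString("1 1/2")[0]))   # 1.5
```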
ars
Thanks for this; it's an interesting recipe that might help us with a similar parsing issue (entity detection of product attributes such as "12 volt dc motor"). Unfortunately, when we tried working with this code it threw an error. After we fixed the error it didn't seem to work as expected, but we'll keep looking at it, because it's an example of how pyparsing handles a problem similar to the one we're investigating. :) Thanks!
Xavian
A:

Not precisely what you're looking for, but...

>>> import fractions
>>> txt= "1 1/2"
>>> sum( map( fractions.Fraction, txt.split() ) )
Fraction(3, 2)
>>> float(_)
1.5
S.Lott
Wow, this is really nice and elegant; I can't believe we overlooked this. :) Unfortunately, the source data we're coping with is very messy and a pain to deal with, so something this straightforward is unlikely to work. Sometimes we see stuff like "1 1 /2" or "~1 1/2" or, maddeningly, "1 1/8 ~ 2 7/8". We want a basic parser for sane formats to start with, and then to refactor it to cover the most common cases in the legacy data.
Xavian
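For the "1 1 /2" case mentioned above, one possible tweak to the split-and-sum approach (a sketch, not a cleanup for all of the legacy formats) is to close up whitespace around the slash before splitting. `mixed_to_fraction` is a hypothetical helper name; inputs like "~1 1/2" or "1 1/8 ~ 2 7/8" would still need separate handling.

```python
import re
from fractions import Fraction

def mixed_to_fraction(text):
    """Hypothetical helper: sum the whitespace-separated parts of a mixed number."""
    # Close up whitespace around "/" so "1 1 /2" splits as ["1", "1/2"]
    text = re.sub(r"\s*/\s*", "/", text.strip())
    # Fraction accepts "1", "1.5" and "1/2" strings alike
    return sum(Fraction(part) for part in text.split())

print(mixed_to_fraction("1 1 /2"))         # 3/2
print(float(mixed_to_fraction("1 1/2")))   # 1.5
```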
A: 

This overlaps with S.Lott's answer, but here it is anyway:

from fractions import Fraction
print(sum(Fraction(part) for part in '1 1/2'.split()))

Dealing with float 'integers' would be quite convoluted, though:

import re
from fractions import Fraction
# Drop the ".0" from float 'integers', since Fraction rejects "1.0/2.0"
clean = re.sub(r'\.0+\b', '', '1.0 1.0/2.0').split()
print(clean)
print(sum(Fraction(part) for part in clean))
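The cleanup step is needed because of which string forms `fractions.Fraction` accepts. A quick standard-library check: an integer, a decimal like "1.5" and a ratio like "1/2" all parse, but a ratio of decimals such as "1.0/2.0" raises `ValueError`.

```python
from fractions import Fraction

# Integer, decimal and "p/q" strings are all accepted...
print(Fraction("1"), Fraction("1.5"), Fraction("1/2"))   # 1 3/2 1/2

# ...but a "p/q" string whose parts are decimals is not
try:
    Fraction("1.0/2.0")
except ValueError as err:
    print("ValueError:", err)
```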

And here are the other posters' test cases, plus one with whitespace around the /:

import re
from fractions import Fraction

tests = """\
1
1.0
1/2
1.0/2.0
1 1/2
1.0 1/2
1.0 1.0/2.0
1.0 1.0 / 2.0
""".splitlines()

for t in tests:
    # Close up whitespace around "/" so "1.0 / 2.0" stays one token
    clean = re.sub(r'\s*/\s*', '/', t)
    # Drop the ".0" from float 'integers': Fraction rejects "1.0/2.0"
    clean = re.sub(r'\.0+\b', '', clean).split()
    value = sum(Fraction(part) for part in clean)
    print('%s -> %s, %s = %f' % (t, clean, value, float(value)))
Tony Veijalainen
Super succinct. :) If only our data were cleaner, we'd be able to use this approach. :)
Xavian
A:

Since you cite some tests, it sounds like you've at least taken a stab at the problem. I assume you've already defined a single number, which can be integer or real (it doesn't matter, since you're converting everything to float anyway), and a fraction of two numbers, probably something like this:

from pyparsing import Regex, Optional

# An integer or real number, converted to float at parse time
number = Regex(r"\d+(\.\d*)?").setParseAction(lambda t: float(t[0]))

# numerator "/" denominator, divided at parse time
fraction = number("numerator") + "/" + number("denominator")
fraction.setParseAction(lambda t: t.numerator / t.denominator)

(Note the use of parse actions, which do the floating point conversion and fractional division right at parse time. I prefer to do this while parsing, when I know something is a number or a fraction or whatever, instead of coming back later and sifting through a bunch of fragmented strings, trying to recreate the recognition logic that the parser has already done.)
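A small self-contained check of what the parse actions buy you, using the same `number` and `fraction` definitions (the input "3/4" is just an illustrative example): before `fraction` gets its own parse action, the named results are already floats; afterwards the whole match collapses to the quotient.

```python
from pyparsing import Regex

# Same definitions as above
number = Regex(r"\d+(\.\d*)?").setParseAction(lambda t: float(t[0]))
fraction = number("numerator") + "/" + number("denominator")

# Without a parse action on fraction, the parts come back separately,
# but number's own action has already turned them into floats
raw = fraction.parseString("3/4")
print(raw.asList())                     # [3.0, '/', 4.0]
print(raw.numerator, raw.denominator)   # 3.0 4.0

# With the parse action, the match is replaced by a single float
fraction.setParseAction(lambda t: t.numerator / t.denominator)
print(fraction.parseString("3/4").asList())   # [0.75]
```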

Here are the test cases I composed for your problem, made up of a whole number, a fraction, and a whole number and fraction, using both integers and reals:

tests = """\
1
1.0
1/2
1.0/2.0
1 1/2
1.0 1/2
1.0 1.0/2.0""".splitlines()

for t in tests:
    print(t, fractExpr.parseString(t))

The last step is how to define a fractional expression that can be a single number, a fraction, or a single number and a fraction.

Since pyparsing works strictly left to right, it does not backtrack the way regexes do. So this expression won't work so well:

fractExpr = Optional(number) + Optional(fraction)

To sum together the numeric values that might come from the number and fraction parts, add this parse action:

fractExpr.setParseAction(lambda t: sum(t))

Our tests print out:

1 [1.0]
1.0 [1.0]
1/2 [1.0]
1.0/2.0 [1.0]
1 1/2 [1.5]
1.0 1/2 [1.5]
1.0 1.0/2.0 [1.5]

For the test case 1/2, containing just a fraction by itself, the leading numerator matches the Optional(number) term, but that leaves us with just "/2", which doesn't match the Optional(fraction). Since the second term is optional, and parseString by default ignores any unparsed text at the end of the string, this still "passes", but it's not really doing what we want. The same thing happens with 1.0/2.0.
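One way to catch this kind of silent partial match, sketched below, is to pass `parseAll=True` to `parseString`, which makes pyparsing raise `ParseException` if any input is left over. The definitions repeat the ones above so the block runs on its own.

```python
from pyparsing import Optional, ParseException, Regex

number = Regex(r"\d+(\.\d*)?").setParseAction(lambda t: float(t[0]))
fraction = number("numerator") + "/" + number("denominator")
fraction.setParseAction(lambda t: t.numerator / t.denominator)

# The first, naive attempt
fractExpr = Optional(number) + Optional(fraction)
fractExpr.setParseAction(lambda t: sum(t))

# By default parseString stops quietly, leaving "/2" unconsumed
print(fractExpr.parseString("1/2"))   # [1.0]

# parseAll=True requires the whole string to match, exposing the problem
try:
    fractExpr.parseString("1/2", parseAll=True)
except ParseException as err:
    print("ParseException:", err)
```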

We need to make fractExpr a little smarter, and have it look first for a lone fraction, since there is this potential confusion between a lone number and the leading numerator of a fraction. The easiest way to do this is to make fractExpr read:

fractExpr = fraction | number + Optional(fraction)
fractExpr.setParseAction(lambda t: sum(t))

Now with this change, our tests come out better:

1 [1.0]
1.0 [1.0]
1/2 [0.5]
1.0/2.0 [0.5]
1 1/2 [1.5]
1.0 1/2 [1.5]
1.0 1.0/2.0 [1.5]
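Putting the pieces together, a self-contained version of the final grammar might look like the sketch below. It passes `parseAll=True`, an addition not in the answer above, so unparsed leftovers raise an error instead of being ignored. The extra "1.0 1.0 / 2.0" case is included because pyparsing skips whitespace between tokens by default, so spaces around the slash are tolerated.

```python
from pyparsing import Optional, Regex

# An integer or real number, converted to float at parse time
number = Regex(r"\d+(\.\d*)?").setParseAction(lambda t: float(t[0]))

# numerator "/" denominator, divided at parse time
fraction = number("numerator") + "/" + number("denominator")
fraction.setParseAction(lambda t: t.numerator / t.denominator)

# Try a lone fraction first, then a number with an optional fraction;
# the parse action sums whatever numeric parts were matched
fractExpr = fraction | number + Optional(fraction)
fractExpr.setParseAction(lambda t: sum(t))

tests = """\
1
1.0
1/2
1.0/2.0
1 1/2
1.0 1/2
1.0 1.0/2.0
1.0 1.0 / 2.0""".splitlines()

for t in tests:
    # parseAll=True: fail loudly if part of the string is left unparsed
    print(t, fractExpr.parseString(t, parseAll=True))
```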

There are a couple of classic pitfalls with pyparsing, and this is one of them. Just remember that pyparsing only does the lookahead you tell it to; otherwise it parses straight left to right.
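As an alternative sketch (not the approach in the answer above), the lookahead can be spelled out explicitly. `~Literal("/")` is a negative lookahead (pyparsing's `NotAny`), so a whole number is only accepted when it is not followed by a slash, and the original `Optional(...) + Optional(fraction)` shape then behaves as hoped. `whole` is a hypothetical name.

```python
from pyparsing import Literal, Optional, Regex

number = Regex(r"\d+(\.\d*)?").setParseAction(lambda t: float(t[0]))
fraction = number("numerator") + "/" + number("denominator")
fraction.setParseAction(lambda t: t.numerator / t.denominator)

# Accept a whole number only if it is NOT followed by "/",
# i.e. only if it is not really the numerator of a fraction
whole = number + ~Literal("/")
fractExpr = Optional(whole) + Optional(fraction)
fractExpr.setParseAction(lambda t: sum(t))

for t in ["1", "1/2", "1 1/2", "1.0 1.0/2.0"]:
    print(t, fractExpr.parseString(t, parseAll=True))
```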

Paul McGuire
Awesome, great answer; thank you for taking the time to spell it out! We got as far as the test case 1/2 [1.0] and were baffled about why we kept getting 1 instead of 0.5. It looks like you pinpointed our stumbling blocks. The data itself is very messy, but this at least gives us a solid foundation to build on as we enumerate the most common expressions for customer product attribute values. :)
Xavian