I need to convert 'one' into '1', 'two' into '2', and so on. Is there a way to do this with a library, a class, or anything else?

+2  A: 

Here's the trivial case approach:

>>> number = {'one':1,
...           'two':2,
...           'three':3,}
>>> 
>>> number['two']
2

Or are you looking for something that can handle "twelve thousand, one hundred seventy-two"?

Jeff Bauer
+2  A: 

This could easily be hardcoded into a dictionary if there's a limited set of numbers you'd like to parse.

For slightly more complex cases, you'll probably want to generate this dictionary automatically, based on the relatively simple grammar of number words. Something along these lines (generalized, of course):

# singleDigitsDict is assumed to map 1..9 to "one".."nine"
for i in range(1, 10):
   myDict["thirty-" + singleDigitsDict[i]] = 30 + i
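
One way to generalize this idea is to build the whole word-to-number mapping for 0 through 99 from two short lists. This is only a sketch; the names `units`, `tens`, and `build_number_words` are made up for illustration and do not come from the answer above.

```python
# Hypothetical helper: generate a word -> number dict for 0..99.
units = ["zero", "one", "two", "three", "four", "five", "six", "seven",
         "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
tens = ["twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

def build_number_words():
    """Return a dict mapping spelled-out numbers 0..99 to ints."""
    words = {}
    # 0..19 are irregular, so they are listed explicitly.
    for value, word in enumerate(units):
        words[word] = value
    # 20..99 follow the pattern "<tens>" or "<tens>-<digit>".
    for idx, ten_word in enumerate(tens):
        base = (idx + 2) * 10
        words[ten_word] = base
        for digit in range(1, 10):
            words[ten_word + "-" + units[digit]] = base + digit
    return words

number_words = build_number_words()
print(number_words["thirty-seven"])  # 37
```

Anything above 99 needs scale words ("hundred", "thousand", ...), which a flat dictionary like this does not handle.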

If you need something more extensive, then it looks like you'll need natural language processing tools. This article might be a good starting point.

Kena
+2  A: 

Some earlier posts might be helpful, even though they deal more with turning text to numbers. http://stackoverflow.com/questions/468241/python-convert-alphabetically-spelled-out-numbers-to-numerics

Also a great code snippet from Greg Hewgill at http://github.com/ghewgill/text2num/tree/master

My first post : )

Sander
Nice first post, the links are great. Just a tip - you don't need to click community wiki most of the time (it will prevent you from gaining any rep). See http://stackoverflow.com/questions/128434/what-are-community-wiki-posts-in-stackoverflow
Kiv
+8  A: 

Bored at work. The majority of this code is to set up the numwords dict, which is only done on the first call.

def text2int(textnum, numwords={}):
    if not numwords:
      units = [
        "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen",
      ]

      tens = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]

      scales = ["hundred", "thousand", "million", "billion", "trillion"]

      numwords["and"] = (1, 0)
      for idx, word in enumerate(units):    numwords[word] = (1, idx)
      for idx, word in enumerate(tens):     numwords[word] = (1, idx * 10)
      for idx, word in enumerate(scales):   numwords[word] = (10 ** (idx * 3 or 2), 0)

    current = result = 0
    for word in textnum.split():
        if word not in numwords:
          raise Exception("Illegal word: " + word)

        scale, increment = numwords[word]
        current = current * scale + increment
        if scale > 100:
            result += current
            current = 0

    return result + current

print(text2int("seven billion one hundred million thirty one thousand three hundred thirty seven"))
# 7100031337
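
To see how the update rule in the loop above behaves, here is a small trace for "two thousand three hundred five". It is a sketch only: `trace_text2int` is a made-up name, and the (scale, increment) pairs are written out by hand instead of being looked up in numwords.

```python
def trace_text2int(pairs):
    """Apply the (scale, increment) update rule and record each step."""
    current = result = 0
    steps = []
    for word, scale, increment in pairs:
        current = current * scale + increment
        # Scales above 100 close a group: flush it into the running total.
        if scale > 100:
            result += current
            current = 0
        steps.append((word, current, result))
    return result + current, steps

# Hand-written pairs for "two thousand three hundred five".
pairs = [
    ("two", 1, 2),
    ("thousand", 1000, 0),
    ("three", 1, 3),
    ("hundred", 100, 0),
    ("five", 1, 5),
]

total, steps = trace_text2int(pairs)
for word, current, result in steps:
    print("%-8s current=%-4d result=%d" % (word, current, result))
print(total)  # 2305
```

Note that "hundred" multiplies current but is not flushed, which is what lets "three hundred thousand" accumulate to 300000 before being added to result.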
recursive
+1  A: 

Thanks for the code snippet... saved me a lot of time!

I needed to handle a few extra parsing cases: ordinal words ("first", "second"), hyphenated words ("one-hundred"), and hyphenated ordinal words ("fifty-seventh"), so I added a few lines:

def text2int(textnum, numwords={}):
    if not numwords:
        units = [
        "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen",
        ]

        tens = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]

        scales = ["hundred", "thousand", "million", "billion", "trillion"]

        numwords["and"] = (1, 0)
        for idx, word in enumerate(units):  numwords[word] = (1, idx)
        for idx, word in enumerate(tens):       numwords[word] = (1, idx * 10)
        for idx, word in enumerate(scales): numwords[word] = (10 ** (idx * 3 or 2), 0)

    ordinal_words = {'first':1, 'second':2, 'third':3, 'fifth':5, 'eighth':8, 'ninth':9, 'twelfth':12}
    ordinal_endings = [('ieth', 'y'), ('th', '')]

    textnum = textnum.replace('-', ' ')

    current = result = 0
    for word in textnum.split():
        if word in ordinal_words:
            scale, increment = (1, ordinal_words[word])
        else:
            for ending, replacement in ordinal_endings:
                if word.endswith(ending):
                    word = "%s%s" % (word[:-len(ending)], replacement)

            if word not in numwords:
                raise Exception("Illegal word: " + word)

            scale, increment = numwords[word]

        current = current * scale + increment
        if scale > 100:
            result += current
            current = 0

    return result + current
Jarret Hardie
A: 

I made a change so that calling text2int() on a bare scale word returns the correct value, e.g. text2int("hundred") => 100.

import re

numwords = {}

def text2int(textnum):
    # Build the lookup table once, on the first call.
    if not numwords:
        units = ["zero", "one", "two", "three", "four", "five", "six",
                 "seven", "eight", "nine", "ten", "eleven", "twelve",
                 "thirteen", "fourteen", "fifteen", "sixteen", "seventeen",
                 "eighteen", "nineteen"]

        tens = ["", "", "twenty", "thirty", "forty", "fifty", "sixty",
                "seventy", "eighty", "ninety"]

        scales = ["hundred", "thousand", "million", "billion", "trillion",
                  'quadrillion', 'quintillion', 'sextillion', 'septillion',
                  'octillion', 'nonillion', 'decillion']

        numwords["and"] = (1, 0)
        for idx, word in enumerate(units): numwords[word] = (1, idx)
        for idx, word in enumerate(tens): numwords[word] = (1, idx * 10)
        for idx, word in enumerate(scales): numwords[word] = (10 ** (idx * 3 or 2), 0)

    ordinal_words = {'first':1, 'second':2, 'third':3, 'fifth':5,
                     'eighth':8, 'ninth':9, 'twelfth':12}
    ordinal_endings = [('ieth', 'y'), ('th', '')]
    current = result = 0
    # Split on whitespace and hyphens; drop empty tokens from leading/trailing separators.
    tokens = [t for t in re.split(r"[\s-]+", textnum) if t]
    for word in tokens:
        if word in ordinal_words:
            scale, increment = (1, ordinal_words[word])
        else:
            for ending, replacement in ordinal_endings:
                if word.endswith(ending):
                    word = "%s%s" % (word[:-len(ending)], replacement)

            if word not in numwords:
                raise Exception("Illegal word: " + word)

            scale, increment = numwords[word]

        # A bare scale word ("hundred") counts as one of that scale.
        if scale > 1:
            current = max(1, current)

        current = current * scale + increment
        if scale > 100:
            result += current
            current = 0

    return result + current
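
The effect of the `max(1, current)` guard can be seen by isolating a single update step. This is a sketch under my own naming: `apply_scale` and its `guard` flag are hypothetical and exist only to compare the step with and without the guard.

```python
def apply_scale(current, scale, guard):
    """One update step for a scale word; `guard` toggles the max(1, ...) fix."""
    if guard and scale > 1:
        # Treat a scale word with nothing in front of it as "one <scale>".
        current = max(1, current)
    return current * scale

print(apply_scale(0, 100, guard=False))  # 0   ("hundred" alone, no guard)
print(apply_scale(0, 100, guard=True))   # 100 ("hundred" alone, with guard)
print(apply_scale(3, 100, guard=True))   # 300 ("three hundred" is unchanged)
```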
Dawa