ansaurus

Question

Answer 1

+1 A:

The PEAR Numbers_Words package is probably a good start: http://pear.php.net/package-info.php?package=Numbers_Words

Jani Hartikainen 2009-07-03 03:09:53

Thanks Jani. This package looks interesting, though it does the reverse of what I'm after, i.e. it converts numbers to words. It could be useful in future projects.

2009-07-03 05:34:02

Answer 2

+2 A:

There are lots of pages discussing the conversion from numbers to words. Not so many for the reverse direction. The best I could find was some pseudo-code on Ask Yahoo. See http://answers.yahoo.com/question/index?qid=20090216103754AAONnDz for a nice algorithm:

Well, overall you are doing two things: finding tokens (words that translate to numbers) and applying grammar. In short, you are building a parser for a very limited language.

The tokens you would need are:

POWER: thousand, million, billion
HUNDRED: hundred
TEN: twenty, thirty, ... ninety
UNIT: one, two, three, ... nine
SPECIAL: ten, eleven, twelve, ... nineteen

(Drop any "and"s, as they carry no value. Break hyphenated words into two tokens; that is, sixty-five should be processed as "sixty" "five".)

Once you've tokenized your string, move from RIGHT TO LEFT.

Grab all the tokens from the RIGHT until you hit a POWER or have consumed the whole string.

Parse the tokens after the stop point for these patterns:

SPECIAL
TEN
UNIT
TEN UNIT
UNIT HUNDRED
UNIT HUNDRED SPECIAL
UNIT HUNDRED TEN
UNIT HUNDRED UNIT
UNIT HUNDRED TEN UNIT

(This assumes that "seventeen hundred" is not allowed in this grammar)

This gives you the last three digits of your number.

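If you stopped because you consumed the whole string, you are done.

If you stopped at a POWER, repeat the grab-and-parse steps on the tokens to its left, until you reach a higher POWER or the start of the string. Each group parsed this way is multiplied by the POWER that follows it.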

John Kugelman 2009-07-03 03:26:30
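
The right-to-left steps in the answer above can be sketched in PHP roughly as follows. This is not code from the answer, only one possible reading of it: the names WORDS, PATTERNS, classify, group_value and words_to_number are made up for illustration, and the sketch assumes PHP 7.1 or later with 64-bit integers and input that contains nothing but number words.

```php
<?php

// Word tables for the token kinds named in the answer.
const WORDS = [
    'UNIT'    => ['one' => 1, 'two' => 2, 'three' => 3, 'four' => 4, 'five' => 5,
                  'six' => 6, 'seven' => 7, 'eight' => 8, 'nine' => 9],
    'SPECIAL' => ['ten' => 10, 'eleven' => 11, 'twelve' => 12, 'thirteen' => 13,
                  'fourteen' => 14, 'fifteen' => 15, 'sixteen' => 16,
                  'seventeen' => 17, 'eighteen' => 18, 'nineteen' => 19],
    'TEN'     => ['twenty' => 20, 'thirty' => 30, 'forty' => 40, 'fifty' => 50,
                  'sixty' => 60, 'seventy' => 70, 'eighty' => 80, 'ninety' => 90],
    'HUNDRED' => ['hundred' => 100],
    'POWER'   => ['thousand' => 1000, 'million' => 1000000, 'billion' => 1000000000],
];

// The nine group shapes the answer allows between two POWER tokens.
const PATTERNS = [
    'SPECIAL', 'TEN', 'UNIT', 'TEN UNIT', 'UNIT HUNDRED', 'UNIT HUNDRED SPECIAL',
    'UNIT HUNDRED TEN', 'UNIT HUNDRED UNIT', 'UNIT HUNDRED TEN UNIT',
];

// Returns [kind, value] for one word, or throws if it is not a number word.
function classify(string $word): array
{
    foreach (WORDS as $kind => $values) {
        if (array_key_exists($word, $values)) {
            return [$kind, $values[$word]];
        }
    }
    throw new InvalidArgumentException("Not a number word: $word");
}

// Value of one group of tokens (given in reading order); rejects other shapes.
function group_value(array $group): int
{
    $kinds = [];
    $value = 0;
    foreach ($group as $word) {
        [$kind, $n] = classify($word);
        $kinds[] = $kind;
        $value = ($kind === 'HUNDRED') ? $value * 100 : $value + $n;
    }
    if (!in_array(implode(' ', $kinds), PATTERNS, true)) {
        throw new InvalidArgumentException('Unsupported group: ' . implode(' ', $group));
    }
    return $value;
}

function words_to_number(string $text): int
{
    // Hyphens and commas become spaces, so "sixty-five" yields two tokens.
    $clean = str_replace(['-', ','], ' ', strtolower($text));
    $tokens = preg_split('/\s+/', $clean, -1, PREG_SPLIT_NO_EMPTY);
    // "and" carries no value, so it is dropped.
    $tokens = array_values(array_diff($tokens, ['and']));

    $total = 0;
    $scale = 1;  // multiplier for the group currently being collected
    $group = []; // tokens of that group, kept in reading order

    for ($i = count($tokens) - 1; $i >= 0; $i--) {
        [$kind, $n] = classify($tokens[$i]);
        if ($kind !== 'POWER') {
            array_unshift($group, $tokens[$i]);
            continue;
        }
        // Powers must grow towards the left, and only the rightmost group
        // may be empty (as in "two million").
        if ($n <= $scale || ($group === [] && $scale > 1)) {
            throw new InvalidArgumentException("Misplaced power word: {$tokens[$i]}");
        }
        $total += ($group === []) ? 0 : group_value($group) * $scale;
        $scale = $n;
        $group = [];
    }
    // Leftmost group: whatever remains once the start of the string is reached.
    return $total + group_value($group) * $scale;
}

echo words_to_number('two hundred and thirty thousand, seven hundred and eighty-three'), "\n"; // 230783
```

As in the answer, "seventeen hundred" is rejected, here by throwing an InvalidArgumentException.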

Thank you John! This algo is exactly what I was looking for. I was trying to parse it from left to right, but this looks better. Appreciate your help!

2009-07-03 05:31:55

+1 John - Your answers are always great.

alex 2009-07-03 05:59:13

Why are we processing tokens from the right?

joebert 2009-07-03 06:09:03

@joebert 'cause it's easier to code :)

Csaba Kétszeri 2009-07-03 11:09:00

Answer 3

+1 A:

I haven't tested this extensively; I more or less worked on it until I saw the output I expected. It seems to work, and it parses from left to right.

<?php

$str = 'twelve billion people know iPhone has two hundred and thirty thousand, seven hundred and eighty-three apps as well as over one million units sold';

// Sort longer strings first, so long phrases are replaced before any
// shorter phrase that might be contained in them.
function strlen_sort($a, $b)
{
    return strlen($b) - strlen($a);
}

// Values are strings because they are passed to the BCMath functions.
$keys = array(
    'one' => '1', 'two' => '2', 'three' => '3', 'four' => '4', 'five' => '5', 'six' => '6', 'seven' => '7', 'eight' => '8', 'nine' => '9',
    'ten' => '10', 'eleven' => '11', 'twelve' => '12', 'thirteen' => '13', 'fourteen' => '14', 'fifteen' => '15', 'sixteen' => '16', 'seventeen' => '17', 'eighteen' => '18', 'nineteen' => '19',
    'twenty' => '20', 'thirty' => '30', 'forty' => '40', 'fifty' => '50', 'sixty' => '60', 'seventy' => '70', 'eighty' => '80', 'ninety' => '90',
    'hundred' => '100', 'thousand' => '1000', 'million' => '1000000', 'billion' => '1000000000'
);

// All number words in a single group, so that the word boundaries (and any
// prefix placed in front) apply to every word and not only to the first one.
$wordPattern = '\b(?:' . implode('|', array_keys($keys)) . ')\b';

// Find runs of number words joined by "and", commas, spaces or hyphens.
preg_match_all('#(?:(?:^|and|,| |-)*' . $wordPattern . ')+#i', $str, $tokens);
//print_r($tokens); exit;
$tokens = $tokens[0];
usort($tokens, 'strlen_sort');

foreach($tokens as $token)
{
    $token = trim(strtolower($token));
    // Split the phrase into its individual number words.
    preg_match_all('#' . $wordPattern . '#', $token, $words);
    $words = $words[0];
    //print_r($words);
    $num = '0'; $total = '0';
    foreach($words as $word)
    {
        $val = $keys[$word];
        //echo "$val\n";
        if(bccomp($val, '100') == -1)
        {
            // Units, teens and tens are added to the current group.
            $num = bcadd($num, $val);
            continue;
        }
        else if(bccomp($val, '100') == 0)
        {
            // "hundred" multiplies what has been read so far in this group.
            $num = bcmul($num, $val);
            continue;
        }
        // thousand, million, billion: scale the group and add it to the total.
        $num = bcmul($num, $val);
        $total = bcadd($total, $num);
        $num = '0';
    }
    $total = bcadd($total, $num);
    echo "$total:$token\n";
    // Quote the phrase so any regex metacharacters in it are matched literally.
    $str = preg_replace('#\b' . preg_quote($token, '#') . '\b#i', number_format($total), $str);
}
echo "\n$str\n";

?>

joebert 2009-07-03 05:49:01

Found one flaw, it misses common mixtures of numbers and words such as "2 million".

joebert 2009-07-03 06:21:22
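
One possible way to handle mixtures such as "2 million" is to treat a run of digits like any other additive number word while evaluating a phrase. The sketch below is not joebert's code: mixed_to_number, SMALL_WORDS and SCALE_WORDS are hypothetical names, it uses plain integers instead of BCMath, and it assumes PHP 7 or later and a phrase that contains only number tokens. Decimals such as "2.5 million" are not handled.

```php
<?php

// Additive words (units, teens, tens) and scale words, mirroring $keys.
const SMALL_WORDS = [
    'one' => 1, 'two' => 2, 'three' => 3, 'four' => 4, 'five' => 5,
    'six' => 6, 'seven' => 7, 'eight' => 8, 'nine' => 9, 'ten' => 10,
    'eleven' => 11, 'twelve' => 12, 'thirteen' => 13, 'fourteen' => 14,
    'fifteen' => 15, 'sixteen' => 16, 'seventeen' => 17, 'eighteen' => 18,
    'nineteen' => 19, 'twenty' => 20, 'thirty' => 30, 'forty' => 40,
    'fifty' => 50, 'sixty' => 60, 'seventy' => 70, 'eighty' => 80, 'ninety' => 90,
];
const SCALE_WORDS = ['thousand' => 1000, 'million' => 1000000, 'billion' => 1000000000];

// Hypothetical helper: evaluates one phrase that may mix digits and words.
function mixed_to_number(string $phrase): int
{
    // Remove thousands separators inside digit runs ("1,500" becomes "1500").
    $phrase = preg_replace('/(?<=\d),(?=\d)/', '', strtolower($phrase));
    $words = preg_split('/[\s,-]+/', $phrase, -1, PREG_SPLIT_NO_EMPTY);

    $total = 0; // sum of the groups already closed by a scale word
    $num = 0;   // group currently being built
    foreach ($words as $word) {
        if ($word === 'and') {
            continue;
        }
        if (preg_match('/^\d+$/', $word) === 1) {
            $num += (int) $word;               // digits act as an additive token
        } elseif (array_key_exists($word, SMALL_WORDS)) {
            $num += SMALL_WORDS[$word];
        } elseif ($word === 'hundred') {
            $num *= 100;
        } elseif (array_key_exists($word, SCALE_WORDS)) {
            $total += $num * SCALE_WORDS[$word];
            $num = 0;
        } else {
            throw new InvalidArgumentException("Not a number token: $word");
        }
    }
    return $total + $num;
}

echo mixed_to_number('2 million'), "\n"; // 2000000
```

To use this on free text, the pattern that finds number phrases would also have to accept digit runs as tokens.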

It will also mess with certain wordings for dates, e.g. "I was born in nineteen eighty one".

joebert 2009-07-03 06:27:54

Thank you very much, Joebert, for the code! I'll try to improve on it. I have set up a test set of 10000 random number words (generated with Numbers_Words), and currently the accuracy of decoding words to numbers is 75%.

Correct: forty five thousand five hundred and fifty four becomes 45554
Incorrect: fifty one thousand five hundred and eighty six becomes 586

2009-07-09 01:17:50

Just realized the issue. There is something funny happening while accessing the first key, i.e. 'one'. If you instead put 'quadrillion' => '1000000000000000' before 'one', it works with 100% accuracy.

2009-07-10 01:28:15
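
A likely explanation for the first-key behaviour, judging from the word-splitting pattern as originally posted: the keys are joined with '\b|\b' without being wrapped in a group of their own, so the separator prefix (?:and|,| |-)* belongs to the first alternative only. A separator followed by 'one' is then absorbed into the preceding match, the lookup for "fifty one" finds no key, and that part contributes nothing, which would fit the result 586. Putting a rarely used word such as 'quadrillion' first hides the problem; grouping the alternatives addresses it directly. The sketch below uses a shortened, made-up word list to show the difference.

```php
<?php

// Shortened word list, for illustration only.
$names = ['one', 'fifty', 'thousand'];

// Ungrouped, in the style of the posted code: the separator prefix is part of
// the first alternative only, i.e. (prefix + "one") | "fifty" | "thousand".
$ungrouped = '#(?:(?:and|,| |-)*\b' . implode('\b|\b', $names) . '\b)+#';

// Grouped: every alternative sits inside one (?:...) between the \b anchors.
$grouped = '#\b(?:' . implode('|', $names) . ')\b#';

preg_match_all($ungrouped, 'fifty one thousand', $a);
preg_match_all($grouped, 'fifty one thousand', $b);

print_r($a[0]); // "fifty one" comes back as a single match, then "thousand"
print_r($b[0]); // "fifty", "one", "thousand" as three separate matches
```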

Also, include 'lakh' => '100000' and 'crore' => '10000000' in $keys. These terms are more common than million in South Asian countries.

2009-07-10 01:30:50

That makes sense. I have a filesize formatter that works similarly. I must have been in a rush and forgot to put the largest numbers first in the check.

joebert 2009-07-14 15:02:36


Converting words to numbers in PHP
