views:

735

answers:

3

I want to split an arithmetic expression into tokens, to convert it into RPN.

Java has the StringTokenizer, which can optionally keep the delimiters. That way, I could use the operators as delimiters. Unfortunately, I need to do this in PHP, which has strtok, but that throws away the delimiters, so I need to brew something myself.

This sounds like a classic textbook example for Compiler Design 101, but I'm afraid I'm lacking some formal education here. Is there a standard algorithm you can point me to?

My other options are to read up on Lexical Analysis or to roll up something quick and dirty with the available string functions.

+1  A: 

This might help.

Practical Uses of Tokenizer

Shoan
The tokenizer is nice, but some aspects of it got in the way: a) you have to wrap the string into "<?" and "?>" b) you have to reformat the array of tokens and de-/recode the token types, if you don't happen to use the same format as PHP does.
Hanno Fietz
A: 

As often, I would just use a regular expression to do this:

$expr = '(5*(7 + 2 * -9.3) - 8 )/ 11';
$tokens = preg_split('/([*\/^+-]+)\s*|([\d.]+)\s*/', $expr, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
$tts = print_r($tokens, true);
echo "<pre>x=$tts</pre>";

It needs a little more work to accept numbers with exponent (like -9.2e-8).

PhiLho
Hey, thanks! I had overlooked PREG_SPLIT_DELIM_CAPTURE, now it's easy.
Hanno Fietz
A: 

OK, thanks to PhiLho, my final code is this, should anyone need it. It's not even really dirty. :-)

static function rgTokenize($s)
{
    $rg = array();

    // remove whitespace
    $s = preg_replace("/\s+/", '', $s);

    // split at numbers, identifiers, function names and operators
    $rg = preg_split('/([*\/^+\(\)-])|(#\d+)|([\d.]+)|(\w+)/', $s, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    // find right-associative '-' and put it as a sign onto the following number
    for ($ix = 0, $ixMax = count($rg); $ix < $ixMax; $ix++) {
        if ('-' == $rg[$ix]) {
            if (isset($rg[$ix - 1]) && self::fIsOperand($rg[$ix - 1])) {
                continue;
            } else if (isset($rg[$ix + 1]) && self::fIsOperand($rg[$ix + 1])) {
                $rg[$ix + 1] = $rg[$ix].$rg[$ix + 1];
                unset($rg[$ix]);
            } else {
                throw new Exception("Syntax error: Found right-associative '-' without operand");
            }
        }
    }
    $rg = array_values($rg);

    echo join(" ", $rg)."\n";

    return $rg;
}
Hanno Fietz