I'm trying out preg_match with a Roman numeral to integer converter. The problem is that, for certain inputs, preg_match seems to return too few capture groups in its match array. The code:

function romanNumeralToInt($romanNumeral)
{   preg_match
    (   '/^(M?M?M?)'
        .'((CM)|(CD)|((D?)(C?C?C?)))'
        .'((XC)|(XL)|((L?)(X?X?X?)))'
        .'((IX)|(IV)|((V?)(I?I?I?)))$/', $romanNumeral, $match);
    print_r($match);

    $result=0;
    $result += 1000*strlen($match[1]);
    if(strlen($match[3]) != 0){$result += 900;}
    if(strlen($match[4]) != 0){$result += 400;}
    if(strlen($match[5]) != 0)
    {   $result += 100*strlen($match[7]) + 500*strlen($match[6]);
    }
    if(strlen($match[9]) != 0){$result += 90;}
    if(strlen($match[10]) != 0){$result += 40;}
    if(strlen($match[11]) != 0)
    {   $result += 10*strlen($match[13]) + 50*strlen($match[12]);
    }
    if(strlen($match[15]) != 0){$result += 9;}
    if(strlen($match[16]) != 0){$result += 4;}
    if(strlen($match[17]) != 0)
    {   $result += 1*strlen($match[19]) + 5*strlen($match[18]);
    }

    return $result;
}

echo romanNumeralToInt("XXVIII"); // gives correct results

But any Roman numeral ending in "IV" cuts off the last 3 groups ($match contains only elements 0-16 rather than the full 0-19), and similarly any Roman numeral ending in "IX" cuts off the last 4 groups.

Is this expected behavior, or is my PHP buggy?

+1  A: 

I expect this to be expected behavior. =)

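The regex engine tries the alternatives of an OR group from left to right and stops at the first one that matches, so if it finds an IV or IX it never attempts those last three (or four) groups. preg_match then leaves such trailing unmatched groups out of the match array, while unmatched groups that are followed by a matched group are still reported as empty strings.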
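To see the effect in isolation, here is a minimal sketch using a cut-down "ones" pattern with only four capture groups. The helper name onesGroups is made up for this illustration and the pattern is a simplification, not the one from the question.

```php
<?php
// Hypothetical helper for illustration: returns the raw match array for a
// cut-down "ones" pattern with four capture groups:
//   1 => (IX), 2 => (IV), 3 => (V?), 4 => (I{0,3})
function onesGroups($numeral)
{
    preg_match('/^(?:(IX)|(IV)|(V?)(I{0,3}))$/', $numeral, $m);
    return $m;
}

// Third alternative matches: groups 1 and 2 come back as empty strings
// because a later group (3, 4) did match. Array has keys 0-4.
print_r(onesGroups('VII'));

// Second alternative matches: groups 3 and 4 never took part and sit at
// the end, so they are left out. Array has keys 0-2.
print_r(onesGroups('IV'));

// First alternative matches: groups 2, 3 and 4 are all left out.
// Array has keys 0-1.
print_r(onesGroups('IX'));
```

The same thing happens with the full pattern: only the groups after the last one that matched go missing, which would explain why CM and XL in the middle of a numeral do not shorten the array.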

Actually, I think that, if your expression contains a CM or XL or something like that, some of the other entries will be missing, too.

I find that using RegExr helps a lot with debugging regular expressions. Using this for your regex, some groups catch empty strings, and some groups contain NO MATCH.
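One possible way to make the function independent of how many groups come back is to pad the match array to its full length before reading from it. The following is only a sketch under that assumption: the function name romanNumeralToIntPadded and the choice to return null for non-matching input are mine, not part of the original code.

```php
<?php
// Sketch: same pattern as in the question, but the match array is padded
// so that indexes 0-19 always exist.
function romanNumeralToIntPadded($romanNumeral)
{
    $pattern = '/^(M?M?M?)'
        . '((CM)|(CD)|((D?)(C?C?C?)))'
        . '((XC)|(XL)|((L?)(X?X?X?)))'
        . '((IX)|(IV)|((V?)(I?I?I?)))$/';

    if (preg_match($pattern, $romanNumeral, $match) !== 1) {
        return null; // assumption: signal "not accepted by this pattern" with null
    }

    // Array union keeps the existing entries and adds '' for missing keys.
    $match += array_fill(0, 20, '');

    $result  = 1000 * strlen($match[1]);
    $result += ($match[3] !== '') ? 900 : 0;                        // CM
    $result += ($match[4] !== '') ? 400 : 0;                        // CD
    $result += 500 * strlen($match[6]) + 100 * strlen($match[7]);   // D, C
    $result += ($match[9] !== '') ? 90 : 0;                         // XC
    $result += ($match[10] !== '') ? 40 : 0;                        // XL
    $result += 50 * strlen($match[12]) + 10 * strlen($match[13]);   // L, X
    $result += ($match[15] !== '') ? 9 : 0;                         // IX
    $result += ($match[16] !== '') ? 4 : 0;                         // IV
    $result += 5 * strlen($match[18]) + 1 * strlen($match[19]);     // V, I

    return $result;
}

echo romanNumeralToIntPadded('XIV'), "\n";
echo romanNumeralToIntPadded('MCMXCIX'), "\n";
```

As far as I know, on PHP 7.4 and later passing the PREG_UNMATCHED_AS_NULL flag to preg_match should also give a fixed-size array, with null in place of each unmatched group, but I have not relied on that here so the sketch also runs on older versions.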

Jens
Tested CM and XL too; it doesn't happen for them (I had that thought as well). +1 for RegExr, that's a nice tool. It's a little bizarre that it would just ignore the last couple of match groups.
B T