tags:

views:

40

answers:

1

I have the following C# code which should match a quantity / $ price string like "4/$3.99". It works all day long until we use it against a string returned from Firefox Browser. 77.77 becomes 77 (dropping the .77 cents).

var matches = Regex.Match(_priceText, 
    @"^\s?((?<qty>\d+)\s?/)?\s?[$]?\s?(?<price>[0-9]?\.?[0-9]?[0-9]?)");

if( matches.Success)
{
    if (!Decimal.TryParse(matches.Groups["price"].Value, out this._price))
        this._price = 0.0m;
    if (!Int32.TryParse(matches.Groups["qty"].Value, out this._qty))
        this._qty = (this._price > 0 ? 1 : 0);
    else
        if (this._price > 0 && this._qty == 0)
            this._qty = 1;
}

Any idea why the period wouldn't match coming from a Firefox string, but the C# string matches? There isn't any special about the Firefox we used. It's a plain Jane 1252 code page download right off the Firefox site. The computer's local settings are unaltered North American, etc. We have two different computers showing the same effects. It's Firefox 3.6.4, nothing fancy or beta.

+4  A: 

Firefox isn't the problem. The pattern is incomplete.

Try this pattern instead:

@"^\s?((?<qty>\d+)\s?/)?\s?[$]?\s?(?<price>[0-9]{1,2}\.?[0-9]?[0-9]?)"

The problem in the original pattern is the (?<price>[0-9]?\.?[0-9]?[0-9]?) portion. The problem you described occurs with any number that starts with 2 digits, not just Firefox values. Your sample was 4/$3.99 but 4/$33.99 would cause the same issue. The [0-9]?\.?[0-9]?[0-9]? part matches a digit followed by a period. Unfortunately the pattern is littered with optional ? metacharacters after almost everything and that is why this bug has popped up. For 77.77 it matches the first 7 then it should match a dot but wait, there's a second 7 and no dot (which is optional \.?) so it happily skips it. Next the pattern expects 2 optional digits, but it sees a dot and stops, thus returning only 77. That's the general idea.

Having said that, you should lay out precisely what inputs are valid when constructing the pattern. Your original pattern indicates that the price group is entirely optional. Look at it closely; everything has a ? appended to it. So what are your goals? Is it optional? Are whole numbers allowed? Must it be a decimal with a .xy in the number? My proposed pattern at the top used [0-9]{1,2} to force 1-2 numbers to exist, while leaving the .xy portion optional.

If the .xy portion is truly optional you could update your price group to this: (?<price>\d{1,2}(?:\.\d{1,2})?) - that way the optional ? metacharacter applies to everything that is optional and is only specified once. This makes the pattern more readable IMO. The (?:...) portion is optional (specifically usage of ?: not the actual grouping) but it's good practice to avoid capturing the group unnecessarily. With these changes in place the new pattern would be:

@"^\s?((?<qty>\d+)\s?/)?\s?[$]?\s?(?<price>\d{1,2}(?:\.\d{1,2})?)"

Notice that the pattern still has issues, depending on what your requirements are. The entire qty group is optional, meaning the 4/ part could be omitted from the input and an input of $3.99 would be valid. If this is required then don't make it optional:

@"^\s?((?<qty>\d+)\s?/)\s?[$]?\s?(?<price>\d{1,2}(?:\.\d{1,2})?)"
Ahmad Mageed
Nice. Not sure how that missed the tests. We are really trying to match (11)/$(22.33) where the first 11/ is optional, the $ is optional, the 22 is optional and the .33 is optional. The field will only have those characters from beginning to end.
Dr. Zim
@Dr. Zim: in that case I suppose you can use `{0,2}` for the `22` portion before the dot: `(?<price>\d{0,2}(?:\.\d{1,2})?)`. Beyond that it looks like your if/else logic will set default values.
Ahmad Mageed
That worked very well. Thank you.
Dr. Zim