I'm using a commercial application that can validate field formatting with a regex. Normally this works quite well. Today, however, I need to validate strings made of quoted alphanumeric codes joined by simple arithmetic operators (+-*/). The problem is that users sometimes add extra spaces (e.g. " FLR01" instead of "FLR01") or make other typos, such as mismatched parentheses, that break downstream processing.

The first examples I saw all added five codes together:

"FLR01"+"FLR02"+"FLR03"+"FMD01"+"FMR05"

So I started down the road of matching five alphanumeric characters enclosed in quotes:

"[0-9a-zA-Z]{5}"[+-*/]
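A side note on the operator class above, checked only against Python's `re` module (the commercial application's regex engine is unknown and may behave differently): inside square brackets, `+-*` is read as a range from `+` (0x2B) down to `*` (0x2A), and a reversed range like that is rejected. Putting the hyphen first, or escaping it as `\-`, makes it a literal character. The variable names are illustrative only.

```python
import re

# "+-*" inside [...] is parsed as a character range from "+" down to "*",
# which Python's re module rejects as a bad range.
try:
    re.compile(r'"[0-9a-zA-Z]{5}"[+-*/]')
    broken_ok = True
except re.error:
    broken_ok = False

# With "-" first in the class it is a literal hyphen, so all four operators work.
fixed = re.compile(r'"[0-9a-zA-Z]{5}"[-+*/]')

print(broken_ok)
print(fixed.fullmatch('"FLR01"+') is not None)
print(fixed.fullmatch('"FLR01"J') is not None)
```

Here `broken_ok` ends up `False`, the operator `+` is accepted, and the typo `J` is rejected.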

However, the formulas quickly got harder, and I don't know how to handle the following complications:

  1. I need to require one of the four simple math operators (+-*/) between each pair of codes, but not after the last one.
  2. There can be any number of codes, not just five as in the example above.
  3. Enclosed parentheses are okay, e.g. ("X"+"Y")/"2".
  4. Mismatched parentheses are not okay.
  5. No formula (i.e. a blank field) is okay.

Valid:

"FLR01"+"FLR02"+"FLR03"+"FMD01"+"FMR05"
"0XT"+"1SEAL"+"1XT"+"23LSL"+"23NBL"  
("LS400"+"LT400")*"LC430"/("EL414"+"EL414R"+"LC407"+"LC407R"+"LC410"+"LC410R"+"LC420"+"LC420R")

Invalid:

" FLR01" +"FLR02"
"FLR01"J"FLR02"
("FLR01"+"FLR02"

Is this something you can't easily do with a regex? Based on Jeff's answer to 230517, I suspect that the matched-pairing requirement, at least, is out of reach. Even a partial solution (e.g. flagging extra spaces or invalid operators) would be better than nothing, even if I can't solve the parentheses issue. Suggestions welcome!

Thanks,

Stephen

+2  A: 

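As you suspected, you can't check for matching parentheses with regular expressions. You need something more powerful: a regex has no way of remembering state, so it can't count arbitrarily deep nesting. (Some engines, such as PCRE with recursive patterns or .NET with balancing groups, add extensions that can, but you can't count on an application's validator field supporting them.)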

This syntax is simple enough that you could hand-code a parser that counts the parentheses, incrementing a counter at each "(" and decrementing it at each ")". You'd just have to make sure the counter never goes negative and is back to zero at the end.
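A minimal sketch of that counter in Python, assuming (as in all the examples) that the codes themselves never contain parentheses, so every "(" and ")" in the field is structural. The function name `parens_balanced` is just illustrative.

```python
def parens_balanced(formula: str) -> bool:
    """Return True if every '(' is closed by a later ')' and vice versa."""
    depth = 0
    for ch in formula:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' appeared before any matching '('
                return False
    return depth == 0      # an unclosed '(' leaves depth above zero


print(parens_balanced('("FLR01"+"FLR02")'))
print(parens_balanced('("FLR01"+"FLR02"'))
print(parens_balanced('"FLR01")+("FLR02"'))
```

The first call returns `True`; the second fails the end-of-input check and the third fails the never-negative check, so both return `False`.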

As for the rest, how about this?

("[0-9a-zA-Z]+"([+\-*/]"[0-9a-zA-Z]+")*)?
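One way to try this pattern outside the application, sketched with Python's `re` module. It assumes the validator requires the whole field to match, as if the pattern were wrapped in `^...$`; whether the commercial application anchors implicitly is unknown, so if it only looks for a match somewhere in the field, add the anchors yourself. `fullmatch` plays that role here, and `looks_valid` is a hypothetical helper name.

```python
import re

# John's pattern (no parentheses support). The outer "?" lets a blank field pass.
FORMULA = re.compile(r'("[0-9a-zA-Z]+"([+\-*/]"[0-9a-zA-Z]+")*)?')


def looks_valid(formula: str) -> bool:
    # fullmatch anchors at both ends, like wrapping the pattern in ^...$
    return FORMULA.fullmatch(formula) is not None


for f in ['"FLR01"+"FLR02"+"FLR03"+"FMD01"+"FMR05"',
          '"0XT"+"1SEAL"+"1XT"+"23LSL"+"23NBL"',
          '',
          '" FLR01" +"FLR02"',
          '"FLR01"J"FLR02"']:
    print(repr(f), looks_valid(f))
```

The first three strings pass and the last two fail. The parenthesized example from the question also fails this pattern, since it has no notion of "(" or ")".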

You could also use the regular expression below to check where the parentheses appear. It wouldn't verify that they're properly nested, but it would verify that an open parenthesis appears only before a code and a close parenthesis only after one. Add the counter described above and you'd have a complete validator.

(\(*"[0-9a-zA-Z]+"\)*([+\-*/]\(*"[0-9a-zA-Z]+"\)*)*)?
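Putting the two pieces together might look like the Python sketch below: the regex checks placement and the counter checks nesting. It assumes whole-field matching (hence `fullmatch`) and that codes contain no parentheses; `is_valid_formula` is a hypothetical name, and the counter is repeated here so the snippet stands alone.

```python
import re

# "(" may only appear before a quoted code and ")" only after one.
SHAPE = re.compile(r'(\(*"[0-9a-zA-Z]+"\)*([+\-*/]\(*"[0-9a-zA-Z]+"\)*)*)?')


def is_valid_formula(formula: str) -> bool:
    """Regex checks token placement; a depth counter checks nesting."""
    if SHAPE.fullmatch(formula) is None:
        return False
    depth = 0
    for ch in formula:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # ')' before its matching '('
                return False
    return depth == 0      # every '(' must be closed


print(is_valid_formula('("LS400"+"LT400")*"LC430"/("EL414"+"EL414R")'))
print(is_valid_formula('("FLR01"+"FLR02"'))
```

The regex alone would accept `("FLR01"+"FLR02"`; it is the counter that rejects it.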
John Kugelman
John, thanks for your answer, but so far I can't get your regexes to match. [0-9a-zA-Z] appears to match just one character inside the quotes; I need it to match one or more. [+\-*/] works, but I can't get it to match the last code in the formula. I tried debugging in RegexBuddy but haven't cracked it yet.
Stephen Pace
Oops, I didn't check the regex you started with! I've added a `+`, which matches one or more alphanumeric characters.
John Kugelman
+1  A: 

You can easily use regexes to match your tokens (codes, operators, etc.), but you cannot match balanced parentheses. This isn't a big problem, though: you just need a state machine that operates on the tokens you match, plus a counter for the open parentheses (a plain finite state machine can't count nesting either). If you're not familiar with state machines, think of one as a flow chart within your program that tracks where you are and where you can go next. Wikipedia's article on finite-state machines is a good starting point.
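A rough Python sketch of that idea, under the assumption that the token set is exactly quoted codes, the four operators, and parentheses. The regex only tokenizes; a two-state machine ("expect a code" vs. "expect an operator") checks the order of tokens, and a depth counter handles nesting. The names `TOKEN` and `validate` are hypothetical.

```python
import re

# Hypothetical token set; any other character (a space, "J", ...) is an error.
TOKEN = re.compile(r'(?P<CODE>"[0-9a-zA-Z]+")|(?P<OP>[+\-*/])|(?P<LPAREN>\()|(?P<RPAREN>\))')


def validate(formula: str) -> bool:
    if formula == "":
        return True               # a blank formula is allowed
    expect_operand = True         # state: waiting for a code (or '(')
    depth = 0                     # open parentheses not yet closed
    pos = 0
    while pos < len(formula):
        m = TOKEN.match(formula, pos)
        if m is None:
            return False          # stray character
        kind = m.lastgroup        # name of the alternative that matched
        pos = m.end()
        if expect_operand:
            if kind == "CODE":
                expect_operand = False
            elif kind == "LPAREN":
                depth += 1
            else:
                return False      # operator or ')' where a code belongs
        else:
            if kind == "OP":
                expect_operand = True
            elif kind == "RPAREN":
                depth -= 1
                if depth < 0:
                    return False  # ')' with no matching '('
            else:
                return False      # code or '(' right after a code
    # Must end right after a code or ')', with every '(' closed.
    return not expect_operand and depth == 0
```

Unlike a single regex, this version can also report *where* a formula breaks (the value of `pos` at each `return False`), which may help if the validation ever moves into a script.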

Dana the Sane
Unfortunately, I can't easily get a regex to work even for my tokens. If it's easy for you, please share. :-) I can't use a state machine solution because this regex is an attribute validator in a commercial application.
Stephen Pace
Isn't there a mechanism to set your own validator? Other form validators I've worked with have this feature.
Dana the Sane
Sort of. Core validations run first and most efficiently (maximum size, references to other attributes, etc.), and regex is one of the core validations. Beyond those, there are validations by script and by program. Those run after the core validations and apply to entire categories of data, not just the attribute you are interested in. You could target only the attribute you care about by name, but these additional validations can have performance implications, and there is the general cost of maintaining them.
Stephen Pace