I'm trying to run a regular expression in VBA code that uses Microsoft VBScript Regular Expressions 5.5 (which should behave the same as JavaScript regex).

regex: ^[0-9A-Z]?[0-9A-Z]{3}[A-Z]?([0-9A-Z]{6})-?([0-9])?$
input: X123A1234567
match: 123456

The six characters I'm interested in give a good match of 123456, ignoring the last (check) digit. Perfect. (The check digit is captured, but it's not a major concern to me).

But when BOTH optional alphas are absent and the check digit is present, the match shifts one character to the right and grabs the last digit.

GOOD input: 123123456
match: 123456

No alphas, no check digit. Good match.

GOOD input: 123A1234567
match: 123456

Leave in the optional middle alpha, take out the optional leading alpha, leave in check digit, and we still get the good match of 123456

GOOD input: X1231234567
match: 123456

Leave in the optional leading alpha, take out the middle optional alpha, leave in check digit, and we still get a good match of 123456

BAD input: 1231234567
match: 234567

Take out BOTH optional alphas, leave in check digit, and we get a bad match of 234567
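
For anyone who wants to reproduce the five cases above without the online testers, here is a minimal sketch in JavaScript. It assumes the JavaScript engine and the VBScript 5.5 engine agree on this pattern; the helper name firstGroup is made up for illustration.

```javascript
// Original pattern from the question, unchanged.
const original = /^[0-9A-Z]?[0-9A-Z]{3}[A-Z]?([0-9A-Z]{6})-?([0-9])?$/;

// Hypothetical helper: return the first capture group, or null on no match.
function firstGroup(pattern, input) {
  const m = pattern.exec(input);
  return m ? m[1] : null;
}

// The five inputs listed in the question.
const inputs = [
  "X123A1234567", // both alphas, check digit
  "123123456",    // no alphas, no check digit
  "123A1234567",  // middle alpha only, check digit
  "X1231234567",  // leading alpha only, check digit
  "1231234567",   // no alphas, check digit (the bad case)
];

for (const input of inputs) {
  console.log(input, "->", firstGroup(original, input));
}
// Only the last input prints 234567; the other four print 123456.
```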

Have a look at the regex testers at http://www.regular-expressions.info/javascriptexample.html or http://www.regular-expressions.info/vbscriptexample.html

What am I missing here? How can I get the regex to ignore the last digit when both optional alphas are missing? The regex is used to feed a lookup system, so that no matter what format the input data is in, we can match to a complete value.

UPDATE: None of the above examples includes the hyphen (shown in the regex). Input data with the hyphen and check digit has always matched.

UPDATE: working regex, thanks to the below suggestions (thanks!):

regex: ^[A-Z]?[0-9]{3}[A-Z]?([0-9]{6})-?([0-9])?$
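
A quick check of the updated pattern, sketched in JavaScript on the assumption that it behaves like the VBScript engine here. The two hyphenated inputs are hypothetical (the question gives no hyphenated sample), and the helper name extract is made up for illustration.

```javascript
// Updated pattern from the question: alphas are [A-Z] only, digits are [0-9] only.
const fixed = /^[A-Z]?[0-9]{3}[A-Z]?([0-9]{6})-?([0-9])?$/;

// Hypothetical helper: return the six-digit target and the check digit (if any).
function extract(input) {
  const m = fixed.exec(input);
  if (!m) return null;
  return { target: m[1], check: m[2] === undefined ? null : m[2] };
}

const samples = [
  "X123A1234567",
  "123123456",
  "123A1234567",
  "X1231234567",
  "1231234567",    // the previously bad case
  "X123A123456-7", // hypothetical hyphenated input
  "123123456-7",   // hypothetical hyphenated input
];

for (const s of samples) {
  console.log(s, "->", JSON.stringify(extract(s)));
}
// Every sample yields target "123456"; check is "7" except for
// "123123456", where it is null.
```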

+1  A: 

If you really don't want the last digit, don't make it optional - take out that last ? before the $

Dan
It has to be optional, as input data could be 123123456, which matches just fine, yielding "123456"; I'll update the question to make this explicit. In the above example, I don't want the last digit: "123456" is the target; "7" is a check digit.
Michael Paulukonis
I upvoted this response because, even though it didn't solve the problem, it made me clarify my question [and look at the regex elements, again]
Michael Paulukonis
A: 

Your regex is really overly complex. You don't need to bother matching anything at the beginning if you use greedy matching. All you need is:

([0-9A-Z]{6})\d$

I'm also not sure if you need the -?. None of your input data indicates it. (but you could add it)

Also, a faster way to do this would be the VB6 equivalent of substr, if the input data is always the same length.

Cfreak
In the original examples input data is not always the same length (all alphas present, some alphas missing, all alphas missing). Additionally, the final check digit is not always present (there was no original example with it missing; I added it after you posted). So substr is not a good candidate.
Michael Paulukonis
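
To illustrate the point about the check digit not always being present, here is a small sketch in JavaScript (assumed to match the VBScript engine for this pattern) that runs the simplified regex from this answer against two inputs from the question. The helper name tail is made up for illustration.

```javascript
// Simplified pattern suggested in the answer above: it requires a trailing digit.
const simplified = /([0-9A-Z]{6})\d$/;

// Hypothetical helper: return the captured six characters, or null on no match.
function tail(input) {
  const m = simplified.exec(input);
  return m ? m[1] : null;
}

console.log(tail("1231234567")); // 123456 (check digit present)
console.log(tail("123123456"));  // 312345 (no check digit, so the final 6 is taken as one)
```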
+1  A: 

If you take out the optional leading alpha, the 1 matches the first character class [0-9A-Z]? and has no reason to relinquish it because the entire regex matches - after all the last digit is optional in your regex.

Since it doesn't appear to be optional (you just don't want to match it) drop the trailing ?, and the regex should work.

Or make the first part of the regex [A-Z]? so it will never match a number - if that fits in your rules.
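
To see which part of the pattern consumes which characters, here is a rough sketch in JavaScript (assuming it behaves like the VBScript engine here). Every piece of the pattern is wrapped in its own capture group, which is my addition for illustration and does not change what matches. The names split, looseLead and alphaLead are made up.

```javascript
// The question's pattern with every piece captured, so we can see who took what.
// looseLead keeps the original [0-9A-Z]? for the leading optional character.
const looseLead = /^([0-9A-Z]?)([0-9A-Z]{3})([A-Z]?)([0-9A-Z]{6})-?([0-9]?)$/;
// alphaLead applies the suggestion above: the leading optional character is [A-Z]? only.
const alphaLead = /^([A-Z]?)([0-9A-Z]{3})([A-Z]?)([0-9A-Z]{6})-?([0-9]?)$/;

// Hypothetical helper: label each captured piece of a successful match.
function split(pattern, input) {
  const m = pattern.exec(input);
  if (!m) return null;
  return { lead: m[1], block: m[2], middle: m[3], target: m[4], check: m[5] };
}

console.log(split(looseLead, "1231234567"));
// lead "1", block "231", middle "", target "234567", check ""
// The greedy first class keeps the "1" because the rest still matches
// with the optional check digit left empty.

console.log(split(alphaLead, "1231234567"));
// lead "", block "123", middle "", target "123456", check "7"
```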

Tim Pietzcker
I think this is it. I'll have to verify if the initial optional character is just alpha, or alphanumeric.
Michael Paulukonis
This was it. Specs were a bit hazy, but I realized that they use "character" for alpha-only, and "numeric" for numeric-only. Numeric makes sense, but I was thinking of "character" as alphanumeric.
Michael Paulukonis