I need to write a regular expression for an 'option symbol' for my company so we can validate these symbols across our site.

An option symbol is composed of two parts:

Part1         Part2
_ _ _ _ _ _ | _ _ _ _ _ _ _ _ _ 

I can write a regular expression for Part2 as it is fairly simple.

However, Part1 (the first 6 character positions) can be a little complicated.

It boils down to:

  • Part1 must be exactly {6} characters long.
  • It must start with {1,4} alphabetic characters.
  • After that, there can optionally be {1} numeric character.
  • Finally, the remaining positions must be spaces, padding Part1 to 6 characters.
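The four rules above can be cross-checked without any regular expression at all. The sketch below is a hypothetical plain-Python helper (`part1_follows_rules` is an illustrative name, not part of any library) that could serve as a reference when testing candidate patterns; it assumes "alpha" means ASCII letters only.

```python
import string

# Assumption: "alpha" means ASCII letters and "numeric" means ASCII digits.
ASCII_LETTERS = set(string.ascii_letters)
ASCII_DIGITS = set(string.digits)


def part1_follows_rules(symbol: str) -> bool:
    """Hypothetical reference check of the Part1 rules, written without a regex."""
    part1 = symbol[:6]  # only the first 6 characters matter
    if len(part1) != 6:
        return False  # rule 1: exactly 6 characters

    # Rule 2: count the leading letters; there must be 1 to 4 of them.
    i = 0
    while i < 6 and part1[i] in ASCII_LETTERS:
        i += 1
    if not 1 <= i <= 4:
        return False

    # Rule 3: optionally one digit (i is at most 4 here, so part1[i] exists).
    if part1[i] in ASCII_DIGITS:
        i += 1

    # Rule 4: every remaining position must be a space.
    return all(ch == " " for ch in part1[i:])
```

Because at most 5 positions can hold letters and a digit, a valid Part1 always ends in at least one space.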

The problem I'm having is that the number of spaces depends on the number of characters before them, which makes me think this isn't easy to express compactly as a regular expression.

How can I avoid brute-forcing it like so:

([A-Za-z]{1}[0-9]{1}[ ]{4}|
[A-Za-z]{2}[0-9]{1}[ ]{3}|
[A-Za-z]{3}[0-9]{1}[ ]{2}|
[A-Za-z]{4}[0-9]{1}[ ]{1}|
[A-Za-z]{1}[ ]{5}|
[A-Za-z]{2}[ ]{4}|
[A-Za-z]{3}[ ]{3}|
[A-Za-z]{4}[ ]{2})

Here are some example option symbols (remember, ignore everything beyond the first 6 characters):

F     123456P12345678
CMG   123456P12345678
AAPL  123456P12345678
GOOG1 123456C12345678
F5    123456C12345678
A:

You can use a lookbehind assertion:

^[A-Za-z]{1,4}\d? +\b(?<=^.{6})

Explanation:

^: Match the start of the line or string (depending on whether you set the option RegexOptions.Multiline or not).

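[A-Za-z]{1,4}\d? +: Match 1 to 4 alpha characters and an optional digit, followed by at least one space.

\b: Assert that we are now at a word boundary. Since the previous character is a space, the next character must be a word character (a letter, digit, or underscore), so the pattern has to consume the entire run of spaces.

(?<=^.{6}): Assert that exactly six characters precede the current position, i.e. that everything matched so far is exactly six characters long.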
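To try the lookbehind pattern outside .NET, here is a minimal sketch using Python's `re` module (an assumption; the answer itself refers to .NET's RegexOptions). Python only accepts fixed-width lookbehind, and (?<=^.{6}) qualifies because ^ is zero-width and .{6} always spans six characters. The first five symbols are the question's examples; the last two are invalid inputs made up for illustration. Note that in Python, \d on a str pattern also matches non-ASCII digits; [0-9] is the ASCII-only equivalent.

```python
import re

# The answer's pattern, unchanged. The lookbehind is fixed-width (6 chars),
# so Python's re module accepts it.
PART1 = re.compile(r"^[A-Za-z]{1,4}\d? +\b(?<=^.{6})")

symbols = [
    "F     123456P12345678",
    "CMG   123456P12345678",
    "AAPL  123456P12345678",
    "GOOG1 123456C12345678",
    "F5    123456C12345678",
    "CMG    123456P12345678",  # four spaces: \b fails at position 6
    "ABCDE 123456C12345678",   # five letters
]

for s in symbols:
    print(f"{s!r}: {'valid' if PART1.match(s) else 'invalid'}")
```

Only the first five symbols should be reported as valid: the greedy " +" runs to the end of the spaces, \b confirms Part2 starts there, and the lookbehind rejects any position other than 6.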

Tim Pietzcker
I would need to change this to handle the optional single numeric digit, right?
FrankTheTank
Also, I don't know who downvoted you. I would upvote you if I had the ability to.
FrankTheTank
My first version had a bug and would allow false positives. Must have been downvoted while I was correcting it :)
Tim Pietzcker
Would this modification account for the numeric digit? ^[A-Za-z]{1,4}[0-9]{1}? +\b(?<=^.{6})
FrankTheTank
+1 Very nice, will keep the lookbehind trick for these kinds of problems in mind
Wrikken
Changed down to upvote. I was going to explain why but got distracted by a coworker. (Real work, psht)
treefrog
A: 

Try the following:

[A-Za-z]{1,4}[0-9]{1}?\s*?

The {1,4} allows one to four alphabetic characters, the ? is meant to make the digit optional, and the * is equivalent to {0,}.
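A quick check in Python's `re` module (an assumption; any Perl-style engine behaves the same way here) shows two things about this pattern. First, "{1}?" is the lazy form of "{1}", so the digit is actually required rather than optional. Second, nothing ties the number of spaces to the 6-character limit. `INTENDED` is a hypothetical rewrite matching the stated intent, not part of the original answer.

```python
import re

# The pattern exactly as posted. "{1}?" is a lazy "exactly one", so a digit is required.
POSTED = re.compile(r"[A-Za-z]{1,4}[0-9]{1}?\s*?")

# Hypothetical reading of the intent: optional digit, then any amount of whitespace.
INTENDED = re.compile(r"[A-Za-z]{1,4}[0-9]?\s*")

print(POSTED.fullmatch("F     "))          # None: no digit, so no match
print(INTENDED.fullmatch("F     "))        # matches: the digit is optional
print(INTENDED.fullmatch("F" + " " * 9))   # also matches, although it is 10 characters
print(POSTED.search("F     123456P12345678").group())  # unanchored search finds "P1" in Part2
```

Even the corrected form accepts any number of spaces, which is why a length constraint (or the lookbehind trick) is still needed.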

eykanal
Would this not also match any number of spaces? The number of spaces must depend on how many of the 6 character positions remain.
FrankTheTank
Why use [0-9]{1}? instead of just [0-9]?
@FrankTheTank - unless there's a specific reason why the regular expression needs to do that, I would have some other part of the code verify that the entire expression contains a legal number of characters, and use the regular expression for what it's good at: character matching.
@fy-tide - yeah, actually, [0-9] is the same as [0-9]{1}. Don't know why I wrote that.
eykanal
A: 

Do it as a two-part check: the first part is the regular expression, and the second part is the length of the text the regular expression matched.

I.e., something like this for the regular expression:

[[:alpha:]]{1,4}[[:digit:]]?[ ]{1,5}

The length check on the matched text is what guarantees validity: the regular expression will not match if an individual section is invalid, so if each section (alpha, digit, and spaces) is valid and the match is 6 characters long, you have a valid Part1.
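A minimal sketch of the two-part check in Python, assuming it is applied to Part1 alone (the first 6 characters), as the question specifies. Python's `re` module has no POSIX classes such as [[:alpha:]], so ASCII ranges stand in for them; `is_valid_part1` is a hypothetical name for illustration.

```python
import re

# POSIX [[:alpha:]] / [[:digit:]] replaced by ASCII ranges, since Python's re lacks them.
PART1_SHAPE = re.compile(r"[A-Za-z]{1,4}[0-9]?[ ]{1,5}")


def is_valid_part1(symbol: str) -> bool:
    """Hypothetical two-part check: regex on the sections, then the match length."""
    part1 = symbol[:6]
    # Part 1: the regex validates each section (letters, optional digit, spaces).
    m = PART1_SHAPE.match(part1)
    # Part 2: the matched text must cover exactly 6 characters.
    return m is not None and len(m.group()) == 6
```

Restricting the check to the first 6 characters matters: run against the whole symbol, the greedy [ ]{1,5} would stop after five spaces and report a 6-character match even when a sixth space follows.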

diverscuba23
I'm restricted to using regular expressions only. We are using this expression in validator controls on our front-end, etc.
FrankTheTank
Yeah, that would eliminate my solution. Tim Pietzcker's solution will work for you, though.
diverscuba23