I have a string that needs to be validated.

The first two characters must each be one of A-G or Z, but the pair cannot be either of the following combinations: GB or ZZ.

How do I express that in a regular expression?

A: 

Look into negative look-ahead.

Hank Gay
+3  A: 
^([A-F][A-GZ]|G[AC-GZ]|Z[A-G]).*
Drakosha
Most compatible with many regexp flavors... add a `^` to the start to get the 'first two characters' portion of the validation...
gnarf
@gnarf The `.*` at the end implies it's a full-text match
Michael Mrozek
@Michael Mrozek - Sorry, but `^` and `$` are never implied in any of the regular expression languages I use... The above pattern could match on `ZZ AA aasdfasd`; it would just match only the `AA aasdfasd` part.
gnarf
@gnarf Most regular expression libraries (at least the ones I've seen) have two matching methods, a partial match and a full match. A full match requires that the pattern completely describe the input string; it can't match just a substring. If this were a partial match there would have been no need to include `.*` at the end, so it seems logical that he intends a full match here.
Michael Mrozek
Java regexes are anchored at both ends when the `matches()` method is used, Python regexes are anchored at the beginning if you use the `match()` method, and XML Schema regexes are always anchored at both ends. Those are the only cases of implicit anchoring that I'm aware of, so I don't think it's safe to assume the reader is familiar with the concept. If you offer a regex that needs to be anchored, you should use explicit anchors, or explain why they aren't needed if that's the case.
Alan Moore
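To see why the anchoring question matters, here is a small Python sketch (Python is an assumption; the thread does not name a language). It uses Drakosha's pattern with the leading `^` dropped, which appears to be the form gnarf's comment refers to, and compares `re.search` (unanchored), `re.match` (anchored at the start, as Alan Moore notes) and `re.fullmatch` (anchored at both ends, available since Python 3.4). The variable names are illustrative.

```python
import re

# Drakosha's alternation without the leading "^" (the form gnarf commented on).
UNANCHORED = r"([A-F][A-GZ]|G[AC-GZ]|Z[A-G]).*"

text = "ZZ AA aasdfasd"

# re.search scans the whole string, so it finds a match starting at "AA".
found = re.search(UNANCHORED, text)
print(found.group(0))  # AA aasdfasd

# re.match only tries position 0, where "ZZ" fails every alternative.
print(re.match(UNANCHORED, text))  # None

# re.fullmatch must match the entire string, so it fails here too.
print(re.fullmatch(UNANCHORED, text))  # None

# With an explicit "^", even re.search is pinned to the start of the string.
print(re.search("^" + UNANCHORED, text))  # None
```

This is gnarf's scenario exactly: without an anchor, a partial-match API happily skips over the forbidden `ZZ` prefix.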
+8  A: 

Negative lookbehind is the best fit for this.

[A-GZ]{2}(?<!GB)(?<!ZZ)

Explanation:

[A-GZ]{2} matches exactly two characters, both of which must be A-G or Z.
(?<!GB) only matches if the previous two characters matched were not GB.
(?<!ZZ) only matches if the previous two characters matched were not ZZ.

The negative lookbehind, like all lookahead and lookbehind operations, is zero width, meaning it does not change the cursor position. This is why you can string two together in a row as I did. I prefer this to using |, because it makes the two disallowed cases explicit. And checking twice should cost about the same at runtime as using the | operator inside a single lookbehind.

jdmichal
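A minimal Python sketch of the lookbehind approach, assuming Python's `re` module (which supports fixed-width lookbehind such as `(?<!GB)`). `re.match` anchors at the start of the string, so only the first two characters are checked; the helper name `valid_prefix` is hypothetical.

```python
import re

# Two characters from A-G or Z, then two zero-width checks that look back
# over those same two characters and reject "GB" and "ZZ".
LOOKBEHIND = re.compile(r"[A-GZ]{2}(?<!GB)(?<!ZZ)")


def valid_prefix(s: str) -> bool:
    """Hypothetical helper: True if the first two characters pass the rule."""
    # re.match only tries position 0, so this validates the string's prefix.
    return LOOKBEHIND.match(s) is not None


for candidate in ["AB", "GA", "ZA", "GB", "ZZ", "ZB123", "HB", "G"]:
    print(candidate, valid_prefix(candidate))
```

Because the lookbehinds consume nothing, a successful match is still exactly two characters long: `LOOKBEHIND.match("ZB123").group(0)` is `"ZB"`.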
And also one of the rarer features of Regular Expression flavors (won't work in JavaScript for instance). Negative lookaheads are more widely supported and equally simple: `^(?!GB)(?!ZZ)[A-GZ]{2}` -- also added the `^` since he specified "the first two characters"
gnarf
Good points gnarf.
jdmichal
What if the first two characters must be evaluated separately? `[A-GZ]{1}[A-GXZ]{1}(?<!GB)(?<!ZZ)` This doesn't seem to work in .Net?
Ilya Biryukov
My bad, it works great!
Ilya Biryukov
I was about to say: that should work fine. If you're only matching one character, you can omit the `{1}` for easier reading.
jdmichal
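The two variants from this comment thread can be sketched in Python as well (again an assumed language): gnarf's lookahead form, which checks the forbidden pairs before consuming anything, and Ilya Biryukov's per-position variant with the redundant `{1}` quantifiers removed, as jdmichal suggests. The constant names are illustrative.

```python
import re

# gnarf's version: the lookaheads run at position 0 before any characters
# are consumed, so no lookbehind support is required.
LOOKAHEAD = re.compile(r"^(?!GB)(?!ZZ)[A-GZ]{2}")

# Ilya Biryukov's version: a different class for each position ("X" is
# allowed only as the second character); "{1}" is dropped since it is implied.
PER_POSITION = re.compile(r"[A-GZ][A-GXZ](?<!GB)(?<!ZZ)")

for s in ["GA", "AX", "XA", "GB", "ZZ", "ZX"]:
    print(s, bool(LOOKAHEAD.match(s)), bool(PER_POSITION.match(s)))
```

The lookbehinds in the second pattern still work when the two positions use different classes, because they only inspect the two characters already consumed, whatever class matched them.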
A: 

^([A-F][A-GZ]|G[AC-GZ]|Z[A-G])

Bill Barry
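Since the answers express the same rule in different ways, a brute-force comparison is an easy way to gain confidence that they agree. This Python sketch (assumed language; `rule`, `ALLOWED`, `FORBIDDEN_PAIRS` and the test alphabet are illustrative names) checks every two-character string over a small alphabet against a plain-Python statement of the question's rule.

```python
import re
from itertools import product

# Patterns from the answers above; re.match anchors each at the string start.
PATTERNS = {
    "alternation": re.compile(r"^([A-F][A-GZ]|G[AC-GZ]|Z[A-G])"),
    "lookbehind": re.compile(r"[A-GZ]{2}(?<!GB)(?<!ZZ)"),
    "lookahead": re.compile(r"^(?!GB)(?!ZZ)[A-GZ]{2}"),
}

ALLOWED = set("ABCDEFGZ")
FORBIDDEN_PAIRS = {"GB", "ZZ"}


def rule(pair: str) -> bool:
    """Reference check without regexes: the rule as stated in the question."""
    return pair[0] in ALLOWED and pair[1] in ALLOWED and pair not in FORBIDDEN_PAIRS


# Include letters outside the allowed set (H, Y) so rejections are exercised too.
alphabet = "ABCDEFGHYZ"
mismatches = []
for a, b in product(alphabet, repeat=2):
    pair = a + b
    for name, pattern in PATTERNS.items():
        if bool(pattern.match(pair)) != rule(pair):
            mismatches.append((name, pair))

print(mismatches)  # [] : all three patterns agree with the rule
```

An exhaustive check like this is cheap here (100 pairs), and it is a useful habit whenever a rule is rewritten as an alternation by hand, where a missing or extra character range is easy to overlook.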