I am fairly new to regular expressions, and the more I use them, the more I like them. I am working on a regular expression that must meet the following conditions:

  1. Must start with an Alpha character
  2. Out of the next three characters, at least one must be an Alpha character.
  3. Anything after the first four characters is an automatic match.

I currently have the following regex: ^[a-zA-Z](?=.*[a-zA-Z]).{1}.*$

The issue I am running into is that my positive lookahead (?=.*[a-zA-Z]) is not constrained to the three characters following the initial alpha character.
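To make the problem concrete, here is a small Python sketch (Python's `re` module is an assumption; the question does not name a regex flavor). It runs the original pattern against a string whose only later letter sits outside the three-character window.

```python
import re

# The question's original pattern: the .* inside the lookahead may scan
# all the way to the end of the string.
ORIGINAL = re.compile(r"^[a-zA-Z](?=.*[a-zA-Z]).{1}.*$")

def matches(s: str) -> bool:
    """Return True if the original pattern accepts s."""
    return ORIGINAL.match(s) is not None

# "a333b" has no letter among the three characters after "a", yet the
# lookahead finds the "b" further along, so the string is still accepted.
for sample in ["a2a3", "a333", "a333b"]:
    print(sample, matches(sample))
```

The third sample is the unwanted case: it is accepted even though none of the three characters after the first is a letter.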

I feel as if I am missing a concept here. What am I missing from this expression?

Thanks all.

+1  A: 

You'll probably have to do a workaround. Something like:

^[a-zA-Z](?=([a-zA-Z]..|.[a-zA-Z].|..[a-zA-Z])).{3}.*
  • First char [a-zA-Z]
  • Positive lookahead: the first, second, or third of the next three chars is a letter, ([a-zA-Z]..|.[a-zA-Z].|..[a-zA-Z])
  • .{3} consumes those three chars and .* matches anything after them
Jan Jongboom
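A quick way to check this approach is the Python sketch below, using the question's [a-zA-Z] class. The helper name `is_valid` and the sample strings are made up for illustration.

```python
import re

# Lookahead with three alternatives, one per position of the required letter;
# .{3} then consumes the three characters the lookahead inspected.
PATTERN = re.compile(r"^[a-zA-Z](?=([a-zA-Z]..|.[a-zA-Z].|..[a-zA-Z])).{3}.*")

def is_valid(s: str) -> bool:
    """Hypothetical helper: True if s satisfies the three conditions."""
    return PATTERN.match(s) is not None

samples = ["a2a3", "Aaaa", "a33b", "a333", "2aaa", "aa", "a333b"]
results = {s: is_valid(s) for s in samples}
print(results)
```

Because every alternative in the lookahead is exactly three characters wide, strings shorter than four characters such as `aa` are rejected as well.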
+2  A: 

The .* in your lookahead is doing that: it lets the lookahead scan the entire rest of the string. You should limit the range here, like this:

^[a-zA-Z](?=.{0,2}[a-zA-Z]).{1}.*$

Edit: If you want to make sure that there are at least 4 characters in the string, you could use another lookahead like this:

 ^[a-zA-Z](?=.{3})(?=.{0,2}[a-zA-Z]).{1}.*$
Jens
This will also match `aa`
Jan Jongboom
@Jan: The OP does not mention that he wants at least four characters, but I updated my answer to give this option.
Jens
@Jens: How is `.{1}` different from `.`? ;-) (For the record, your second expression is more easily written as `^[a-zA-Z](?=.{0,2}[a-zA-Z]).{3}.*`. +1 from me, elegant approach.)
Tomalak
@Tomalak: It's not different. I just did not want to alter the OP's expression too much, so that he can actually see what I changed to avoid his problem.
Jens
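The three variants discussed in this thread can be compared side by side. This is a rough Python sketch; the dictionary labels, the `accepted` helper, and the sample strings are illustrative and not from the answers.

```python
import re

VARIANTS = {
    "bounded lookahead, .{1}": r"^[a-zA-Z](?=.{0,2}[a-zA-Z]).{1}.*$",
    "extra (?=.{3}) lookahead": r"^[a-zA-Z](?=.{3})(?=.{0,2}[a-zA-Z]).{1}.*$",
    "consume .{3}": r"^[a-zA-Z](?=.{0,2}[a-zA-Z]).{3}.*",
}
SAMPLES = ["aa", "ab1", "a33b", "a333b", "a333"]

def accepted(pattern, samples=SAMPLES):
    """Return the samples that the given pattern accepts."""
    rx = re.compile(pattern)
    return [s for s in samples if rx.match(s)]

table = {name: accepted(p) for name, p in VARIANTS.items()}
for name, hits in table.items():
    print(name, hits)
```

On these samples only the first variant lets the short strings `aa` and `ab1` through; the other two agree with each other, which is the point of the `.{3}` shortcut.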
+3  A: 

What do you want lookahead for? Why not just use

^[a-zA-Z](..[a-zA-Z]|.[a-zA-Z].|[a-zA-Z]..)

and be happy?

Kilian Foth
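For anyone who wants reassurance that dropping the lookahead changes nothing, here is a brute-force Python sketch. It compares the plain alternation against a lookahead version that also requires four characters, over every string up to length 5 from a tiny test alphabet. The alphabet and the length limit are arbitrary choices for illustration.

```python
import re
from itertools import product

WITH_LOOKAHEAD = re.compile(r"^[a-zA-Z](?=.{0,2}[a-zA-Z]).{3}")
PLAIN = re.compile(r"^[a-zA-Z](..[a-zA-Z]|.[a-zA-Z].|[a-zA-Z]..)")

# Small alphabet: a lowercase letter, an uppercase letter, a digit, a symbol.
ALPHABET = "aZ3-"

def all_strings(max_len):
    """Yield every string over ALPHABET with length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product(ALPHABET, repeat=n):
            yield "".join(chars)

# Collect every string on which the two patterns give different verdicts.
disagreements = [
    s for s in all_strings(5)
    if bool(WITH_LOOKAHEAD.match(s)) != bool(PLAIN.match(s))
]
print(len(disagreements))
```

An empty list is evidence for this alphabet only, not a proof, but it covers every combination of letter and non-letter positions inside the four-character window.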
Good point. Using lookahead and (especially) lookbehind when not necessary can hurt performance. +1 for the simplification.
Robusto
Perfectly reasonable solution. I suppose I was overthinking things. Just for the fun of it, what if I said at least one of the next 30 characters must be alpha. Is there a way to encapsulate this idea into a simpler statement?
@Robusto: Not in this case, at least not beyond "academic" differences.
Tomalak
@user90279: Yes, by using look-ahead. ;-)
Tomalak
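The look-ahead generalization might look like the sketch below, written in Python. The function name `letter_within` is made up, and the sketch assumes no minimum length is required beyond the letter itself. The window size only changes the upper bound in `.{0,n-1}`, so 30 is no harder to write than 3.

```python
import re

def letter_within(n: int) -> "re.Pattern[str]":
    """Hypothetical helper: first char is a letter and at least one of the
    next n characters is a letter (no minimum length enforced)."""
    return re.compile(r"^[a-zA-Z](?=.{0," + str(n - 1) + r"}[a-zA-Z])")

rx30 = letter_within(30)
inside = "a" + "1" * 29 + "b"    # the "b" is the 30th character after the first
outside = "a" + "1" * 30 + "b"   # the "b" is the 31st character after the first
print(bool(rx30.match(inside)), bool(rx30.match(outside)))
```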
@user90279: It is MUCH better in that case to just use ^.[^a-zA-Z]{1,30}$ instead and then negate the match operator. Regexes don't handle huge alternatives well.
Kilian Foth
So when a regex evaluates an OR statement, does it have the ability to short-circuit on a match, or will it evaluate all other statements?
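In backtracking engines such as Python's `re` (used here as an assumption, since the thread does not name an engine), alternatives are tried left to right. The engine stops at the first one that lets the overall match succeed and only returns to later alternatives if the rest of the pattern fails. A small sketch:

```python
import re

# The first alternative "a" succeeds, so "ab" is never tried.
first = re.match(r"(a|ab)", "ab").group(1)

# With "$" after the group, "a" alone cannot reach the end of the string,
# so the engine backtracks and tries the second alternative.
anchored = re.match(r"(a|ab)$", "ab").group(1)

# Reordering the alternatives changes which one wins.
reordered = re.match(r"(ab|a)", "ab").group(1)

print(first, anchored, reordered)
```

Other engine designs (for example DFA-based ones) may behave differently, so this only illustrates the backtracking case.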
A: 

Change the .* in your lookahead to .{0,2} to get m/^[a-zA-Z](?=.{0,2}[a-zA-Z]).{1}.*$/

If I am understanding your criteria, that fixes it because the lookahead can then only see the next three characters.

These are correctly matched:

a2a3-match
2aaa-no match
Aaaa-match
a333-no match
drewk
Will also match `aa`
Jan Jongboom