I am writing a regex for validating a password in JavaScript. The constraints are:

  1. Password must contain at least one uppercase character
  2. Password must contain at least one special character

With trial and error and some searching on the net, I found that this works:

/(?=.*[A-Z]+)(?=.*[!@#\$%]+)/

Can someone please explain the part of this expression which mentions that the uppercase letter and special character can come in ANY order?

+1  A: 

The "?=" does this. It is a "positive lookahead".

From JavaScript Regular Expression Syntax

Positive lookahead matches the search string at any point where a string matching pattern begins. This is a non-capturing match, that is, the match is not captured for possible later use. For example 'Windows (?=95|98|NT|2000)' matches "Windows" in "Windows 2000" but not "Windows" in "Windows 3.1". Lookaheads do not consume characters, that is, after a match occurs, the search for the next match begins immediately following the last match, not after the characters that comprised the lookahead.
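
As a quick check of the quoted example, here is a minimal sketch that runs the documentation's pattern against both strings. The variable names are only illustrative.

```javascript
// The pattern from the quoted documentation; note the space before the lookahead.
const windowsVersion = /Windows (?=95|98|NT|2000)/;

const hit = windowsVersion.exec("Windows 2000");
const miss = windowsVersion.exec("Windows 3.1");

// Only "Windows " is part of the match; "2000" was inspected but not consumed.
console.log(JSON.stringify(hit[0])); // "Windows "
console.log(miss);                   // null
```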

Ryan Conrad
Thanks. Silentghost's response has the details.
atlantis
That link is very useful ... thanks again!
atlantis
+1  A: 

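The ?= is called a lookahead: it checks whether the pattern inside it matches from the current position, without moving that position. Normally, a regex consumes the string character by character, but the ?= tells it to "look ahead" and only test whether the pattern is there. With .* in front, that test covers the rest of the string.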

There is also a negative lookahead, ?!, which succeeds only if the pattern does not match.
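
To illustrate the negative form, here is a small sketch that reuses the "Windows" example from the documentation quoted in the other answer, with ?= swapped for ?!. The variable name is hypothetical.

```javascript
// Matches "Windows " only when it is NOT followed by one of the listed versions.
const notListedVersion = /Windows (?!95|98|NT|2000)/;

const matchesOther = notListedVersion.test("Windows 3.1");
const matchesListed = notListedVersion.test("Windows 2000");

console.log(matchesOther);  // true
console.log(matchesListed); // false
```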

Aaron Harun
Thanks. Silentghost's response has the details.
atlantis
+1  A: 

I think this would work even better:

/(?=.*[A-Z])(?=.*[!@#\$%])/

Look-arounds do not consume characters; therefore, the second look-ahead starts at the same position as the first. This makes the checks for those two characters independent of each other. You could swap them around and the resulting regex would still be equivalent to this one.
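
A minimal sketch of that independence: the two look-aheads are written in both orders and run over a few made-up sample strings. The sample values and variable names are assumptions for illustration only.

```javascript
const upperThenSpecial = /(?=.*[A-Z])(?=.*[!@#\$%])/;
const specialThenUpper = /(?=.*[!@#\$%])(?=.*[A-Z])/;

// Uppercase before special, special before uppercase, and three failing cases.
const orderSamples = ["Passw0rd!", "!passworD", "password!", "PASSWORD", ""];

const orderResults = orderSamples.map((s) => ({
  input: s,
  first: upperThenSpecial.test(s),
  swapped: specialThenUpper.test(s),
}));

// Both orders give the same answer for every sample.
for (const r of orderResults) {
  console.log(JSON.stringify(r.input), r.first, r.swapped);
}
```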

The following regex (suggested by Gumbo) is slightly more efficient, as it avoids unnecessary backtracking:

/(?=[^A-Z]*[A-Z])(?=[^!@#\$%]*[!@#\$%])/

On passwords of usual lengths the time difference probably won't be easily measurable, though.
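
Here is a rough sketch comparing the two variants on a few made-up single-line samples. The helper name isValidPassword and the sample strings are hypothetical. Note that [^A-Z]* can also step over line breaks, which .* does not, so the two forms are only claimed to agree here for input without newlines.

```javascript
const dotStar = /(?=.*[A-Z])(?=.*[!@#\$%])/;
const negatedClass = /(?=[^A-Z]*[A-Z])(?=[^!@#\$%]*[!@#\$%])/;

// Hypothetical helper wrapping the negated-class variant.
function isValidPassword(password) {
  return negatedClass.test(password);
}

const variantSamples = ["s3cret%Key", "s3cretkey", "ONLYUPPER", "only$pecial", "A!"];

// Check that both variants accept and reject the same samples.
const variantsAgree = variantSamples.every(
  (s) => dotStar.test(s) === negatedClass.test(s)
);

console.log(variantsAgree);                 // true
console.log(isValidPassword("s3cret%Key")); // true
console.log(isValidPassword("s3cretkey"));  // false
```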

SilentGhost
I think `+` is one of the desired 'special characters'. Instead of removing it entirely, it should be in the character class `[]`
LeguRi
@Richard: plus is also used in the first look-ahead, therefore, I think it's used as a quantifier. It's not entirely wrong, it's just redundant.
SilentGhost
@SilentGhost - This is true; I didn't notice it in the first.
LeguRi
Make it a little smarter: `/(?=[^A-Z]*[A-Z])(?=[^!@#\$%]*[!@#\$%])/`. That avoids unnecessary backtracking.
Gumbo
why the downvote?
SilentGhost
Thanks @SilentGhost for the detailed explanation! @Richard: I am using the + as a quantifier, as pointed out above. Now I understand why it is redundant. I did not understand the part about 'unnecessary backtracking' though :( A small explanation would be great!
atlantis
@atlantis: it has to do with the internals of the regex engine. `*` is a greedy quantifier, which means it tries to match the corresponding character or character class as many times as possible, so `.*` first matches the whole subject string. The engine then checks whether the rest of the regex, `[A-Z]`, can be matched. If not, it "back-tracks", i.e., it releases one character from the `.*` match and tries `[A-Z]` on it, and it keeps doing so until it either matches or fails and returns. What Gumbo proposes is a simple match of all characters except `A-Z`, which is a forward match, character by character.
SilentGhost
@atlantis: here probably is a better explanation: http://www.regular-expressions.info/repeat.html
SilentGhost
@SilentGhost: Thanks a ton for the pointers. Learnt quite a few things today! Cheers. @Gumbo: Thanks for the smartening up of the regex!
atlantis