I understand regular expressions reasonably well, but I don't get to make use of them often enough to be an expert. I ran across a regular expression that I am using to validate password strength, but it contains some regex concepts I am unfamiliar with. The regex is:

^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{6,}$

and in plain English it means that the string must contain at least one lowercase character, one uppercase character, and one number, and the string must be at least six characters long. Can anyone break this down for me to explain how this pattern actually describes that rule? I see a start of string char ^ and an end of string char $, three groups with lookaheads, a match any character . and a repetition {6,}.

Thanks to any regex guru who can help me get my head around this.

A: 

The look-ahead assertions are used to ensure that there is at least one digit, one lowercase letter, and one uppercase letter in the string.

Gumbo
+8  A: 

Under normal circumstances, a piece of a regular expression matches a piece of the input string, and "consumes" that piece of the string. The next piece of the expression matches the next piece of the string, and so on.

Lookahead assertions don't consume any of the string, so your three lookahead assertions:

  • (?=.*\d)
  • (?=.*[a-z])
  • (?=.*[A-Z])

each mean "This pattern (anything followed by a digit, a lowercase letter, an uppercase letter, respectively) must appear somewhere in the string", but they don't move the current match position forwards, so the remainder of the expression:

  • .{6,}

(which means "six or more characters") must still match the whole of the input string.
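A quick way to see the "doesn't consume" part is to compare where a lookahead match ends with where the equivalent ordinary pattern ends. This is only a small sketch using Python's `re` module; the sample string and the variable names are made up for illustration.

```python
import re

text = "abc1def"  # illustrative input, not from the question

# Lookahead: asserts that a digit exists somewhere ahead, consumes nothing.
lookahead = re.match(r"(?=.*\d)", text)

# Same subpattern without the lookahead: consumes up to and including the digit.
consuming = re.match(r".*\d", text)

print(lookahead.end())   # 0 -> match position never moved
print(consuming.end())   # 4 -> "abc1" was consumed
```

Because the position stays at 0 after each lookahead, the trailing `.{6,}` still has the entire string to match against.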

RichieHindle
+5  A: 

The lookahead group doesn't consume the input. This way, the same characters are actually being matched by the different lookahead groups.

You can think of it this way: search for anything (.*) until you find a digit (\d). If you do, go back to the beginning of this group (the concept of lookahead). Now look for anything (.*) until you find a lower case letter. Repeat for upper case letter. Now, match any 6 or more characters.
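The "scan, go back, scan again" description can be written out as ordinary code. The sketch below assumes single-line ASCII input (no newlines), and `check_by_hand` is a hypothetical helper written for this comparison, not something from the original regex. `re.ASCII` is passed so that `\d` means only `0-9`, matching the hand-written check.

```python
import re

PATTERN = re.compile(r"^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{6,}$", re.ASCII)

def check_by_hand(s: str) -> bool:
    """Three independent scans from the start, then a length check."""
    has_digit = any("0" <= c <= "9" for c in s)   # like (?=.*\d)
    has_lower = any("a" <= c <= "z" for c in s)   # like (?=.*[a-z])
    has_upper = any("A" <= c <= "Z" for c in s)   # like (?=.*[A-Z])
    return has_digit and has_lower and has_upper and len(s) >= 6  # like .{6,}

samples = ["aB3xyz", "3zzzzA", "abcdef", "ABC123", "aB3"]
results = {s: bool(PATTERN.match(s)) for s in samples}

# The regex and the hand-written check agree on every sample.
for s in samples:
    assert results[s] == check_by_hand(s)
```

Each scan starts over from the beginning, so the order in which the digit, lowercase letter, and uppercase letter appear does not matter.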

Sinan Taifour
+3  A: 

To break it down completely:

^ -- Match the beginning of the string (or of a line, in multiline mode)
(?=.*\d) -- The text that follows contains a digit
(?=.*[a-z]) -- The text that follows contains a lowercase letter
(?=.*[A-Z]) -- The text that follows contains an uppercase letter
.{6,} -- Match six or more of any character (except a newline)
$ -- Match the end of the string (or of a line, in multiline mode)
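The same breakdown can be kept next to the pattern itself by using a verbose-mode regex. This is a sketch in Python, assuming its `re.VERBOSE` flag; the name `is_strong` and the sample passwords are invented for illustration. Note that in Python 3, `\d` also matches non-ASCII digits unless `re.ASCII` is added.

```python
import re

STRONG = re.compile(
    r"""
    ^               # start of string
    (?=.*\d)        # somewhere ahead: a digit
    (?=.*[a-z])     # somewhere ahead: a lowercase letter
    (?=.*[A-Z])     # somewhere ahead: an uppercase letter
    .{6,}           # six or more characters (newlines excluded)
    $               # end of string
    """,
    re.VERBOSE,
)

def is_strong(password: str) -> bool:
    """Return True if the password satisfies all four rules."""
    return STRONG.match(password) is not None

for candidate in ["Passw0rd", "password1", "PASSWORD1", "Ab1"]:
    print(candidate, is_strong(candidate))
```

Only `Passw0rd` passes here: the second lacks an uppercase letter, the third lacks a lowercase letter, and the fourth is too short.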
Sean Vieira
+1  A: 

I went and checked to see how this would match if using Perl:

perl -Mre=debug -E'q[  abc  345 DEF  ]=~/^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{6,}$/'

Compiling REx "^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{6,}$"
synthetic stclass "ANYOF[\0-\11\13-\377{unicode_all}]".
Final program:
   1: BOL (2)
   2: IFMATCH[0] (9)
   4:   STAR (6)
   5:     REG_ANY (0)
   6:   DIGIT (7)
   7:   SUCCEED (0)
   8: TAIL (9)
   9: IFMATCH[0] (26)
  11:   STAR (13)
  12:     REG_ANY (0)
  13:   ANYOF[a-z] (24)
  24:   SUCCEED (0)
  25: TAIL (26)
  26: IFMATCH[0] (43)
  28:   STAR (30)
  29:     REG_ANY (0)
  30:   ANYOF[A-Z] (41)
  41:   SUCCEED (0)
  42: TAIL (43)
  43: CURLY {6,32767} (46)
  45:   REG_ANY (0)
  46: EOL (47)
  47: END (0)

floating ""$ at 6..2147483647 (checking floating) stclass ANYOF[\0-\11\13-\377{unicode_all}] anchored(BOL) minlen 6 
Guessing start of match in sv for REx "^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{6,}$" against "  abc  345 DEF  "
Found floating substr ""$ at offset 16...
start_shift: 6 check_at: 16 s: 0 endpos: 11
Does not contradict STCLASS...
Guessed: match at offset 0
Matching REx "^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{6,}$" against "  abc  345 DEF  "
   0 <> <  abc  345>         |  1:BOL(2)
   0 <> <  abc  345>         |  2:IFMATCH[0](9)
   0 <> <  abc  345>         |  4:  STAR(6)
                                    REG_ANY can match 16 times out of 2147483647...
  16 <c  345 DEF  > <>       |  6:    DIGIT(7) # failed...
  15 <c  345 DEF > < >       |  6:    DIGIT(7) # failed...
  14 <c  345 DEF> <  >       |  6:    DIGIT(7) # failed...
  13 <c  345 DE> <F  >       |  6:    DIGIT(7) # failed...
  12 <c  345 D> <EF  >       |  6:    DIGIT(7) # failed...
  11 <c  345 > <DEF  >       |  6:    DIGIT(7) # failed...
  10 <c  345> < DEF  >       |  6:    DIGIT(7) # failed...
   9 <c  34> <5 DEF  >       |  6:    DIGIT(7)
  10 <c  345> < DEF  >       |  7:    SUCCEED(0)
                                      subpattern success...
   0 <> <  abc  345>         |  9:IFMATCH[0](26)
   0 <> <  abc  345>         | 11:  STAR(13)
                                    REG_ANY can match 16 times out of 2147483647...
  16 <c  345 DEF  > <>       | 13:    ANYOF[a-z](24) # failed...
  15 <c  345 DEF > < >       | 13:    ANYOF[a-z](24) # failed...
  14 <c  345 DEF> <  >       | 13:    ANYOF[a-z](24) # failed...
  13 <c  345 DE> <F  >       | 13:    ANYOF[a-z](24) # failed...
  12 <c  345 D> <EF  >       | 13:    ANYOF[a-z](24) # failed...
  11 <c  345 > <DEF  >       | 13:    ANYOF[a-z](24) # failed...
  10 <c  345> < DEF  >       | 13:    ANYOF[a-z](24) # failed...
   9 <c  34> <5 DEF  >       | 13:    ANYOF[a-z](24) # failed...
   8 <bc  3> <45 DEF  >      | 13:    ANYOF[a-z](24) # failed...
   7 <abc  > <345 DEF  >     | 13:    ANYOF[a-z](24) # failed...
   6 < abc > < 345 DEF  >    | 13:    ANYOF[a-z](24) # failed...
   5 <  abc> <  345 DEF >    | 13:    ANYOF[a-z](24) # failed...
   4 <  ab> <c  345 DEF>     | 13:    ANYOF[a-z](24)
   5 <  abc> <  345 DEF >    | 24:    SUCCEED(0)
                                      subpattern success...
   0 <> <  abc  345>         | 26:IFMATCH[0](43)
   0 <> <  abc  345>         | 28:  STAR(30)
                                    REG_ANY can match 16 times out of 2147483647...
  16 <c  345 DEF  > <>       | 30:    ANYOF[A-Z](41) # failed...
  15 <c  345 DEF > < >       | 30:    ANYOF[A-Z](41) # failed...
  14 <c  345 DEF> <  >       | 30:    ANYOF[A-Z](41) # failed...
  13 <c  345 DE> <F  >       | 30:    ANYOF[A-Z](41)
  14 <c  345 DEF> <  >       | 41:    SUCCEED(0)
                                      subpattern success...
   0 <> <  abc  345>         | 43:CURLY {6,32767}(46)
                                  REG_ANY can match 16 times out of 2147483647...
  16 <c  345 DEF  > <>       | 46:  EOL(47)
  16 <c  345 DEF  > <>       | 47:  END(0)
Match successful!
Freeing REx: "^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{6,}$"

I slightly modified the output.
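The backtracking in the trace (the greedy `.*` grabs all 16 characters, then backs off one position at a time until the next item matches) can be reproduced without the debugger. The sketch below uses Python rather than Perl and the same input string; the `ends` dictionary is just an illustrative way to collect the positions.

```python
import re

text = "  abc  345 DEF  "  # same 16-character input as the Perl run

# Each pattern is the body of one lookahead. Greedy .* backs off from the
# end of the string, so each match stops just after the LAST such character.
ends = {
    "digit": re.match(r".*\d", text).end(),
    "lower": re.match(r".*[a-z]", text).end(),
    "upper": re.match(r".*[A-Z]", text).end(),
}
print(ends)  # {'digit': 10, 'lower': 5, 'upper': 14}
```

These offsets line up with the `SUCCEED` lines in the trace (positions 10, 5, and 14).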

Brad Gilbert
Thanks, Brad. You really DID break this down for me. :-) Very interesting.
Rich