OK, I got this example from the Regular Expressions Cookbook:

^(?=.{3}$).*

The regex above is used to limit the length of an arbitrary pattern.

If I test it against 'aaabbb', it fails completely.

From what I understand, it looks for any characters that are preceded by any three characters, so it should match 'bbb', but it doesn't.

One more question: should a lookbehind follow this pattern: x(?=x)?

+2  A: 

That is actually a lookahead assertion, not a lookbehind assertion. The ^ anchors the match at the start of the string; the lookahead then asserts that the start of the string must be followed by exactly 3 characters and then the end of the string.

Edit: I should probably have mentioned that the .* at the end is then used to match those three characters, since a lookahead assertion doesn't consume any characters.
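To make that concrete, here is a minimal sketch in Python's `re` module (my choice of language; the question doesn't name one). The helper name `matches_exactly_three` is made up for illustration.

```python
import re

# The pattern from the question: the lookahead checks the length,
# and .* then matches the characters the lookahead only inspected.
pattern = re.compile(r"^(?=.{3}$).*")

def matches_exactly_three(text):
    """Return the matched text, or None if the lookahead rejects the input."""
    m = pattern.match(text)
    return m.group(0) if m else None

print(matches_exactly_three("abc"))     # abc
print(matches_exactly_three("aaabbb"))  # None
```

'aaabbb' is rejected because after three characters the engine is not at the end of the string, so the `$` inside the lookahead fails.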

Cags
What do you mean by "not consume any characters"?
slier
I mean that after completing the lookahead assertion, the regex engine continues matching from the point where it entered the lookahead. So given the pattern `^(?=foo)(.*)$` and the input `foobar`, the value captured by capture group 1 would be `foobar`. The pattern first performs the lookahead, i.e. checks that the string starts with foo. It then moves on to the `.*`, and since the lookahead doesn't consume any characters, the first letter matched by `.*` is the f, then the o, and so forth until it has captured 'foobar'.
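A quick way to check this yourself, sketched in Python's `re` module (an assumption; any engine with lookahead should behave the same way here). The second pattern is my own addition for contrast: it consumes `foo` instead of only looking at it.

```python
import re

# Lookahead: "foo" is checked but not consumed, so group 1 starts at the f.
lookahead_match = re.match(r"^(?=foo)(.*)$", "foobar")
captured = lookahead_match.group(1)
print(captured)  # foobar

# Contrast (not from the thread): here "foo" is consumed before the group.
consuming_match = re.match(r"^foo(.*)$", "foobar")
remainder = consuming_match.group(1)
print(remainder)  # bar
```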
Cags
+2  A: 

From what I understand, it looks for any characters that are preceded by any three characters, so it should match 'bbb', but it doesn't.

Nope! Let's take a closer look...

^        # The caret is an anchor which denotes "STARTS WITH"
(?=      # lookahead
   .     # wildcard match; the . matches any non-new-line character
    {3}  # quantifier; exactly 3 times
   $     # dollar sign; an anchor which denotes "THE END" of the string
)        # end of lookahead
.        # wildcard match; the . matches any non-new-line character
 *       # quantifier; any number of times, including 0 times

Several problems:

  1. The caret anchors the match at the start of the string, and the lookahead (not a lookbehind) then requires exactly three characters between that point and the end of the string. 'aaabbb' has six characters, so the lookahead fails and the pattern matches nothing, not even 'bbb'.
  2. Your .{3} actually means any three characters, not any character repeated three times ;) You may actually want to read "How can I find repeated letters with a Perl regex?"
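If a repeated character is what you're after, the usual trick is a backreference. Here is a rough sketch in Python's `re` module (my assumption; the linked question is about Perl, where `\1` works the same way). The names `any_three`, `same_three` and `find_runs` are just for illustration.

```python
import re

# .{3} matches ANY three characters, e.g. "abc".
any_three = re.compile(r".{3}")

# (.)\1{2} captures one character, then requires that same character
# twice more, e.g. "aaa".
same_three = re.compile(r"(.)\1{2}")

def find_runs(text):
    """Return every run of one character repeated three times."""
    return [m.group(0) for m in same_three.finditer(text)]

print(any_three.findall("abcdef"))  # ['abc', 'def']
print(find_runs("aaabbb"))          # ['aaa', 'bbb']
print(find_runs("abcdef"))          # []
```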
LeguRi
@polygenelubricants - Oh fiddlesticks. Thanks...
LeguRi